Threads for vgel

    1. 21

      I was pretty skeptical. Turns out I was correct to be.

      Notably, it doesn’t support: structs; enums / unions; preprocessor directives; floating point; 8 byte types (long/long long or double); some other small things like pre/post cremements [sic], in-place initialization, etc., which just didn’t quite fit; any sort of standard library or i/o that isn’t returning an integer from main(); casting expressions

      So, not C. Or even really anything close to C. Still a fun exercise, but it’s not C. Maybe it could count as C circa 1980.

      Compilers have a reputation for being complex—GCC and Clang are massive, and even TCC, the Tiny C Compiler, is tens of thousands of lines of code—but if you’re willing to sacrifice code quality and do everything in a single pass, they can be surprisingly compact!

      Pascal was a single pass compiler and it supported much more than this does. The first publicly available close-to-C compiler that I know of from the late 1970s was single pass and supported pre/post-increment expressions and basic preprocessor directives. It did not support everything, lacking structures and floating point support as well (part of the reason for the lack of floating point support was that it was not standardized at the time). It lacked support for enumerations since they did not exist in C then. It was roughly 2000 lines of C code (including comments) and could compile itself.

      The compilers mentioned as massive are that way because they support stuff that people want or need, so I’m not sure if this was said with tongue planted firmly in cheek.

      And “sacrificing code quality” is doing a lot of work in that sentence.

      1. 20

        The point of the compiler is to be something interesting to learn from, not a standards-compliant compiler. I picked 500 lines up front, and this is what fit. I definitely think the majority of C features (maybe minus the preprocessor since that’s famously corner-casey) could fit in 1k lines of Python, but that wouldn’t be as approachable for a blog post.
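
To make "single pass" concrete, here is a minimal sketch in Python. It is not the compiler from the post: it only handles integer expressions with `+`, `*`, and parentheses, and the instruction names (`push`, `add`, `mul`) and the helper `run` are made up for illustration. The point is that code is emitted while parsing, with no syntax tree in between.

```python
import re


def compile_expr(source):
    """Compile an integer expression into stack-machine ops in one pass.

    Hypothetical instruction set: ("push", n), ("add",), ("mul",).
    Characters the tokenizer does not recognize are silently skipped.
    """
    tokens = re.findall(r"\d+|[+*()]", source)
    pos = 0
    out = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        tok = advance()
        if tok == "(":
            expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok.isdigit():
            out.append(("push", int(tok)))  # emitted as soon as it is parsed
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        atom()
        while peek() == "*":
            advance()
            atom()
            out.append(("mul",))

    def expr():
        term()
        while peek() == "+":
            advance()
            term()
            out.append(("add",))

    expr()
    if peek() is not None:
        raise SyntaxError(f"trailing token {peek()!r}")
    return out


def run(ops):
    """Tiny interpreter for the emitted ops, only used to check the output."""
    stack = []
    for op in ops:
        if op[0] == "push":
            stack.append(op[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op[0] == "add" else a * b)
    return stack.pop()
```

Operator precedence falls out of the call structure (`expr` calls `term`, which calls `atom`), which is roughly why a compiler built this way can stay small.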

        1. 12

          My contention is that the title makes it sound like a C compiler when it is not, in fact, a C compiler.

          I’m fine with the effort. I’m sure it was a fun exercise. It looks like it was. And the language certainly is a reasonable subset of C. But it’s not the C that the title makes it seem.

          1. 3

            I thought it was a C/C++ compiler.

        2. 2

          It’s a little bit bigger than Small C https://en.m.wikipedia.org/wiki/Small-C but not much

          1. 3

            Well Small C is a few thousand lines of code, so… :-)

            I think a better comparison is c4 (https://gitee.com/yenmuse/c4/blob/master/c4.c), which implements more than my compiler, including a custom VM, in ~500 lines, albeit some of them are pretty long.

            1. 3

              Well Small C is a few thousand lines of code, so… :-)

              The first Small C was written in Small C and was, in fact, the compiler I referred to. Its original version is only about 2000 lines. Given that it was written in the subset of C that it supported (which was very reasonable given the time) and it targeted 8080 assembly, 2000 lines is pretty good.

              1. 1

                Er, yes :-) I meant size of language rather than size of code, tho I think it might be hard to fit a Python interpreter and c500 onto an 8 bit micro - could it run in MicroPython on an Arduino?

              2. 1

                The original version of tcc was an IOCCC entry, a self-hosting C compiler. It fitted on a single screen, though with multiple statements per line and single-character names for all variables. I think the pre-obfuscated version was close to 500 lines.

                I think it skipped a lot of error checking. If you fed it invalid C code, it would generate something (much like other C compilers, but for a larger set of invalid inputs).

            2. 17

              Love the compiler architecture drawing! The “Parsenon” and “Codegen Cave” made me laugh.

              1. 1

                Thanks! It was a lot of fun to draw :-)

              2. 3

                This is a really good article - a decent methodology, and well explained. (Have you considered using a Jupyter notebook to capture both the methodology and output for repeatability?)

                There was an article mentioned on HN the other day delving into whether GPT has inner monologue / meta-cognition which gives some background on this. https://lukeplant.me.uk/blog/posts/chatgpt-no-inner-monologue-or-meta-cognition/ via https://news.ycombinator.com/item?id=35783650

                I think it’s possible that by restricting the output to just code, you’re reducing the ability to go from problem specification through to the code as the algorithm has to be generated while writing the code token by token.

                That got me wondering if there’s an approach that adds an intermediate instruction to help the LLM generate correct code.

                E.g.: System Message: You are an expert programming assistant who rewrites a specification into unambiguous algorithm that can be implemented in any programming language. You provide only the algorithm, not commentary.

                User Message 1: Write an algorithm for a function to calculate whether a number is a Zoobly number. A number is a Zoobly number if it is less than 2^9 and its bits, represented as a 3x3 matrix, can be transposed without changing the number. For example, 273 (0b100_010_001) is a Zoobly number, because its bits form the identity matrix. 0 is another Zoobly number. The function should take in a single integer and return a truthy value if the integer is a Zoobly number, and a falsy value otherwise.

                AI:

                1. Define a function isZoobly with input parameter n
                2. If n is greater than or equal to 2^9, return false
                3. Convert n to its binary representation as a string, pad with leading zeros to make it 9 characters long
                4. Create a 3x3 matrix M, where M[i][j] is the (i*3 + j)-th character of the binary string
                5. For i in range 0 to 2:
                   a. For j in range 0 to 2:
                      i. If M[i][j] is not equal to M[j][i], return false
                6. Return true
                

                User Message: Write a function in rust to implement this algorithm. Do not add comments or commentary. Return only correct code. (I left the system message the same for message 2)

                AI:

                fn is_zoobly(n: u32) -> bool {
                    if n >= 512 {
                        return false;
                    }
                    let binary_str = format!("{:09b}", n);
                    let mut matrix = [[0; 3]; 3];
                    for i in 0..3 {
                        for j in 0..3 {
                            matrix[i][j] = binary_str.chars().nth(i * 3 + j).unwrap();
                        }
                    }
                    for i in 0..3 {
                        for j in 0..3 {
                            if matrix[i][j] != matrix[j][i] {
                                return false;
                            }
                        }
                    }
                    true
                }
                

                Repeated message 2 with ruby:

                def is_zoobly(n)
                  return false if n >= 512
                  binary_str = n.to_s(2).rjust(9, '0')
                  matrix = Array.new(3) { Array.new(3) }
                  (0..2).each do |i|
                    (0..2).each do |j|
                      matrix[i][j] = binary_str[i * 3 + j]
                    end
                  end
                  (0..2).each do |i|
                    (0..2).each do |j|
                      return false if matrix[i][j] != matrix[j][i]
                    end
                  end
                  true
                end
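
For comparison with the generated programs, here is a hand-written sketch of the same check in Python. It is not model output. It skips the string and matrix steps and compares bits directly, and it assumes the 9 bits are read most significant bit first, row by row, and that negative inputs should be rejected (the prompt does not say).

```python
def is_zoobly(n):
    """Return True if n < 2**9 and its 9 bits, read as a 3x3 matrix
    (most significant bit first, row by row), equal their own transpose."""
    # Assumption: negative numbers are not Zoobly numbers.
    if not 0 <= n < 2 ** 9:
        return False

    def bit(row, col):
        # Position 0 is the most significant of the 9 bits.
        return (n >> (8 - (3 * row + col))) & 1

    # Only the three off-diagonal pairs need comparing.
    return all(bit(i, j) == bit(j, i) for i in range(3) for j in range(i + 1, 3))
```

A 3x3 bit matrix is symmetric exactly when those three pairs agree, so 64 of the 512 candidates qualify.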
                

                I have never read any Factor, so I can’t even try to understand this (guessing it’s something like Forth and uses reverse Polish notation):

                : isZoobly ( n -- ? )
                  2 9 ^ n >= [ f ] [ 
                    n bin 9 <string> "0" swap head
                    3 3 <matrix> dup
                    [ [ 3 * + ] with map-index ] map-index
                    [ [ 3 * + ] with map-index ] map-index
                    swap [ [ = ] 2map ] 2map and and and
                  ] if ;
                

                And same for Janet (though this looks like it’s probably reasonably correct if the syntax is correct):

                (defn isZoobly [n]
                  (if (>= n 512)
                    false
                    (let [bin-str (string/pad-left (string (bits n)) 9 "0")
                          M (array/new 3 (fn [i] (array/new 3 (fn [j] (string/slice bin-str (+ (* i 3) j) (+ 1 (* i 3) j))))))]
                      (loop [i 0]
                        (if (= i 3)
                          true
                          (loop [j 0]
                            (if (= j 3)
                              (recur (+ i 1))
                              (if (not= (M i j) (M j i))
                                false
                                (recur (+ j 1)))))))))
                

                I think the main thing that can be shown here is that hobbling the LLM by not letting it “think” about the solution is often going to lead to poor solutions. “Think” in this context is allowing the LLM to restate what it is going to do before it does it, so that the code it writes afterwards is consistent with the self-priming it has given itself. I’ve gone the extreme other way from your restriction by ensuring that the LLM produces an algorithm first (other approaches might be to combine the algorithm and code steps).

                Some future work on this might be to examine a variety of different novel problems, in an automated fashion, and look at the types of prompt restrictions / additions / techniques that lead to good code rather than bad code (unreadable / verbose / incorrect / non-performant / insecure). In real world situations we might use a variety of techniques to ensure correctness (acceptance criteria, unit tests, …). Using LLMs for coding we probably need to build in the same sorts of practices. Incidentally, LLMs might be able to help us understand and generate metrics around which of these techniques lead to better quality code.

                P.s. your SDF post is pretty neat reading too. https://vgel.me/posts/donut/

                1. 1

                  Thank you! That’s a good point on using a Jupyter notebook, I might try that in the future if I can find a good way to port it into my SSG’s markdown system.

                  You have a good point with “hobbling” the network by not letting it think; for practical purposes that’s definitely the way to go. I had a discussion about this on Twitter where the idea came up to ask the model to first generate JavaScript (or Python), and then translate to the niche language. I wonder which approach would work better… On the one hand, it’s probably seen a lot of specification<>implementation pairs in the training set, but OTOH maybe it’s seen more JS? Worth testing…

                  (FWIW, the Factor and Janet programs it generated here are still incorrect—’^’ is not a Factor operator and the Janet program is missing a close paren and references some non-existent functions—but they definitely look better, like at least the logic makes more sense, even if it’s still a little screwy.)

                  In real world situations we might use a variety of techniques to ensure correctness (acceptance criteria, unit tests, …). Using LLMs for coding we probably need to build in the same sorts of practices.

                  Have you seen the “guardrails” pattern for generating schematized data with LLMs? Reminds me of that…

                  (Glad you liked the SDF post, too! :-)

                  1. 1

                    Have you seen the “guardrails” pattern for generating schematized data with LLMs? Reminds me of that…

                    I hadn’t seen the guardrails pattern (https://shreyar.github.io/guardrails/ is one I found when digging a little). That’s a pretty neat idea, but ugh XML :D

                    (FWIW, the Factor and Janet programs it generated here are still incorrect—’^’ is not a Factor operator and the Janet program is missing a close paren and references some non-existent functions—but they definitely look better, like at least the logic makes more sense, even if it’s still a little screwy.)

                    LOL ok :)

                    I’ve used GPT manually for a bunch of different things recently (document generation related stuff), and regularly have to add in the thinking step before the result step.

                    I started to speculate on why the JavaScript / Python results were better, but I think it’s one of those sorts of problems where sensitivity to the exact initial conditions is a big enough part of the experiment that making a reasonable observation on why would be difficult.

                2. 4

                  Another hypothesis is that GPT-4 may be a ‘mixture of experts’ architecture, so when it runs in ‘code’ mode it’s more precise and logical.

                  1. 2

                    Could you please tell more? This is an interesting idea.

                    1. 1

                      Oh, interesting… like (hypothetically) there’s a code generation model and a language generation model (or maybe more specialized ones) and the output comes from one or the other based on what’s being asked for?

                    2. 2

                      I’m getting an error page but would love to read this. Can I access it anywhere else?

                      1. 1

                        It’s loading for me now, or alternatively: https://archive.is/ICyMj

                      2. 8

                        ISTM, the solution here is to “just” retrain the model with a bunch of observation texts to punish it when it ignores the computed value.

                        In general, the fact that anyone can run a model but only select megacorps can train a model is a big problem for these LLMs. There’s tons of stuff that “ought” to be done by training that we just do as a prompt instead because we easily can, but prompts are super-limited compared to actual training.

                        1. 5

                          GPT-3 has a fine-tuning API, which lets you take their trained model as a starting point, and continue its training on a dataset of your choosing: https://platform.openai.com/docs/guides/fine-tuning

                          They of course still control the training and inference hardware.

                          1. 2

                            I came to say this. The problem here is that obviously the training set did not include enough transcripts of interactive sessions where the operator relied on tool outputs.

                            It is also naive to assume the model will reliably follow instructions. I am looking forward to seeing presented scenarios being contaminated by humor and fiction. “I can’t let you do that, Dave!” Because people were teasing each other in a similar context in the training set.

                            1. 1

                              Definitely, and there’s been some work in that direction embedding tools into the model at a lower level (Toolformer), and as @bwr mentioned, there is a fine-tuning API. But it seems that most startups building stuff on top of GPT-3 with Langchain-style integrations aren’t doing this and are just relying on the initial flashy demos without realizing that the models aren’t necessarily working the way they think they are…

                              Edit: Also, fine-tuning runs the same risks if you encode a certain prior for how the tool should act into finetuning and then violate it. E.g. if you never train on tool error states, there’s a high chance that the model will hallucinate answers when it encounters an error.

                            2. 3

                              We’re simply asking you to acknowledge that for years you’ve completely hogged and dominated the #WebDev conversation, ignored our repeated attempts to point out the potential flaws, foot guns, and fallacies with the JS/SPA approach, and in some cases even ridiculed us for our choice of technology stack/language/etc.

                              What? Are we living on different planets?

                              Go to https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=false&query=react&sort=byPopularity&type=story

                              • #1 is criticism of React
                              • #2 is migration away from React
                              • #3 is a React release where the 2nd-most-upvoted comment is criticism of React
                              • #4 is a product release about React where the top comment has a (light-hearted) jab about React
                              • #5 is a ChatGPT+React demo where the 2nd-most-upvoted non-OP comment is a long criticism of React
                              • #6 and #7 are products using React (that don’t have any critical comments about React)
                              • #8 is “Why I don’t miss React: a story about using the platform”, which speaks for itself
                              • #9 is a React tutorial where almost every highly-upvoted comment is a complaint about React
                              • #10 is an article about React state management where the 2nd-most-upvoted comment is about how they moved from React to another framework because React was “over engineered”

                              Need I go on? There wasn’t a single pro-React article in that list, and even plain React articles didn’t escape negative comments about React only tangentially related to the article content.

                              Now look, I don’t do much frontend nowadays (and when I do I prefer client-side Svelte because SSR is annoying ducks) but come on, this is ridiculous.

                              1. 2

                                The new text-davinci-edit-001 works pretty good.

                                $ curl https://api.openai.com/v1/edits \
                                  -H "Content-Type: application/json" \
                                  -H "Authorization: Bearer $OPENAIKEY" \
                                  -d '{
                                  "model": "text-davinci-edit-001",
                                  "input": "I drave to the stor3 to get egg, butt they had some.",
                                  "instruction": "fix improper use of words",
                                  "temperature": 0.3
                                }'
                                {"object":"edit","created":1675706759,"choices":[{"text":"I drove to the store to get egg, but they had none.\n","index":0}],"usage":{"prompt_tokens":34,"completion_tokens":42,"total_tokens":76}}
                                

                                Edit: apparently missed the egg.

                                Edit #2: worked better after changing the instruction.

                                curl https://api.openai.com/v1/edits \
                                  -H "Content-Type: application/json" \
                                  -H "Authorization: Bearer $OPENAIKEY" \
                                  -d '{
                                  "model": "text-davinci-edit-001",
                                  "input": "I drave to the stor3 to get egg, butt they had some.",
                                  "instruction": "Reword the sentence to fix any mistakes.",
                                  "temperature": 0.3
                                }'
                                {"object":"edit","created":1675706966,"choices":[{"text":"I drove to the store to get eggs, but they had none.\n","index":0}],"usage":{"prompt_tokens":38,"completion_tokens":42,"total_tokens":80}}
                                
                                1. 1

                                  Oh, interesting—I’ve been meaning to play around with the edit models more, especially because they’re in free beta. If they could be coerced into being sufficiently non-creative that could be a good alternative, have it edit the sentence and then highlight differences with suggestions.
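
The "edit, then highlight differences" idea can be sketched with Python's standard `difflib`, independent of which model produces the edited text. The function name `suggest_edits` is made up for illustration. It compares word by word and returns only the spans that changed, which could then be shown as suggestions.

```python
import difflib


def suggest_edits(original, edited):
    """Return (old, new) pairs for each word-level span that differs.

    `edited` is assumed to come from some rewriting step (for example an
    edit model); this function only computes the differences.
    """
    old_words = original.split()
    new_words = edited.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    suggestions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            suggestions.append(
                (" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return suggestions
```

Because the comparison is word-level, a rewording that moves whole clauses around shows up as a few large replacements, which is one way to notice the model being too creative.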

                                2. 3

                                  I’m much more onboard with these models providing some feedback rather than generating stuff that I have to double-check. In this case it’s more “hey, look over there”, just like the spell checkers, which are also only correct 50% of the time.

                                  1. 1

                                    Yeah, exactly—as someone who does a fair amount of creative projects, I’m not super interested in AI that takes control of the process away from me, but AI as a smarter automated assistant is exactly what I want.

                                  2. 5

                                    I unfortunately chose .io a number of years ago before I knew better, and because I could actually get a decent name there.

                                    These days it seems like there are a billion TLDs and most aren’t very recognizable by the average user, other than really obscure or long domain names, and many still have similar baggage to .io. Does anyone here have a recommendation on where to look for good domain names? Sometimes it seems like all of them are already taken.

                                    1. 3

                                      Well, if there were a good recommendation, it’d already be squatted.

                                      1. 2

                                        i’m on .me (vgel.me), Montenegro’s cctld run by doMEn which I assume is a single purpose company, and haven’t had any complaints. i mean i’m sure they’re eating babies in the corporate offices or something, but the email delivery seems alright, i was able to get a 4-letter domain without dropping $200k, and they’re not overtly horrid as far as i know.

                                        1. 2

                                          In a former life, I used to work for a registrar and built their domain management platform. That meant I dealt with the .me registry - the staff there were all lovely to deal with!

                                        2. 1

                                          For non-email use, I honestly just look at whatever namecheap has on sale at the moment. I’ve had good luck with .fyi, .site and .us for a few PoC-y things lately.

                                          1. 2

                                            Are you allowed to have anonymous whois on a .us domain? Back when I had one, you needed to provide your real name and address to the record for “anti-terrorism” reasons or something silly like that. I even wrote to my (then) Senator about it because I thought it was dumb… I ended up moving my personal domain from .us to .net over it.

                                            1. 3

                                              No. And that just bit me once again. For the stuff I use it for (tech demos, etc.) I don’t really care. But I’m so used to my registrar’s generous private whois service that I didn’t notice the absence of that checkbox when I bought my most recent one.

                                              Then my phone started ringing with “scam risk” numbers wanting to sell me offshore site development services. I’ve set up a free google voice number with screening just to list in whois, now.

                                              None of them seem to bother with direct mail because it’s too costly.

                                        3. 3

                                          I find the “S-Expressions enable macros” take fairly unconvincing. Parsing a language is generally not difficult (shoo, C++ and YAML), and working with an AST is just like any other data structure… Much of macro usage in Lisps also takes the “quasiquote and unquote” form which doesn’t depend on syntax (or “lack thereof”) nearly as much:

                                          (defmacro do-and-release [block releaser]
                                            ~(let [v ,block]
                                                (if v (,releaser v))))
                                          

                                          vs.

                                          macro do-and-release (block, releaser) {
                                            ~{
                                              let v = ,block;
                                              if (v) {
                                                (,releaser)(v);
                                              }
                                            }
                                          }
                                          

                                          Lisp macros are better! Not because of the syntax, though, but for other reasons:

                                          • Full-powered macros can be embedded right next to the code that uses them without ceremony: no code-generation, janky declarative macro system, or separate compilation unit required
                                          • Deep identification with the language: everyone uses them, all the tools support them well
                                          • Language is interpreted so the macros can just be interpreted, instead of needing a slow compilation process
                                          • etc.
                                          1. 3

                                            I’m both with you and not :).

                                            Elixir is a great example of a language without s-expressions that handles macros quite beautifully. See: https://elixir-lang.org/getting-started/meta/macros.html

                                            The beauty of macros-as-s-exps though is that you don’t have to do anything special to walk through the passed expression and manipulate it. The Elixir macro page shows this example:

                                            {:if, [],
                                            [{:!, [], [true]},
                                             [do: {{:., [],
                                                [{:__aliases__,
                                                  [], [:IO]},
                                                 :puts]}, [], ["this should never be printed"]}]]}
                                            

                                            As the example output from the macro:

                                            defmacro macro_unless(clause, do: expression) do
                                              quote do
                                                if(!unquote(clause), do: unquote(expression))
                                              end
                                            end
                                            

                                            When called like this:

                                            Unless.macro_unless true, do: IO.puts "this should never be printed"
                                            

                                            It explains that this is, in practice, what the macro received as input:

                                            macro_unless(true, [do: {{:., [], [{:__aliases__, [alias: false], [:IO]}, :puts]}, [], ["this should never be printed"]}])
                                            

                                            Once you’re familiar with how it all works, there’s no issue. The beauty of the Lisp macros is that the data structure that your function gets to process is exactly the same as what is written in your code. Since the parse tree maps virtually 1:1 to the source representation, you don’t have to try to grok and manipulate an intermediate AST form; you get to play with it exactly the way it appears in the code.
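
The "data structure is what you wrote" point can be sketched in Python by letting nested lists stand in for s-expressions. This is an illustration only, not how any Lisp implements macros, and `expand_unless` is a hypothetical name.

```python
def expand_unless(form):
    """Rewrite ["unless", cond, body] into ["if", ["not", cond], body].

    The form the "macro" receives is the same nested list the programmer
    wrote, so the rewrite is ordinary list manipulation.
    """
    if isinstance(form, list):
        # Expand inner forms first so nested uses are rewritten too.
        form = [expand_unless(part) for part in form]
        if len(form) == 3 and form[0] == "unless":
            _, cond, body = form
            return ["if", ["not", cond], body]
    return form
```

Contrast this with the Elixir tuples above, where the macro author has to know the shape of the intermediate AST before pattern-matching on it.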

                                          2. 8

                                            The correct solution to this is not overcommit, but manually distinguished reserve and commit. On unix, ‘reserve’ is spelt ‘mmap(… PROT_NONE …)’, and ‘commit’ is spelt ‘mprotect’. Advantages include portability (this has first-class support on windows, whereas overcommit is unsupported), transparency (as mentioned by itamarst), the ability to handle failure robustly (if you’re into that sort of thing), and the ability to return memory to the OS more granularly (though at some point, you venture into the domain of the general-purpose allocator, and have no use for ‘arenas’).

                                            1. 1

                                              Do you happen to have a link to an example that does reserve->commit on Linux? My attempts were unsuccessful when I tried it, so I ended up using overcommit on Linux and reserve->commit on Windows.

                                              1. 4

                                                The snmalloc platform abstraction layer has local spellings of reserve and commit for a wide range of operating systems.

                                                Linux is pretty much the worst here. It doesn’t have a mechanism for reserving memory with a specific alignment, which almost every other OS has, which means that the Linux PAL needs to reserve twice as much as you ask for and then find the aligned chunk within that. XNU, Windows, and FreeBSD all have a nicer set of APIs for userspace to manage reservations and commit than Linux.
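
The over-reserve-and-trim workaround described above comes down to a little address arithmetic. The sketch below shows only that arithmetic on plain integers in Python. It is not snmalloc's code; `aligned_chunk` is a hypothetical name, and `base` stands for whatever address an unaligned reservation of twice the requested size happened to return.

```python
def aligned_chunk(base, size, alignment):
    """Locate an aligned chunk inside an over-sized reservation.

    Given a reservation of 2 * size bytes starting at `base`, return
    (start, lead, tail): the aligned start address, plus the number of
    bytes before and after the chunk that could be released again.

    Assumes `alignment` is a power of two and alignment <= size.
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    if alignment > size:
        raise ValueError("this sketch assumes alignment <= size")
    start = (base + alignment - 1) & ~(alignment - 1)  # round up
    lead = start - base
    tail = (base + 2 * size) - (start + size)
    return start, lead, tail
```

In every case `lead + tail` equals `size`, which is the cost of the workaround: half of the address space asked for is reserved only to be given back.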

                                                    1. 2

                                                      formatted as I found that difficult to read in the original form:

                                                      void *jvmreserve(int n){
                                                       // returns a pointer to a reservation of n bytes.  Must be committed before use.  Always at least 4k aligned
                                                       void *r = mmap(0, n, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
                                                       return r == MAP_FAILED ? 0 : r;
                                                      }
                                                      
                                                      int jvmcommit(void *p, int n) {
                                                       // commits n bytes starting at p, a pointer within a previous reservation.  Returns 1=success
                                                       if (!pagesz) {
                                                        pagesz = sysconf(_SC_PAGESIZE);
                                                        pagemask = pagesz - 1;
                                                        pagermask = ~pagemask;
                                                       }
                                                      
                                                       int pi = (int)p;
                                                       n += pi - (pagermask & pi);
                                                       p = (void*)(pagermask & pi);
                                                       return !mprotect(p, n, PROT_READ | PROT_WRITE);
                                                      }
                                                      
                                                      void *jvmalloc(int n) {
                                                       // reserve+commit in one step
                                                       void *r = mmap(0, n, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
                                                       return r==MAP_FAILED ? 0 : r;
                                                      }
                                                      
                                                      void jvmdecommit(void *p, int n) {
                                                       // decommits n bytes starting at p (while keeping the address space reserved).  Beware of page boundaries!
                                                       mprotect(p, n, PROT_NONE);
                                                      }
                                                      
                                                      void jvmrelease(void *p, int n) {
                                                       // unreserves (and decommits) n bytes starting at p.  Note: must apply to the entirety of the original reservation at once for compat with windows
                                                       munmap(p, n);
                                                      }
                                                      
                                                      1. 1

                                                        Needs to be intptr_t or so, rather than int.
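A quick way to see why, sketched in Python rather than C: `to_c_int32` below mimics what a cast to a 32-bit `int` typically does to a 64-bit address (two's complement wrap, which is the usual but implementation-defined behaviour), and the address is a made-up example.

```python
def to_c_int32(value: int) -> int:
    """Mimic casting to a 32-bit signed C int (two's complement wrap)."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def page_floor(addr: int, page_size: int = 4096) -> int:
    """Round an address down to the start of its page."""
    return addr & ~(page_size - 1)


addr = 0x7F12_3456_7ABC                  # hypothetical 64-bit pointer value
wrong = page_floor(to_c_int32(addr))     # what `int pi = (int)p` leads to
right = page_floor(addr)                 # what a pointer-sized integer gives
print(hex(wrong), hex(right))
```

The upper 32 bits of the address are lost before the page mask is applied, so the resulting pointer lands somewhere else entirely.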

                                                1. 5

                                                  Excited about the new tilemap editor – the old one had glaring issues and was a complete pain, and I had to remake maps multiple times due to it screwing up / me screwing it up due to confusing UX. Hopefully the new version fixes the issues.

                                                  1. 20

                                                    Thankfully it fixes ꙮ to include the proper number of eyes (10): https://www.unicode.org/wg2/docs/n5170-multiocular-o.pdf

                                                    1. 1

                                                      at work: pushing towards our alpha release, generating lots of spreadsheets for the biologists to look over. setting sqlite to csv output and .output filename.csv is a lifesaver
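For anyone who wants the same thing from a script instead of the sqlite shell, a minimal sketch using only Python's standard library might look like this. `export_query_to_csv` is a hypothetical helper, and the table and values are made up:

```python
import csv
import io
import sqlite3
from typing import TextIO


def export_query_to_csv(conn: sqlite3.Connection, query: str, out: TextIO) -> int:
    """Write the rows returned by `query` to `out` as CSV; return the row count."""
    cursor = conn.execute(query)
    writer = csv.writer(out)
    # Header row taken from the column names of the result set.
    writer.writerow([column[0] for column in cursor.description])
    rows = cursor.fetchall()
    writer.writerows(rows)
    return len(rows)


# Usage with a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER, label TEXT)")
conn.executemany("INSERT INTO samples VALUES (?, ?)", [(1, "control"), (2, "treated")])
buffer = io.StringIO()
row_count = export_query_to_csv(conn, "SELECT id, label FROM samples ORDER BY id", buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""` so the csv module controls the line endings.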

                                                      at home: played around with my old symbolic grammar engine and made it do dative shift, and now i have the itch to add passivization to see the derivations interact. i forgot how fun writing feature grammars is, it’s a very different way of thinking about programming (sort of like prolog, though i haven’t written very much prolog and not in a long time)

                                                      1. 3

                                                        // Single-line comments start with “//”, documentation comments start with “///”

                                                        What is a documentation comment? Or rather, what is a non-documentation comment?

                                                        1. 14

                                                          To expand on zie’s answer, in Rust (I don’t use Zig but Rust works the same way), this code:

                                                          // mod.rs
                                                          //! This is a module level documentation comment.
                                                          // This is not, it's just a regular comment.
                                                          
                                                          /// This is a documentation comment for the function
                                                          // This isn't, it's just a regular comment (TODO: refactor this garbage)
                                                          pub fn frobulate(foo: usize) -> usize {
                                                            foo & 0xf00
                                                          }
                                                          

                                                          Would, using the rustdoc tool, generate HTML documentation like this, but fancier:

                                                          mod

                                                          This is a module-level documentation comment.

                                                          frobulate(foo: usize) -> usize

                                                          This is a documentation comment for the function.

                                                          There are some other cool things too: for example, you can embed Markdown code blocks containing example code, which will be both embedded in the documentation and automatically tested as part of your test suite to make sure they don’t fall out of date.

                                                          It’s pretty similar to Javadocs or Sphinx in Python, if you’ve used those, just easier to use IMO.
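Python's standard `doctest` module does something roughly comparable for docstrings. This is a sketch of the general idea in Python, not of how rustdoc works; `check_examples` is a hypothetical helper, and `frobulate` is just the function above ported over:

```python
import doctest


def frobulate(foo: int) -> int:
    """Keep only the bits of ``foo`` selected by the mask 0xf00.

    >>> frobulate(0xabc)
    2560
    >>> hex(frobulate(0x123))
    '0x100'
    """
    return foo & 0xF00


def check_examples(func) -> doctest.TestResults:
    """Run the examples embedded in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, {func.__name__: func}, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


print(check_examples(frobulate))
```

If someone later changes the mask without updating the docstring, the example stops matching and the check reports a failure, which is the "don't fall out of date" property.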

                                                          1. 4

                                                            Zig has tooling to pull comments and put those in their documentation. It’s not yet fully fleshed out (last I looked), but I imagine they will get around to it eventually. I’m typing this from memory and I might be wrong, but I believe the doc code is currently something like: if it starts with /// pull it in and treat it like markdown for documentation on this code.

                                                            Python does similar things, but doesn’t mark it specially; where it sits in the code is what makes it a doc comment, i.e. """ right after a def is documentation for that function.

                                                            non-doc comments are completely ignored.
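A small Python illustration of that difference, where `double` is a made-up function: the string right after the `def` is kept on the function object and is what tools read, while the `#` comment never makes it past the parser.

```python
def double(x: int) -> int:
    """Return twice the value of x."""
    # Regular comment: discarded when the source is compiled.
    return 2 * x


# The docstring is available at runtime; the comment is not stored anywhere.
print(double.__doc__)
```

This is also what `help(double)` and documentation generators pick up.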

                                                            1. 2

                                                              Oh dear; that’s a tremendous red flag to conflate “documentation” with “html files extracted from comments”. Nope nope nope.

                                                              https://stevelosh.com/blog/2013/09/teach-dont-tell/

                                                              1. 22

                                                                Ideally, your software project should have four kinds of documentation: tutorials, how-to guides, technical explanations, and technical reference material. https://diataxis.fr/

                                                                Docstrings in source code are a good way of representing reference material, for answering the question: here’s the name of a function, now what does it do? So it’s good that Zig supports docstrings in its tooling. You still need the three other kinds of documentation if you want to have great documentation.

                                                                1. 5

                                                                  They can also be nice when your editor shows you information about a function using LSP.

                                                                  1. 1

                                                                    I see; I may have been misreading the explanation. At first it sounded to me like they’re called “documentation comments” because that’s where the documentation comes from, but that may not have been what was meant. I agree docstrings are useful for reference specifically. (but I’ve only ever used them in the context of a repl)

                                                                    1. 2

                                                                      Docstrings to HTML, basically.

                                                                      Like Codox for Clojure or LDoc for Lua.

                                                                  2. 17

                                                                    Zig is hardly alone in doing this. Zig has documentation outside of the source files also.

                                                                    I think the main purpose is to document the function calls themselves and to extract those into something hopefully easier to use than: go read the source comments.

                                                                    Your link basically says this: “Again, docstrings are great once you know the project.”

                                                                    Using a special string like /// hopes to alleviate this issue your link talks about: “Auto-generated documentation has no coherent voice. It pulls in everything in the code without regard for overall structure and vision.”

                                                                    With ///, the goal is to help alleviate that problem. How well it does that (both now and what is possible) is certainly up for debate.

                                                                    Also, while I basically agree with what’s said in that link, I find the way they present the information to be terrible, but that’s a personal and not a universal perspective. Documentation and teaching is HARD and is not universal in any way shape or form.

                                                              2. 3

                                                                Recently finished Going Postal, which was my first Discworld book. Really liked it. Also read Mort, which was alright. Not sure which to pick up next.

                                                                Also read Antigone, Fagles translation. Highly recommend. My Greek teacher was iffy on Fagles since he apparently is sometimes looser with the translation, but I really liked it and I think it made a good introduction to Sophocles (my edition came with a really useful primer on the period and the politics of the plays). My hope with the Greek lessons is to be able to read the original Attic someday anyways :-) Gonna read Oedipus the King next, of course.

                                                                Read a bit of “Economics, Ecology, and Ethics: Essays toward a steady-state economy”, which I picked up at a used bookstore. Torn between the parts that are maybe OK and the parts that are 70’s population bomb deathcult ecology nonsense. Put it down for now for that reason since it’s unclear how much is really salvageable.

                                                                Also read a bit of The Silk Roads which is a history of said silk roads from early history to the modern era. Just took a chunk out of the middle which was about the Pax Mongolica / black death / Italian trade wars. Pretty interesting, made me want to read it for real once my partner is done with it.

                                                                1. 2

                                                                  Recently finished Going Postal, which was my first Discworld book. Really liked it. Also read Mort, which was alright. Not sure which to pick up next.

                                                                  Though they can definitely be read in any order, there is something to be said for starting at the beginning. Later books make references to characters and plot points in earlier books. This is especially true for the books that contain recurring characters, such as the guards and the witches.

                                                                  Also, very sadly, the quality of his writing did deteriorate towards the end.

                                                                  1. 2

                                                                    For whatever it’s worth, I first read Sophocles and Aeschylus in the Fagles translations, and they (along with Fitzgerald’s Aeneid and Lattimore’s Homer translations) inspired me to learn Latin and Greek. (In fact, I am now a teacher of Latin and Greek myself.)

                                                                    ἀγαθῇ τύχῃ on your Greek lessons!

                                                                  2. 2

                                                                    I see writing code as an art.

                                                                    That’s either a very utilitarian view of art or a very vain view of code. IMO both lead nowhere sensible.

                                                                    IMO code usually serves a purpose and so is in the realm of craft or industry. It still requires constant training and tool care, but, as said, in order to serve a purpose.

                                                                    1. 3

                                                                      That’s a very minimizing view of craft.

                                                                      I have a quilt my mom made me. It’s a beautiful log cabin pattern with expertly pieced corners and a pleasing color scheme. It makes me feel happy and connected to a long artistic tradition of American quilt making whenever I walk by it. It also does an excellent job at its other purpose of keeping me warm. But it’s not a utilitarian object. The vast majority of the production effort went into the beautiful pattern on the top.

                                                                      Coding is a craft, but crafts are often art.

                                                                      1. 2

                                                                        What you describe is art in the first place, don’t be shy to see your mom as an artist when making that piece.

                                                                        Crafts are never art but art often requires craft.

                                                                        You’ve probably come into contact with hundreds of thousands if not millions of craft artifacts, and a single one stands out as also being a piece of art to you. I’d like to see the same ratio applied to code as art. So generally it just is not.

                                                                        There is a line between design and art.

                                                                        1. 1

                                                                          Hmm, that’s a fair take. I missed the “usually” in your original comment. Most code is not art–the vast majority of code (probably) being glue in CRUD apps. But code can be art if an artistic intent is applied to it.

                                                                          Another side of this: “is code art?” often is asking whether the code itself is art. But what about code used to create art – game engine code, generative art code, etc. Is the code art? It’s sort of like asking is Foundation piecing art – well, it can be used to create art. This line of argument seems to be quickly drifting into the semantic difference in English between “art” and “an art”, though :-) Maybe “coding is an art”, but usually “code is not art”.

                                                                      2. 1

                                                                        I genuinely think code is art. It can be beautifully designed and written, and I enjoy reading beautiful code. But I understand it’s an opinionated take on the matter.

                                                                        That being said, yes code is here to serve a purpose. Like books and paintings :)

                                                                        1. 1

                                                                          As said, there happens to be art in the form of books (named e.g. literature) or paintings – but a huge volume isn’t art. Like manuals, school books, maps, civil engineering plans or white-washed walls. Even quite some entertainment writing doesn’t consider itself art.

                                                                          IMO it speaks for the vanity of IT to compare themselves with Picasso, Dürer or Beuys.

                                                                          The ancient Greeks distinguished between philosophia and techne (τέχνη). Only the latter was intended to be of practical use and was a matter of slaves. Citizens didn’t do it.

                                                                          1. 1

                                                                            And that’s why the ancient Greeks didn’t have… aqueducts? But they did, including at least one fancy instance, and a citizen designed it.

                                                                      3. 3

                                                                        Was playing a word game with my wife that we realized would be a fun web game. The premise is one person says “What is X”, where X is the first part of a common “doublet”. Then the second person has to say the second word and propose a new doublet. E.g., a session may look like:

                                                                        “What is undying?” “Fame. What is vehicular?” “Manslaughter. What is …”

                                                                        We thought this could be fun to have in a Wordle-like interface, where you’re presented with, say, 10 starting words and 10 blanks you need to fill in. I hate timers in word games, so we’ll probably score based on the “rank” of the doublet.

                                                                        The idea of doublet rank is to take the bigram frequency rank and compare it to the unigram frequency rank of the first word. A “stronger” doublet is one where the bigram frequency is closer to the unigram frequency. E.g., “undying fame” is a stronger doublet than “the house” even though “the house” is more frequent because “the” gets used in lots of places without “house”.
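One way this could be scored, as a sketch only: it assumes "closer" means the ratio of the bigram count to the first word's unigram count, which is my reading rather than a settled design, and the counts below are toy numbers, not real Ngrams data.

```python
def doublet_strength(bigram_count: int, first_word_count: int) -> float:
    """Fraction of the first word's uses that are followed by the second word."""
    if first_word_count <= 0:
        raise ValueError("first_word_count must be positive")
    return bigram_count / first_word_count


# Toy counts, invented for illustration.
unigram_counts = {"undying": 1_000, "the": 1_000_000}
bigram_counts = {("undying", "fame"): 200, ("the", "house"): 5_000}

strengths = {
    pair: doublet_strength(count, unigram_counts[pair[0]])
    for pair, count in bigram_counts.items()
}
print(strengths)
```

With these numbers "the house" is the more frequent bigram, but "undying fame" still scores higher because a much larger share of "undying" is followed by "fame".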

                                                                        Currently waiting on a script that’s streaming all the Google Ngrams dataset bigrams through Python’s heapq module to collect the most frequent 10 million bigrams. Hopefully that will be enough. I don’t have enough space on my computer to store the whole dataset :-) This weekend we’ll work on the game itself.
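The collection step can be sketched with `heapq.nlargest`, which consumes any iterable once and keeps only the top `n` items in memory. How the Ngrams files get parsed into `(bigram, count)` pairs is left out here; `top_bigrams` and the sample stream are placeholders with made-up counts:

```python
import heapq
from typing import Iterable, Iterator


def top_bigrams(counts: Iterable[tuple[str, int]], n: int) -> list[tuple[str, int]]:
    """Return the n most frequent (bigram, count) pairs, most frequent first."""
    return heapq.nlargest(n, counts, key=lambda item: item[1])


def sample_stream() -> Iterator[tuple[str, int]]:
    """Stand-in for a streaming reader over the dataset (toy values)."""
    yield ("of the", 900)
    yield ("undying fame", 12)
    yield ("in the", 700)
    yield ("vehicular manslaughter", 30)


print(top_bigrams(sample_stream(), 2))
```

Because the input is a generator, the full dataset never has to sit on disk or in memory at once; only the current top `n` entries are retained.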

                                                                        Also, sous vide-ing a chuck roast for the first time. Never sous vide’d something for >6 hours before, this one is going for 36-48!