You’ll be pleased to hear this concept has a name already: literate programming.
That’s just the author’s particular take on this. I’ve seen other takes that are quite different from plain old literate programming.
Yeah, Knuth-style tangle/weave definitely shouldn’t be allowed a monopoly on the very good idea of interleaving prose and code. https://github.com/arrdem/source/tree/trunk/projects/lilith/ was my LangJam submission that went in that direction, partly inspired by my previous use of https://github.com/gdeer81/marginalia and various work frustrations at being unable to do eval() in the ReStructuredText files that constitute the majority of my recent writing.
Technically, I think, it’d be “documentation generation” because it doesn’t involve any tangle or weave steps.
because these tools do not implement the “web of abstract concepts” hiding behind the system of natural-language macros, or provide an ability to change the order of the source code from a machine-imposed sequence to one convenient to the human mind, they cannot properly be called literate programming tools in the sense intended by Knuth.
[my emphasis]
My view of this is:
A lot of Literate Programming systems are based on the idea of automatically copy-pasting pieces of code around. I think this is a terrible idea, for many of the same reasons why building a program entirely out of C macros is a terrible idea. However, it’s a necessary evil due to the limitations of C; everything has to be in a particular order imposed by the C compiler. If you want your prose to describe an implementation of a function first, and then describe the struct which that function uses later on, the only solution in C is macro-like copy/paste, so that the generated C code contains the struct definition before the function even though the struct is described after the function.
Many modern languages don’t have this limitation. Many languages let everything refer to everything else regardless of the order they appear in the file. Thanks to this, I think we can generally treat the function as the unit of code in literate programming systems, and we don’t need elaborate automatic copy/paste systems. As a bonus, all the interactions between code snippets follow simple, well-understood semantics, and you don’t get the invisible action at a distance that’s common in many literate programming systems.
That’s the basis for our submission, at least: we made a literate programming system (with LaTeX as the outer language) in which all the interaction between different code snippets happens through function calls, not macro expansions.
Literate programming was the inspiration for my team’s submission: https://github.com/mortie/lafun-language
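To make the order-independence point concrete, here is a minimal sketch in plain Python rather than our system (all names hypothetical): a function can call a helper that is only defined further down the file, because the name is resolved at call time, so the prose is free to present definitions in whatever order suits the reader.

```python
# The prose can present this function first, even though it calls a
# helper that is only defined further down the file: Python resolves
# the name `parse_header` when `parse` is *called*, not when it is
# defined, so the author's ordering is free.
def parse(record: bytes) -> dict:
    header = parse_header(record[:8])
    return {"header": header, "body": record[8:]}

# ... any amount of prose could sit between the two definitions ...

def parse_header(raw: bytes) -> dict:
    return {"magic": raw[:4], "version": int.from_bytes(raw[4:8], "big")}

if __name__ == "__main__":
    print(parse(b"LPv1\x00\x00\x00\x02hello"))
    # {'header': {'magic': b'LPv1', 'version': 2}, 'body': b'hello'}
```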
Maybe a controversial opinion, but linear text is way inferior to how we usually read, comprehend, and think about code. We (or I, at least) tend to think in a graph-like way: nodes, connections, flows, etc. Code is actually closer to this than prose, and we should probably be moving in the opposite direction, towards more graphical and spatial representations.
I agree. While this looks neat and makes the program accessible to a lot more people, my first thought was: how would I debug this? The first thing I would want to do is strip the “code” part from the “text” part to see it in isolation, to see what the computer is “actually” doing.
Also, any serious program would become the size of a novel. And as complex and intertwined as a novel. That would really not be accessible, I think.
This is really neat.
This is not quite what Knuth conceived of as Literate Programming (where tangling and weaving are the critical concepts that are missing here), but this is extremely similar to Literate Haskell.
The one time I tried working on a project that did it the Knuth way (with cweb), it was miserable to refactor in. I’m not against literate programming, but it wasn’t a good first impression.
WEB/CWEB has, in my opinion, an upper limit on the length of program it can comfortably express, and that limit is pretty small.
It also doesn’t work well with modularization, IMHO.
Then again, Donald Knuth wrote CWEB/TeX/TAOCP and I’m writing a comment on Lobste.rs so what do I know.
Yeah, the program I was working with only had a single file under CWEB. I basically did what I could to avoid touching it.
I think that’s an overly narrow definition of LP, which Knuth himself has defined like this:
In literate programming the emphasis is reversed. Instead of writing code containing documentation, the literate programmer writes documentation containing code.
By this definition OP is a purer form of LP than Web, which hacked two languages together just to reduce implementation effort. Minimizing syntax allows the reader to focus on the content.
The problem: reading prose is quite a bit slower than reading well written code. Editing sentences/paragraphs of text is time-consuming (think refactoring). Rarely is code so linear that the text can fit neatly around it like in simple examples.
The way well written code (the “normal” way) is supposed to be is not all that far from this concept, except that the structure is more important, and sentences/paragraphs of comments are attached somewhere in the structure. Names of identifiers should carry a lot of self-documenting meaning, types describe constraints and inputs & outputs, etc.
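For instance, a small sketch in Python (hypothetical example): the identifiers and type hints do most of the documenting, and a comment only marks the one non-obvious constraint.

```python
from datetime import datetime, timedelta, timezone

def seconds_until_expiry(issued_at: datetime, ttl: timedelta) -> float:
    """Seconds the token remains valid; negative once expired."""
    # The identifiers and type hints document the inputs and output;
    # the comment is reserved for the one non-obvious constraint:
    # issued_at must be timezone-aware, or the subtraction raises.
    return (issued_at + ttl - datetime.now(timezone.utc)).total_seconds()
```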
RetroForth has this as Unu (see the docs, section ‘Unu: Simple, Literate Source Files’)
Also, this story should be tagged as being authored by the OP.
I really hate syntactically significant white space. Not because it’s a bad idea, but because to this day nobody can really agree on spaces vs. tabs, or tab width.
Seems a bit similar to https://soulver.app/
https://numbr.dev
Is there an HTTPS version of this URL? I have HTTP disabled.
Click the “cached” link under the submission
This is a great convergence between notebooks and literate programming. ObservableHQ notebooks are the closest thing to that, and were a game changer for me.
I prefer executable comments.
i.e. asserts and unit tests.
Instead of commenting on what the preconditions are, put in a precondition assert.
If the precondition is weird and hard to express… simplify the code so it isn’t.
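A minimal sketch of that in Python (hypothetical function): the contract lives in asserts rather than in a comment, so it documents the preconditions and fails loudly when they are violated.

```python
import math

def mix_angles(a: float, b: float, t: float) -> float:
    # The contract is written as asserts instead of a comment:
    # it documents the preconditions *and* fails loudly when violated.
    assert 0.0 <= t <= 1.0, "t must be a fraction in [0, 1]"
    assert abs(a) <= math.pi and abs(b) <= math.pi, \
        "angles must be normalized to [-pi, pi]"
    return a + (b - a) * t
```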
Instead of a long-winded explanation of what a function does, write well written unit tests structured as executable documentation of how to use it and what it does.
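Something like this, say (hypothetical slugify function): the test names read as a specification, and each test body is a worked usage example.

```python
import unittest

def slugify(title: str) -> str:
    return "-".join(title.lower().split())

class SlugifyDocs(unittest.TestCase):
    # Each test is a worked usage example; the names read as a spec.
    def test_lowercases_and_hyphenates_words(self):
        self.assertEqual(slugify("Literate Programming"), "literate-programming")

    def test_collapses_runs_of_whitespace(self):
        self.assertEqual(slugify("  a \t b  "), "a-b")

if __name__ == "__main__":
    unittest.main()
```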
Yep. Tests as docs aren’t really great by themselves, but they work really well up to a point. I put whys in comments (recently ADRs - this is a boundary for me). The “why I’m doing this” never bit-rots. I can get stability and regression testing from many layers and flavors of testing.
Doctests or quicktests are fine too, but that’s just a different syntax and location. Comments aren’t sensors that check anything independently of a human; types are tiny validators, like tests in a way. This is all part of the same theme.
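Python’s doctests are a good example of the syntax-and-location point: the usage examples live in the docstring and stay checked. A minimal sketch:

```python
def clamp(x: float, lo: float, hi: float) -> float:
    """Restrict x to the closed interval [lo, hi].

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(lo, min(x, hi))

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # re-runs the docstring examples as tests
```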
If I had infinite time, my purely human-readable documentation would be beautiful. But given finite time, simplifying the code and creating better executable documentation usually gets me the most bang for the time spent.
As you say, the “Why’s” are mostly what my comments are about if I cannot make the why obvious.
One practice on the “Why’s” I have adopted: if a magic number comes from, say, a datasheet, I pull the datasheet from the ’net, commit it to a third-party-documentation-only repo with a commit message giving the original source URL, and in the code put a comment giving the URL to the datasheet in the doc repo.
This is important, as datasheets fall off the ’net, and datasheets get updated, so the magic number you read might not be the magic number in the latest version of the datasheet.
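In code, the comment ends up looking something like this (repo URL, section, and value purely illustrative):

```python
# Startup delay after power-on reset. Taken from the sensor datasheet,
# archived in our docs repo so it survives link rot and silent revisions:
#   https://example.com/third-party-docs/sensor-datasheet-r1.6.pdf
#   (section 5.3, "t_startup", typ. 2 ms)
SENSOR_STARTUP_DELAY_MS = 2
```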
That’s a good tip. Yeah, I’ll even link to an SO discussion, docs, or something. It’s tricky though. Some of my comments don’t even read great to me later. I just think often about time passing. This thing is going to travel through time.
Sounds similar to PHP. [ducks]