1. 8

    Nice writeup! I’m glad to see that Zig’s compile time metaprogramming is carrying its weight. It seems like a great thing to base a language around, and something I’ve been interested in for a long time.

    It’s interesting to compare that with: https://nim-lang.org/araq/v1.html

    … Nim’s meta programming capabilities are top of the class. While the language is not nearly as small as I would like it to be, it turned out that meta programming cannot replace all the building blocks that a modern language needs to have.

    I don’t know why that is (since I don’t know Nim), but of course it’s a hard problem, and it looks like Zig has done some great things here.


    If this is true in general, a plausible reason for this difference is that many of the ‘zero-cost’ abstractions that are heavily used in rust (eg iterators) are actually quite expensive without heavy optimization.

    I’m also finding this with “modern” C++ … A related annoying thing is that those invisible / inlined functions are visible in the debugger, because they may need to be debugged!

    1. 7

      Related wish: I kinda want an application language with Zig-like metaprogramming, not a systems language. In other words, it has GC so it’s a safe language, and no pointers (or pointers are heavily de-emphasized).

      Basically something with the abstraction level of Kotlin or OCaml, except OCaml’s metaprogramming is kinda messy and unstable.

      (I’m sort of working on this, but it’s not likely to be finished any time soon.)

      1. 6

        Julia has similar ideas. There is a bit more built in to the type-system eg multimethods have a fixed notion of type specificity, but experience with julia is what makes me think that zig’s model will work out well. Eg: https://scattered-thoughts.net/writing/zero-copy-deserialization-in-julia/ , https://scattered-thoughts.net/writing/julia-as-a-platform-for-language-development/

        1. 4

          Yeah Julia is very cool. I hacked on femtolisp almost 5 years ago as a potential basis for Oil, because I was intrigued by how they bootstrapped it and used it for the macro system. (But I decided against writing a huge parser in femtolisp.)

          And recently I looked at the copying GC in femtolisp when writing my own GC, which is one of the shortest “production” usages of the Cheney algorithm I could find.

          And I borrowed Julia’s function signature syntax – the ; style – for Oil.

          But unfortunately I haven’t gotten to use Julia very much, since I haven’t done that type of programming in a long time.


          That said, I’d be very interested in a “Zig for language development” post to complement these … :) Specifically I wonder if algebraic data types are ergonomic, and if Zig offers anything nice for those bloated pointer-rich ASTs …

          i.e. I have found it nice to have a level of indirection between the logical structure and the physical layout (i.e. bit packing), and it seems like Zig’s metaprogramming could have something to offer there. In contrast, Clang/LLVM do tons of bit packing for their ASTs and it seems very laborious.

          1. 3

            wonder if algebraic data types are ergonomic

            Aside from the lack of pattern matching, they’re pretty good. There are a couple of examples in the post of nice quality of life features like expr == .Constant for checking the tag and expr.Constant for unwrap-or-panic. Comptime reflection makes it easy to generate things like tree traversals.

            Zig offers anything nice for those bloated pointer-rich ASTs

            I mostly work in database languages where the ast is typically tiny, but if you have some examples to point to I could try to translate them.

            a “Zig for language development” post

            I definitely have plans to bring over some of the query compiler work I did in julia but that likely won’t be until next year.

        2. 6

          Take a look at Nim. It has GC (now ref-counted in 1.4, with a cycle collector) and an excellent macro facility.

          1. 4

            Nim is impressive, and someone is actually translating Oil to Nim as a side project …

            http://www.oilshell.org/blog/2020/07/blog-roadmap.html#how-to-rewrite-oil-in-nim-c-d-or-rust-or-c

            I tried Nim very briefly, but the main thing that turned me off is that the generated code isn’t readable. Not just the variable names, but I think the control flow isn’t preserved. Like Nim does some non-trivial stuff with a control flow graph, and then outputs C.

            Like Nim, I’m also generating source code from a statically typed language, but the output is “pidgin C++” that I can step through in the debugger, and use with a profiler, and that’s been enormously helpful. I think it’s also pretty crucial for distro maintainers.

          2. 5

            I find D’s approach to metaprogramming really interesting, might be worth checking out if you’re not familiar with it.

            1. 5

              D’s compile-time function execution is quite similar. Most of the zig examples would work as-is if translated to d. The main difference being that in d, a function cannot return a type; but you can make a function be a type constructor for a voldemort type and produce very similar constructions.

              1. 3

                Yeah I have come to appreciate D’s combination of features while writing Oil… and mentioned it here on the blog:

                http://www.oilshell.org/blog/2020/07/blog-roadmap.html#how-to-rewrite-oil-in-nim-c-d-or-rust-or-c

                Though algebraic data types are a crucial thing for Oil, which was the “application” I’m thinking about for this application language … So I’m not sure D would have been good, but I really like its builtin maps / arrays, with GC. That’s like 60% of what Oil is.

                1. 1

                  D does have basic support for ADTs (though there’s a better package outside the standard library). Support is not great, compared with a proper ml; but it’s certainly no worse than the python/c++ that oil currently uses.

              2. 3

                Julia sort of fits, depends on your applications. Metaprogramming is great and used moderately often throughout the language and ecosystem. And the language is fantastically expressive.

                1. 2

                  I want this too, got anything public like blog posts on your thoughts / direction?

                  1. 3

                    Actually yes, against my better judgement I did bring it up a few days ago:

                    https://old.reddit.com/r/ProgrammingLanguages/comments/jb5i5m/help_i_keep_stealing_features_from_elixir_because/g8urxou/

                    tl;dr Someone asked for statically typed Python with sum types, and that’s what https://oilshell.org is written in :) The comment contains the short story of how I got there.

                    The reason I used Python was because extensive metaprogramming made the code 5-7x shorter than bash, and importantly (and surprisingly) it retains enough semantic information to be faster than bash.

                    So basically I used an application language for a systems level task (writing an interpreter), and it’s turned out well so far. (I still have yet to integrate the GC, but I wrote it and it seems doable.)


                    So basically the hypothetical “Tea language” is like statically typed Python with sum types and curly braces (which I’ve heard Kotlin described as!), and also with metaprogramming. Metaprogramming requires a compiler and interpreter for the same language, and if you squint we sorta have that already. (e.g. the Zig compiler has a Zig interpreter too, to support metaprogramming)
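
                    To make “statically typed Python with sum types” concrete, here’s a rough, hypothetical sketch using today’s typed Python (dataclasses plus a Union checked by mypy). It’s just an illustration, not actual Tea or Oil code:

                        from dataclasses import dataclass
                        from typing import Union

                        @dataclass
                        class Const:
                            value: int

                        @dataclass
                        class Add:
                            left: 'Expr'
                            right: 'Expr'

                        Expr = Union[Const, Add]   # the "sum type": an Expr is either a Const or an Add

                        def evaluate(e: Expr) -> int:
                            if isinstance(e, Const):
                                return e.value
                            if isinstance(e, Add):
                                return evaluate(e.left) + evaluate(e.right)
                            raise AssertionError(e)  # unreachable if the Union above is exhaustive

                        print(evaluate(Add(Const(1), Add(Const(2), Const(3)))))  # 6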

                    It’s a very concrete project since it’s simply the language that Oil is written in. That is, it already has 30K+ lines of code written for it, so the feature set is exactly mapped out.

                    However, as I’ve learned, a “concrete” project doesn’t always mean it can be completed in a reasonable amount of time :) I’m looking for help! As usual my contact info is on the home page, or use Github, etc.

                    Another way to think of this project is as “self hosting” Oil, because while the current set of metalanguages is very effective, it’s also kind of ugly syntactically and consists of many different tools and languages. (Note that users are not exposed to this; only developers. Tea may never happen and that’s OK.)

              1. 3

                Good read! I’m more convinced than ever that Rust is right for me :)

                Rust catches overflow in debug and wraps in release. Zig catches overflow in debug/release-safe and leaves behavior undefined in release-fast.

                Zig aspires to insert runtime checks for almost all undefined behavior when compiling in debug mode.

                I never liked this debug/release mode distinction. IMO, unless you’re writing code targeting some very specific resource-constrained environment or maybe a hyper-optimized loop, stuff like assertions (and rust panics) should be left on in release mode as well. A core dump with a tripped assertion is so much easier to dig into than trying to figure out a consequent crash (or silent data loss!) due to broken invariants.

                Rust prevents having multiple mutable references to the same memory region at the same time. This means that eg iterator invalidation is prevented at compile time …. Similarly for resizing a data-structure while holding a reference to the old allocation. Both examples are easy sources of UAF in zig.

                In rust the Send/Sync traits flag types which are safe to move/share across threads. In the absence of unsafe code it should be impossible to cause data races. Zig has no comparable protection.

                This is for me maybe the biggest point of Rust. We subject ourselves to the borrow-checker just to get the guarantees of compile time ensured safe code. If I don’t have that guarantee, I’d rather go all the way to some lush GC language.

                1. 8

                  Good read! I’m more convinced than ever that Rust is right for me :)

                  That’s not a bad outcome. At least it was informative :)

                  I never liked this debug/release mode distinction.

                  I agree. I’ve been using release-safe for everything in zig, which has the same checks as debug mode. I wouldn’t object to renaming release-fast to release-unsafe. Or release-yolo.

                  This is for me maybe the biggest point of Rust.

                  It is a huge innovation. I think zig has also made a huge innovation on a mostly orthogonal axis. There is a lot to be learned from both, especially if we can figure out a way to combine their powers.

                  1. 3

                    especially if we can figure out a way to combine their powers

                    FWIW both Swift and D are looking at integrating ownership or “static” memory management… way after the fact.

                    I guess my issue is less whether it’s possible to bolt on e.g. to Zig, and more whether it will be a good experience and retain the simplicity of the language…

                    https://github.com/apple/swift/blob/main/docs/OwnershipManifesto.md

                    https://dlang.org/blog/2019/07/15/ownership-and-borrowing-in-d/

                    1.  

                      Haskell also has a linear type proposal: https://gitlab.haskell.org/ghc/ghc/-/wikis/linear-types

                      1.  

                        It’s already merged and will be in 8.12: https://www.tweag.io/blog/2020-06-19-linear-types-merged/ (well, the first iteration at least).

                        Note that linear types in Haskell != affine types in Rust

                1. 5

                  I don’t like writing more than 500-1000 lines of C++ by hand, but I like generating it. It has some nice properties for code generation: https://news.ycombinator.com/item?id=24052268

                  The Souffle Datalog Compiler makes good use of it (used to prototype Rust’s type system)

                  1. 1

                    If something can deterministically be generated from whatever format you input your information in, why not make that format the first-class citizen? What is the point of the intermediate format, in this case C++?

                    In the mid 2000s, all the rage among Java developers was how code generation would be the be-all and end-all. That failed miserably as it became clear it was just a symptom of syntax obesity. The same applies to C++.

                    1. 2

                      Many, many reasons:

                      • the intermediate format is widely understood by tools, while the source format isn’t (in this case it’s like 3 or 4 custom DSLs).
                      • It takes many custom tools to generate the intermediate format from source. The tools aren’t really public; I don’t distribute them.
                      • the intermediate format is architecture-independent. Same reason that people distribute source tarballs. Binaries are a separate step.
                      • C++ has great debugging support, great profiling support, and more. These tools understand the location of statements and the structure of functions / methods.
                      • a C++ compiler is a huge thing that does a huge amount of work for you. It’s kind of a big black box that’s deployed everywhere, and you can twiddle its knobs and get great results. Of course that requires understanding a bunch of C++, but that work pays dividends.
                      • for the Souffle datalog case, they make unique use of templates. Similarly there are some pretty interesting uses of templates in Eigen, a linear algebra library. It’s a tool you can use to express things that are not easily expressed in other ways, i.e. generic algorithms and specialized versions for performance. You don’t get that by writing raw assembly, or by generating LLVM IR (which is not a stable format).
                        • Oil doesn’t really use this; it only makes basic use of templates. But I just point it out as something you’ll be hard-pressed to get any other way.

                      It sounds like you are making a very abstract argument, not one based around engineering…

                  1. 8

                    EDN deserves a mention https://github.com/edn-format/edn. It’s pretty much only used by Clojure and adjacent projects (e.g. Datomic), but there are serialization libraries for a bunch of languages (https://github.com/edn-format/edn/wiki/Implementations)

                    1. 2

                      Author here. I just learned about EDN recently, it’s listed under “S-Expressions”. Definitely worth considering and I like what I’ve seen of it, it just needs to be… I dunno, robustified. Turned into an actual reference document rather than a description, ideally with accompanying test suite.

                      1. 2

                        Aha, I didn’t see that under S-expressions. Technically I don’t think EDN is actually an S-expression, because it’s not just lists and atoms.

                        1. 1

                          I like what I see in EDN, but I also wonder if it’s more of a language-specific serialization format. Like Go has the “Gob” format. Python has Pickle.

                          Those are tightly coupled to the language itself. In pickle’s case there are several different versions of it.

                          EDN looks like it’s about halfway in between… I like the extensibility mechanism with tags, but until it’s implemented in another language, it feels like a Clojure-specific format to me.

                          IMO this is the main compromise with serialization formats… You can trade off convenience in one language for interoperability in all languages. JSON is kind of impoverished but that means it works equally well in most languages :)

                          The thing that annoys me about both JSON and EDN is that they inherit the browser’s and the JVM’s reliance on 2-byte Unicode encodings. You have to use surrogate pairs instead of something like \u{123456}. And JSON can’t represent binary data.
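
                          A tiny illustration of the surrogate pair issue with Python’s json module:

                              import json

                              s = "\U0001F389"                       # U+1F389, outside the Basic Multilingual Plane
                              print(json.dumps(s))                   # "\ud83c\udf89"  (two \uXXXX escapes, a surrogate pair)
                              print(json.loads(json.dumps(s)) == s)  # True, it round-trips, but there's no \u{1F389} escape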

                          1. 5

                            Well it’s implemented in JavaScript: https://github.com/shaunxcode/jsedn

                            Go: https://github.com/go-edn/edn

                            Rust: https://github.com/utkarshkukreti/edn.rs

                            .NET: https://github.com/robertluo/Edn.Net

                            Java: https://github.com/bpsm/edn-java

                            More implementations here: https://github.com/edn-format/edn/wiki/Implementations

                            Still would be nice to have a spec and test suite though.

                      1. 1

                        Sounds like the same reason that out parameters and mutation are idiomatic in Lisp, i.e. for “accumulators”.

                        I make use of that pattern all over Oil, e.g. with recursive evaluation functions all appending to the same Python list.

                        I started with return values, but the accumulator is a lot more elegant IMO (and efficient due to not creating lots of lists)
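
                        A toy sketch of the accumulator style (a made-up tree shape, not Oil’s actual evaluator):

                            from typing import List

                            def collect_names(node: dict, out: List[str]) -> None:
                                """Recursively append every 'name' field to the caller-owned list."""
                                out.append(node['name'])
                                for child in node.get('children', []):
                                    collect_names(child, out)

                            tree = {'name': 'a', 'children': [{'name': 'b'}, {'name': 'c', 'children': [{'name': 'd'}]}]}
                            acc: List[str] = []         # one list, allocated by the caller
                            collect_names(tree, acc)
                            print(acc)                  # ['a', 'b', 'c', 'd']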

                        1. 3

                          I think this is touched on in the article but not quite spelled out, but my take on the reason to use an out parameter has a lot to do with allocation. If your function would otherwise need to allocate heap memory in order to produce a result, an out parameter can make it so the responsibility of allocation (and memory ownership) is delegated to the caller. That means the caller is allowed to reuse the same memory or use any other custom allocation strategy.

                        1. 32

                          I know these are corporations without feelings etc. etc. but I can’t help but feel bad for Docker.

                          The “it’s just BSD jails/chroot” argument is like the “Dropbox is just ftp” arguments - Docker made containerization mainstream.

                           And after Docker created a pretty massive movement to start using and deploying containers, everybody and their mother started writing their own container runtimes. Runc, OCI, CoreOS, CRI-O, rkt, podman. Where were these tools before Docker?

                          1. 45

                             I see where you’re coming from, but as someone who used containers pre-Docker, and worked on container tech for a hobby project (2011-2015), I think they did a bad job.

                            Docker is a pretty low quality piece of software. The superfluous daemon is a major reason (the article mentions this). The security story is another, though they were building on unstable foundations in the kernel.

                            So yes they made something popular, and created a de-facto standard. But other people have had to swim in their wake and clean up the mess (OCI, etc.).

                            I like cyphar’s blog on these topics: https://www.cyphar.com/blog/

                            They also raised a lot of money and hired a lot of people, which I suppose is a good way to build certain things. I’m not sure it is a great way to build container tech, although I’ll take your point that there was little cooperation in the space beforehand.

                            There is some “blame” to Google, since the kernel features were contributed by them, but user space tools were “missing”:

                            https://en.wikipedia.org/wiki/Cgroups

                            But really I think it is more of an issue with the kernel dev model, which is good at copying APIs that AT&T or someone else made, and bad at developing new ones:

                            https://lwn.net/Articles/679786/

                            As maintainer Tejun Heo himself has admitted [YouTube], “design followed implementation”, “different decisions were taken for different controllers”, and “sometimes too much flexibility causes a hindrance”. In an LWN article from 2012, it was said that “control groups are one of those features that kernel developers love to hate.”

                            https://lwn.net/Articles/484251/

                            1. 14

                              Docker is a pretty low quality piece of software. The superfluous daemon is a major reason (the article mentions this). The security story is another, though they were building on unstable foundations in the kernel.

                              Even taken on its own terms (daemon architecture, etc.) Docker is pretty low-quality. The daemon tends to wedge itself in an unresponsive state after running for a few weeks!

                              “design followed implementation”

                              I love this quote, and I think this is a factor in Docker as well. The image layer system pretty directly exposes union filesystems, but it’s not a particularly efficient solution to the problem of distributing incremental image updates. I think that this is part of why part of the Docker community pushes for very small base images, like Alpine — it’s a workaround for an inefficient distribution mechanism.

                              1. 5

                                 Yes, yes and thrice yes. What I also really don’t understand is why there can’t be a big push for build tooling that takes big chunks of work out of building statically linked, binary-only images, so that you don’t need a whole OS inside the container. I mean, I guess I do understand, because there’s clearly no motivation for any of the big companies, who make tons of money out of hosting per-MB container registry storage, to make containers smaller, let alone to expend effort helping others to do so - and the whole “resources are cheap, developer time is expensive” line that Rails & DHH popularised back in the day makes everyone think “ahh it’s half a gig, who cares? Incremental, anyway, amirite?”. Well, I care. Every single byte of every image contributes to energy usage. It makes me cross.

                                1. 12

                                  That’s basically bazel, open sourced by Google: https://bazel.build/

                                  Google doesn’t ship OS images inside its containers, in the style of Docker. (And remember as mentioned above, many of the Linux kernel container features were developed by Google).

                                  Instead they use statically linked binaries. However it doesn’t really solve the “gigabyte images” problem. A single static dependency graph tends to blow up as well, and you end up with gigabyte binaries.

                                  Bazel works really well for some cases, namely if most of your code is in C++. It compiles fast and the binaries are reasonably small compared to the optimum.

                                  In other cases it can be the worst of both worlds, because you have to throw out the entire build system (Makefile, autoconf) and rewrite in the Bazel build language. You have a maintenance problem, in other words.


                                  I used to have the same question as you… but then I tried to build containers from scratch, which was one of the primary motivations for Oil.

                                  I guess the short answer is “Do Linux From Scratch and see how much work it is”. The dependency graph is inherently not well curated.

                                   Another answer is “try to build a container that makes a plot in the R language, built from scratch”. This is a very difficult problem, more difficult than anyone who hasn’t tried it would think. (e.g. it depends on Fortran compilers, graphics libraries, etc.)


                                  I think there could be a big push along the lines of “reproducible builds” for “dependency graph curation”, but it is a hard problem, not many people have the expertise, and requires a lot of coordination … It basically requires fiddling with a lot of gross build systems, i.e. adding more hacks upon the big pile that’s already there.


                                  Another thing I learned ~2012 when looking into this stuff: Version resolution is an NP-complete problem.

                                  https://research.swtch.com/version-sat

                                  Also Debian’s package manager is not divorced from its package data. There are hacks in the package manager for specific packages. That is true for most distros as far as I know.

                                  So again, it’s a very hard problem … not just one of motivation.

                                  1. 2

                                    That’s a really insightful and interesting reply. Thank you.

                              2. 2

                                So it could be safer to say that their marketed/attractive product, even if not technically the best, may have galvanized cooperation and better developments in the container space?

                                1. 3

                                  I would say that’s accurate. If raising a bunch of money for a non-commercially-viable company is the only way to do that, then I guess I don’t have any answers … :) But I sure wish there was a better way.

                                  The old way was that AT&T was a monopoly and hired smart people to design software, which was flawed in its own way too (and Xerox PARC too). Google has a similar role now, in that they open source enormous amounts of code like Android, Chrome, and cgroups, which other people build on, including Docker … But yes it is ironic that Docker motivated Google to work on container tech, when Google had the original kernel use cases for their clusters.

                                2. 2

                                  But really I think it is more of an issue with the kernel dev model, which is good at copying APIs that AT&T or someone else made, and bad at developing new ones:

                                  FreeBSD Jails and Solaris Zones are both better kernel technologies for deploying containers than the cgroup / namespace / seccomp-bpf mess that Linux uses, so it appears that the Linux kernel devs are not actually very good at copying APIs that other people made either. When Solaris copied Jails, they came up with something noticeably better (Jails have now more or less caught up). Linux, with both Jails and Zones to copy from, made something significantly worse in the name of generality and has no excuse.

                                3. 28

                                  Where were these tools before Docker?

                                  Lacking a marketing department.

                                  1. 32

                                    Like it or not, marketing is a part of software development. The eschew-everything hacker ethos is marketing too, it just has a different target audience.

                                    1. 1

                                      Well, I despise the former version of marketing that you mention. Software should stand on its own merits only, not because it’s “first” or whatever.

                                      1. 1

                                        The “social” in “social coding” seems to be eating the “coding” more than usual as time progresses.

                                        I suppose there are some that believe that to be a feature.

                                        1. 3

                                           This is a subjective judgment, which I think reflects more on the speaker and their biases than on any objective condition. I don’t share it, for the record — I think programming has been a pathologically asocial discipline for most of its existence, and we’re just now beginning to “course correct” in a meaningful way.

                                      2. 8

                                        LXC did have marketing, though admittedly not the sort of blitz Docker went on.

                                        I think that the Docker difference is willingness to sacrifice security for UX. With LXC you must type sudo all the time. The Docker way is to add yourself to the docker group. Sure, that makes your user root-equivalent, but now all your docker commands work so easily! This produces good word-of-mouth and lots of Medium tutorials, none of which mention the root-equivalence thing.

                                        It is probably also important that the Docker CLI runs on a Mac. In that context we might do better to compare Docker to Vagrant than LXC. Even if boot2docker/Docker Desktop/Docker for Mac don’t work that well, they are initially appealing.

                                        1. 9

                                          I think by far the biggest problem Docker solved is easy redistribution of images: docker push, docker pull. This is the part that no container system had at the time (AFAIK) and explains much of its popularity.

                                          Everything else: meh.

                                          1. 2

                                            This. I saw one of the initial Docker demos, at PyCon long ago, and this was what made a compelling difference from what LXC provided.

                                        2. 2

                                          Pithy and cute, but none of the tools OP mentioned existed before Docker.

                                      1. 4

                                        I’m also curious if @andyc has any thoughts on this as a shell developer.

                                        1. 3

                                          Yes thanks, I generally don’t believe in “POSIX minimalism” for the reasons you quoted, e.g.

                                          https://lobste.rs/s/1tcqoo/introduction_posix_shell#c_jebkjl

                                          Looking at the patch

                                          https://github.com/kisslinux/community/blob/d99595e575a471698032dbad0cd2c9977d6b0c79/community/go/patches/posix-build.patch

                                          As far as I can tell this is one of the “short scripts” which could reasonably be POSIX.

                                          But I understand that the maintainers have better things to do with their time, and there is no way to prevent regression.

                                           That is what I meant in my FAQ when I said POSIX is an “untestable” concept. The real goal is portability, and that means people should be running their scripts on multiple shells. If lots of people are doing that, it makes sense to accept the patch.

                                           But again, OpenBSD ksh accepts A LOT OF STUFF that’s not POSIX. So I think that, for example, the intersection of bash + ksh makes a lot more sense as a portability goal. (Flip side: I suppose that OpenBSD has been able to run bash for decades, so there’s really no problem here that justifies the portability effort.)

                                          Related comment: POSIX shell misses a lot of portable constructs


                                          For bigger scripts, my advice is here:

                                          https://news.ycombinator.com/item?id=24523162

                                          If you want a portable shell script, in many cases my (biased) advice would be to make your script work on both bash and Oil. (Obviously there are short shell scripts which you may want to run on BSD, etc. This is more about big scripts, which POSIX falls down for.)

                                           Oil already runs some of the biggest shell scripts in the world, many of them unmodified. Moreover, when a patch is necessary to run a script, it often IMPROVES the program.

                                          http://www.oilshell.org/blog/2020/06/release-0.8.pre6.html#patch-to-run-the-mal-lisp

                                          You’ll be less tied to the vagaries of bash.

                                          If anyone’s script doesn’t run under Oil, I’m interested. See https://github.com/oilshell/oil/wiki/What-Is-Expected-to-Run-Under-OSH

                                          1. 2

                                            there is no way to prevent regression.

                                             This is kind of bullshit. They already have CI scripts; just run them with more than one shell to check they still work. Run them with busybox, run them with dash, run them with ksh on the OpenBSD builds.

                                        1. 3

                                          Speaking of lexer modes (or stateful lexers), I took some inspiration from Pygments/Chroma (and also Oil) and added support for stateful lexing to Participle, my Go parser library.

                                          This allows lexing of distinct states, even recursively, for example in this string interpolation parser.

                                          1. 2

                                            Hm interesting. I made a list of what other parser generators do here:

                                            http://www.oilshell.org/blog/2020/07/ideas-questions.html#ad-hoc-context-in-common-lexing-tools-recursion

                                              I’d be interested in some analysis / comparison of all these different mechanisms!

                                            1. 3

                                              Oh interesting, I hadn’t seen that post. I would definitely suggest adding Pygments lexers to that list. They are effectively state machines where each state is a distinct lexer, and so very much aligned with the idea of modal lexers.
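
                                                As a rough Python sketch (hypothetical token rules, not actual Pygments or Chroma code), the idea is a stack of states, where each state has its own regex rules and some rules push or pop states:

                                                    import re

                                                    RULES = {
                                                        'root': [
                                                            (r'"', 'STR_START', 'string'),      # push "string" state
                                                            (r'[a-z]+', 'IDENT', None),
                                                            (r'\s+', None, None),               # skip whitespace
                                                        ],
                                                        'string': [
                                                            (r'\$\{[^}]*\}', 'INTERP', None),   # ${...} interpolation
                                                            (r'"', 'STR_END', 'pop'),           # pop back to "root"
                                                            (r'[^"$]+', 'TEXT', None),
                                                        ],
                                                    }

                                                    def lex(src):
                                                        stack, pos = ['root'], 0
                                                        while pos < len(src):
                                                            for pat, tok, action in RULES[stack[-1]]:
                                                                m = re.match(pat, src[pos:])
                                                                if not m:
                                                                    continue
                                                                if tok:
                                                                    yield tok, m.group(0)
                                                                if action == 'pop':
                                                                    stack.pop()
                                                                elif action:
                                                                    stack.append(action)
                                                                pos += m.end()
                                                                break
                                                            else:
                                                                raise SyntaxError('no rule matched at %d' % pos)

                                                    print(list(lex('say "hi ${name}!"')))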

                                              I must admit that even though I wrote Chroma in 2017 (based on Pygments) I’ve had an issue open in Participle since 2018 to figure out how to support stateful lexing. It wasn’t until I read your article on modal lexers that it dawned on me that I could use the same approach from Chroma/Pygments for building general purpose lexers in Participle. Great set of articles Andy, thanks.

                                              1. 2

                                                  I’d be interested in some analysis / comparison of all these different mechanisms!

                                                Yes, that would be a really useful resource. Also missing, I’ve found, is best practice on where to draw the line between lexing and parsing, as it can often be quite blurry.

                                            1. 4

                                              The section following this quote is good. The IFS=/ vulnerability for setuid programs is a good one.

                                              The most important principle in rc’s design is that it’s not a macro processor. Input is never scanned more than once by the lexical and syntactic analysis code (except, of course, by the eval command, whose raison d’être is to break the rule)

                                                 Oil follows the same principle, which I call static parsing – versus the dynamic parsing of Bourne shell and all its derivatives like bash (dynamic parsing is what makes those languages undecidable to parse).

                                                 (A notable difference is that Oil is compatible with POSIX, Bourne shell, ksh, and bash, while rc is not.)

                                              I probably got that idea from rc shell without realizing it, as I remember reading this paper more than 10 years ago.


                                              I also recently picked up Programming Perl (by Christiansen, foy, Larry Wall), and it points out the same issue with shell multiple times:

                                              In Chapter 20 on security:

                                              Perl is easy to program securely because it’s straightforward and self-contained. Unlike most shell programming languages, which are based on multiple, mysterious substitution passes on each line of the script, Perl uses a more conventional evaluation scheme with fewer hidden snags.

                                              Although this is not entirely true, because there are corners of Perl that have undecidable parsing.

                                              I heard Larry Wall say that one of the goals of Perl 6 was to really fix this problem. It doesn’t do dynamic parsing like Perl 5 does.


                                              Also, back in January 2019, I rediscovered a security problem due to dynamic parsing which appears in all Bourne-derived shells (and at least the OpenBSD shell actually patched it, not sure about bash):

                                              http://www.oilshell.org/blog/2019/01/18.html#a-story-about-a-30-year-old-security-problem

                                              https://github.com/oilshell/blog-code/tree/master/crazy-old-bug

                                              The guy who discovered ShellShock in 2014 wrote a few of the StackOverflow answers there.


                                              plug: tell me what you think of Oil’s syntax :)

                                              1. 5

                                                In the “Parens” section:

                                                var p = / digit+' ('seconds' | 'minutes' | 'hours' ) /

                                                There’s an odd number of single-quotes there, so I’m not sure which spans are supposed to be quoted and which aren’t.

                                                Are the syntaxes labelled “not implemented” not implemented yet, or are they deliberately avoided for some reason?

                                                Since if word { must treat word as a command for bash-compatibility reasons, and since square-brackets seem to indicate “expression context” in at least a few places, perhaps if [x < 0] { should be the syntax for expression conditionals rather than if (x < 0) {?

                                                In the “Language Influences” document, the section about Go’s argument parsing says:

                                                mybuiltin --show=0 # turn a flag that's default true

                                                Turn it how? Like the Turn Undead spell in D&D?

                                                1. 3

                                                  Great feedback, thanks!

                                                  I fixed the typos with the first ' and the Go thing. I changed the wording to “not implemented yet”.

                                                  I realized there is really no consistency between parens and parens+sigil, and brackets and brackets+sigil, mostly due to legacy constraints. So I re-organized the doc along those lines.

                                                     It’s technically easier to take over the subshell syntax with shopt -s parse_paren, for if (x > 0) ..., and I think it just looks more familiar. if [x > 0] would be needlessly different.

                                                  Let me know if you see anything else!

                                                1. 2

                                                     I made a bunch of changes to the syntax and wrote this doc to rationalize it. Feedback is welcome!

                                                  This draft may also be interesting: http://www.oilshell.org/preview/doc/language-influences.html

                                                  1. 2

                                                     Surprised to see all the love for FastCGI. My recollection is that it was a nightmare to use – very fussy (hard to program for and integrate with), and quite brittle (needing regular sysad intervention).

                                                    1. 2

                                                      I remember trying to set it up once on the server side (~10 years ago?) and it was not fun.

                                                       However, as a user on shared hosting, I’ve found it works great. I’ve been running the same FastCGI script for years, and it’s fast, with no problems. So someone figured out how to set it up better than I did (which is not surprising).

                                                      I think the core idea is good, but for awhile the implementations were spotty and it was not well documented in general. There seems to be significant confusion about it to this day, even on this thread of domain experts.

                                                      To me the value is to provide the PHP deployment model and concurrency model (stateless/shared nothing/but with caching), but with any language.

                                                      1. 1

                                                        We ran FastCGI at quite large scale back around 2000 and it was very reliable and not particularly difficult to work with.

                                                        1. 1

                                                          I was using it at mid-scale in the aughts (mod_fastcgi on apache) and it was not a pleasant experience. Maybe our sysads were particularly bad, or maybe our devs just didn’t get the concepts, but I recall others in my local user groups having similar difficulties.

                                                      1. 5

                                                        I’m still using FastCGI! It works well on Dreamhost.

                                                        The Python support is not good! In theory you just write a WSGI app, and it will work under a FastCGI wrapper.

                                                         But I had to revive the old “flup” wrapper, since Dreamhost has Python 2. I downloaded an older tarball and built it myself.

                                                        Use case: I parse thousands of shell scripts on every release and upload the results as a “.wwz” file, which is just a zip file served by a FastCGI script.

                                                        https://www.oilshell.org/release/0.8.1/test/wild.wwz/

                                                        So whenever there’s a URL with .wwz in it, you’re hitting a FastCGI script!

                                                         This technique makes backing up a website a lot easier, as you can sync a single 50 MB zip file rather than 10,000 tiny files, where just stat()ing all the file system metadata takes forever.

                                                        It’s more rsync-friendly, in other words.

                                                        I also use it for logs in my continuous build: http://travis-ci.oilshell.org/jobs/


                                                        Does anyone know of any other web hosts that support FastCGI well? I like having my site portable and host independent. I think FastCGI is a good open standard for dynamic content, and it works well on shared hosting (which has a lot of the benefits of the cloud, and not many of the downsides).

                                                        (copy of HN comment https://news.ycombinator.com/item?id=24684563)

                                                        1. 1

                                                          Your WWZ system is interesting and I’d love to learn more about it. I remember taking down a note to dig more into it or try to build something like it for my own usage. If I recall correctly in something I read, you created it primarily to take advantage of excellent compression of a large number of text files while being able to manage a single file, and reads from that file are cached. Is that accurate?

                                                          1. 1

                                                            So you generate zip files, store them on disk, then add a FastCGI script to get the zips?

                                                            1. 1

                                                              Yes exactly! Just drop the .zip files in a dir, renamed to .wwz, and they get served!

                                                              This is a tiny program that I meant to share, but never got around to it… I find it very useful, when you want to serve thousands of tiny files. It’s in Python 2 since Dreamhost is running Debian with Python 2.
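
                                                               The core of it is tiny. Roughly (a simplified sketch, not the real wwz script), it’s a WSGI app that maps the part of the URL after .wwz/ to a member of the zip file:

                                                                   import zipfile

                                                                   def app(environ, start_response):
                                                                       # e.g. PATH_INFO "/foo.wwz/index.html" -> member "index.html" of foo.wwz
                                                                       path = environ.get('PATH_INFO', '')
                                                                       archive, _, member = path.partition('.wwz/')
                                                                       try:
                                                                           with zipfile.ZipFile(archive.lstrip('/') + '.wwz') as z:
                                                                               body = z.read(member)   # the zip index makes this a cheap random access
                                                                       except (KeyError, IOError):
                                                                           start_response('404 Not Found', [('Content-Type', 'text/plain')])
                                                                           return [b'not found\n']
                                                                       start_response('200 OK', [('Content-Type', 'text/html')])
                                                                       return [body]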

                                                              1. 1

                                                                 It’s always easier to manage a single archive compared to thousands of smaller files. Zip is really an old compression algo and file format, but it has an index so it allows fast random access. And it’s easier to manage since most OSes have builtin support. Dunno if there is any better alternative to zip providing both random access and a decent compression ratio. Maybe leveldb?

                                                                1. 1

                                                                  Zip is really an old compression algo and file format

                                                                  Zip has a few options for compression algorithm but for uses like this it’s very common to use the no-compression option. This has the advantage that you can mmap (or equivalent) the entire file and use the index to get pointers to the data. On a 64-bit system, the FastCGI process can probably mmap all of the files you’ll want and rely on the OS evicting the pages when it is short on RAM. Dovecot uses this strategy (though not with zip files) very effectively, mmaping all of the index files and relying on the OS to keep its physical memory consumption under control (any of the mapped pages can be evicted almost for free and then read back in as needed).

                                                                  1. 1

                                                                    Android does this with asset files inside APKs (which are zips) and the build process includes a utility for adjusting zip files so that every uncompressed file inside is aligned to some alignment (iirc 2 bytes) to support getting assets contents by mmap()ing the APK. :)

                                                                2. 1

                                                                  Please do share, I’d like to learn more about it.

                                                                  I’m considering adding fastcgi to my project, and the zip portion is also interesting.

                                                            1. 4

                                                              For the non-Pythonistas here you might be interested in looking at WSGI and ASGI, two protocols similar in spirit to FastCGI, but with a tighter coupling to the host language. I find it interesting that WSGI managed to keep up and stay relevant, paving the road for ASGI, which supports WebSocket as well as HTTP/2.

                                                              1. 8

                                                                WSGI and ASGI aren’t alternatives to FastCGI.

                                                                • WSGI is a Python protocol, i.e an “API” that gives you a Python dictionary representing the request. You write the response back to a Python file-like object in the dictionary.
                                                                • CGI and FastCGI are Unix protocols.
                                                                  • CGI starts a process with a given env, and you write the response to stdout. You can write a CGI script in any language (Perl was once the favored language). You can use WSGI or not. Perl now has something analogous called PSGI I think.
                                                                  • FastCGI uses a persistent process that sends the env dictionary over a socket (in some weird binary format).

                                                                The way I use FastCGI is to write a WSGI app (which can be done using any Python framework; I use my own framework).

                                                                And then I use the “flup” wrapper to create a FastCGI binary.
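
                                                                 The wiring is only a few lines. A simplified sketch (the handler here is made up, but the flup import and WSGIServer(app).run() call are the standard usage):

                                                                     from flup.server.fcgi import WSGIServer

                                                                     def app(environ, start_response):
                                                                         start_response('200 OK', [('Content-Type', 'text/plain')])
                                                                         return [('hello from %s\n' % environ.get('PATH_INFO', '/')).encode('utf-8')]

                                                                     if __name__ == '__main__':
                                                                         # The web server execs this script and speaks FastCGI to it over a
                                                                         # socket; flup translates each request into a WSGI call to app().
                                                                         WSGIServer(app).run()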

                                                                The project is kind of “hidden” now, but it works: https://pypi.org/project/flup/ and https://pypi.org/project/flup-py3/

                                                                https://www.saddi.com/software/flup/

                                                                https://www.geoffreybrown.com/blog/python-flup-and-fastcgi/

                                                                1. 3

                                                                  Does anyone here have experience with PHP deployment? I’m curious if FastCGI (FPM) is the preferred “gateway solution” for PHP? vs. mod_php which is a shared library dynamically linked with Apache.

                                                                  Some of these links seem to suggest that this is true? You get better performance with FastCGI? That is a little surprising.

                                                                  Either way it seems like FastCGI is relatively popular with PHP, but sorta unknown in other languages? I never heard of anyone running Django or Rails with FastCGI? I think those frameworks are designed to run their own servers, and don’t play well with FastCGI, even if they can technically make a WSGI app in Django’s case.

                                                                  https://serverfault.com/questions/645755/differences-and-dis-advanages-between-fast-cgi-cgi-mod-php-suphp-php-fpm

                                                                  https://blog.layershift.com/which-php-mode-apache-vs-cgi-vs-fastcgi/

                                                                  https://stackoverflow.com/questions/3953793/mod-php-vs-cgi-vs-fast-cgi

                                                                  1. 6

                                                                    Yes! I’m using that already for many years on CentOS/Fedora. See https://developers.redhat.com/blog/2017/10/25/php-configuration-tips/ for more information from Red Hat.

                                                                    I also wrote blog posts for CentOS and Debian 10 on how I use php-fpm in production.

                                                                    1. 1

                                                                      Cool.. Is it correct to say that PHP-FPM is a C program that embeds the PHP interpreter and makes .php scripts into FastCGI apps? I’m just curious how it works.

                                                                      I think Python never developed an analogous thing, which is a shame because then there would be more shared Python hosts like there are shared PHP hosts. The closest thing is “flup”, which is not well documented (or maintained, at least at some points)

                                                                    2. 6

                                                                      mod_php still has some usage, and is still maintained, but IMO yes PHP-FPM (essentially a long lived process manager for PHP) accessed via FastCGI from a regular http server (normally apache or nginx, recently HAProxy also added support for fastcgi) is the “best” solution for now.

                                                                     mod_php will probably have a slight latency benefit, but it means Apache will use more memory and is limited to the pre-fork worker, plus you lose a lot of flexibility (e.g. going the fastcgi/fpm route you can have multiple versions of PHP installed side by side, you can have multiple completely different FPM instances, etc).

                                                                      1. 1

                                                                        I don’t know if this is fixed, but mod_php used to run the PHP scripts in the same process as the web server. In a shared hosting environment, this meant that any file readable by one user was readable by scripts run by the others (for example, if you put your database password in your PHP file, someone else could write a PHP file that would read that file and show it to the user, then compromise your database). It also meant that a vulnerability in the PHP interpreter could be exploited by one user to completely take control of the web server. The big advantage of FastCGI for multi-tenant systems was the ability to run a copy of the PHP interpreter for each user, as that user.

                                                                        1. 1

                                                                          I don’t think “fixed” is the right term there, but regardless that is the inherent nature of mod_php, yes.

                                                                          There was (/is, via a fork) a variant called mod_suphp that uses a setuid helper, so the process runs as the owner of the php file it’s executing.

                                                                        2. 1

                                                                          Cool thanks… I asked the same question in this sibling.

                                                                          https://lobste.rs/s/xl63ah/fastcgi_forgotten_treasure#c_6u4wq3

                                                                          Basically I want to make an “Oil-FPM” :) I think I can do that with

                                                                          https://kristaps.bsd.lv/kcgi/

                                                                          that wraps the Oil interpreter? And I probably need some more process management too?

                                                                          There is no Python-FPM as far as I know, and that is a shame.

                                                                          I want to preserve the deployment model of PHP – rsync a bunch of .PHP files. Likewise you should be able to rsync a bunch of Oil files and make a simple and fast script :)

                                                                          Similar to what I have here if it was dynamic rather than static: http://travis-ci.oilshell.org/jobs/ That could easily be written in Oil.


                                                                          Found the source. Woah is it true this hasn’t had a release since 2009 ???

                                                                          https://launchpad.net/php-fpm

                                                                          https://code.launchpad.net/php-fpm

                                                                          https://github.com/dreamcat4/php-fpm

                                                                          Or maybe it’s built into PHP now?

                                                                          Ah yes looks like it is in there as of 2011, interesting: https://www.php.net/archive/2011.php#id2011-11-29-1

                                                                          But the old source is useful. It’s about 8K lines of C and handles processes and signals! Doesn’t look too bad. If anyone wants to help integrate it into Oil let me know :)

                                                                        3. 2

                                                                          First of all, I’ve been out of the loop for a few years, but from ~2010-2017 Apache was falling out of favor anyway, so mod_php was out of the question if you used nginx or lighttpd. I think 2.4 brought some renewed interest in Apache, but I have no facts to back that up.

                                                                          1. 1

                                                                            Right, that makes sense. I think Nginx encourages their own uwsgi, and it doesn’t have FastCGI support?

                                                                            The downside is that I’ve never seen a shared host that lets you “drop in a uwsgi file” the way you can just “drop in a .php file” or, in Python’s case, “drop in a WSGI app wrapped by flup”?

                                                                            Basically Nginx doesn’t seem to support shared hosting as well as Apache? I’d be interested to hear otherwise. Dreamhost still uses Apache and the setup is pretty nice.


                                                                            EDIT: Someone e-mailed me to clarify that uwsgi is a program that supports the FastCGI protocol in addition to the uwsgi protocol :)

                                                                            1. 1

                                                                              No idea, I haven’t used a shared host in many years.

                                                                              But most web servers would indeed support an arbitrary FastCGI backend, and if you’re allowed to run a binary you could put everything behind that web server. It’s just that I’ve never seen non-dynamic languages do that; Rust and Go mostly offer a web server of their own and you just reverse-proxy through it.

                                                                          2. 2

                                                                            From what I’ve seen mod_php is not preferred anymore. Apache is a pretty amazing swiss army knife, but it kind of has to do too much in one binary. The trend has been to use a pretty thin L7 proxy like nginx and/or haproxy to route to services.

                                                                            I don’t know why PHP itself uses fastcgi rather than a native http implementation. Maybe it’s faster to parse? Maybe there’s better side-channels for things like remote IP when proxying?

                                                                            A side note: I think Apache was unfairly maligned and a victim of bad defaults. IIRC Debian shipped it in multi-process mode with 10 workers, but Apache has pluggable MPMs (multi-processing modules), so you can configure it to be epoll/thread-based like nginx and be a decent file server or proxy. Unfortunately not all modules are compatible with every MPM.

                                                                            1. 3

                                                                              The main difference between a FastCGI backend and an HTTP backend to me is sort of accidental – in the FastCGI world, the process is known to be ephemeral like CGI, but the server keeps it alive between requests as an optimization.

                                                                              If you lose your state, well no big deal – it was supposed to be like a CGI script.

                                                                              But that is not true of all HTTP servers.

                                                                              I think this matters in practice: on Dreamhost I get new FastCGI processes started every minute or every 10 minutes. That is not customary for HTTP servers! (They also start 2 at a time.)

                                                                              Also I think FastCGI processes are safely killed with signals.


                                                                              So it is true that FastCGI has a weird and somewhat unnecessary binary protocol. But it also includes the “process” part, which is useful.

                                                                              1. 2

                                                                                Debian shipped Apache with Apache’s default, which is the worker MPM for 2.2 and the event MPM for 2.4.

                                                                                But mod_php switched you to the prefork MPM because some PHP extensions are not threadsafe.

                                                                              2. 1

                                                                                I never heard of anyone running Django or Rails with FastCGI?

                                                                                Way back in the earliest days, Django’s first packaged release (0.90) shipped handlers for running under mod_python, or as a WSGI application under any WSGI-compatible server, but recommended mod_python. Django 0.95 added a document explaining how to run Django behind FastCGI, and a helper module using flup as the FastCGI-to-WSGI bridge.

                                                                                The mod_python handler was removed after Django 1.4 (so 1.5 was the first version without it). The flup/FastCGI support was removed after Django 1.8. Since then, Django has only supported running as a WSGI application.

                                                                                I can’t speak to anyone else, but I for one have run Django in production under each of those options: mod_python, FastCGI, and pure WSGI.

                                                                              3. 1

                                                                                I guess the similarity is that WSGI passes data the same way: the environ dict acts as the store for the environment variables, and stdin/stdout are passed as explicit values. The takeaway for me is that a simple Unix-style design can survive many years of battle testing and stay relevant!
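
                                                                                For concreteness, a minimal WSGI app shows the same shape – a dict of CGI-style variables plus explicit stream objects (this is just the standard interface; the handler itself is a made-up example):

                                                                                def app(environ, start_response):
                                                                                    # CGI-style variables live in a plain dict ...
                                                                                    method = environ['REQUEST_METHOD']
                                                                                    path = environ.get('PATH_INFO', '/')
                                                                                    # ... and "stdin" is an explicit file-like value in the same dict.
                                                                                    length = int(environ.get('CONTENT_LENGTH') or 0)
                                                                                    body = environ['wsgi.input'].read(length)
                                                                                    start_response('200 OK', [('Content-Type', 'text/plain')])
                                                                                    # The returned iterable plays the role of "stdout".
                                                                                    return [('%s %s, %d bytes\n' % (method, path, len(body))).encode()]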

                                                                              4. 3

                                                                                For context and accuracy, WSGI was not a new idea, nor the only ubiquitous one in its problem space. In the WSGI proposal, Guido van Rossum clearly states that the idea was to build something modeled after Java’s Servlet API. Java was the most used programming language at the time. The servlet API is still widely used.

                                                                                1. 2

                                                                                  The uWSGI documentation also has support for similar integrations with other languages (perl, ruby): https://uwsgi-docs.readthedocs.io/en/latest/PSGIquickstart.html

                                                                                1. 10

                                                                                    Now that I’m learning more about FastCGI, I think we replaced it well when we, as a collective, moved towards HTTP servers for everything.

                                                                                    FastCGI was a way to run a web server, except speaking a client protocol almost nobody uses. It was a necessary step between the glory of CGI (short-lived processes handling a single request in a defined way) and HTTP servers everywhere. Once we could run our code as servers, why not let us debug with nothing but a browser?

                                                                                  1. 14

                                                                                    FastCGI is lighter and less complicated, because it isn’t HTTP and it can ignore a lot of the complexities of talking to random real-world HTTP clients with 30 years of back-compatibility and edge cases.

                                                                                    In the real world you’re going to put some kind of web server / load balancer / application proxy between your app and the outside world anyway, and it has to deal with that stuff, so why not let it? And once it’s done the work of parsing the request, why not just send it straight to the app in a nice concise binary format, instead of going to the effort of re-formatting it as HTTP, adding some headers that say “hey, I know it looks like this request is coming from me, but actually I’m just the middleman, and really it came from over there, and by the way my external name and port is such and such, and the request was/wasn’t secure”, leaving the app to parse the request again and re-form, in an error-prone manner, the state that the server already has, and that the app needs to operate properly? A FastCGI app has the same “already on the inside” perspective as a CGI app, it gets the real client info and the real server info in environment variables, no munging required, and since they don’t come in as headers, you have fewer worries about some malicious client smuggling a value in.
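
                                                                                      To make the “no munging required” point concrete, here is roughly what it looks like from the app’s side (a sketch with made-up function names; the header handling is only illustrative, not advice on which proxies to trust):

                                                                                      def client_ip_fastcgi(environ):
                                                                                          # Behind FastCGI, the front web server hands the real peer address to
                                                                                          # the app as an ordinary CGI variable, so this is the whole story:
                                                                                          return environ['REMOTE_ADDR']
                                                                                      def client_ip_own_http_server(environ):
                                                                                          # As a standalone HTTP server behind a reverse proxy, REMOTE_ADDR is
                                                                                          # the proxy itself, and the real address arrives as a header the app
                                                                                          # has to re-parse and decide whether to trust:
                                                                                          forwarded = environ.get('HTTP_X_FORWARDED_FOR', '')
                                                                                          return forwarded.split(',')[0].strip() or environ.get('REMOTE_ADDR', '')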

                                                                                    tl;dr “my app is its own HTTP server” is an asset in dev and a liability in prod.

                                                                                    1. 1

                                                                                      tl;dr “my app is its own HTTP server” is an asset in dev and a liability in prod.

                                                                                        Perhaps it makes sense for prod to need more setup than dev, including figuring out headers and ensuring everything is OK on the request. Are you aware of mainstream mechanisms for just running FastCGI processes for dev?

                                                                                        Release engineering is real, and it doesn’t really make sense to minimize it for the sake of not having an independent HTTP parser.

                                                                                      1. 1

                                                                                        All my stuff for the last umpteen years uses some kind of psgi/wsgi/asgi/you-get-the-idea compatible framework, so it’s equally capable of running under FastCGI or a standalone HTTP server for dev. But also, it only takes like a dozen lines of config to make nginx or lighttpd forward to a FastCGI app (plus serve your static files if you want), and for a more “organized” dev setup with a docker-compose or something that’s what I’d do… that way dev can replicate prod a little more closely.
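
                                                                                          For example, with a WSGI app the FastCGI side can be as small as this, using flup as the bridge (the hello-world app and the socket address are placeholders – point nginx’s or lighttpd’s FastCGI forwarding at whatever address you pick):

                                                                                          from flup.server.fcgi import WSGIServer
                                                                                          def app(environ, start_response):
                                                                                              start_response('200 OK', [('Content-Type', 'text/plain')])
                                                                                              return [b'hello from behind FastCGI\n']
                                                                                          if __name__ == '__main__':
                                                                                              # The front web server connects to this socket and speaks FastCGI;
                                                                                              # the app never has to parse raw HTTP itself.
                                                                                              WSGIServer(app, bindAddress=('127.0.0.1', 9000)).run()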

                                                                                    2. 2

                                                                                      Also see this response: https://lobste.rs/s/xl63ah/fastcgi_forgotten_treasure#c_kaajpp

                                                                                      summary: FastCGI includes process management that is useful. The binary protocol part is perhaps superfluous

                                                                                      1. 1

                                                                                        Fantastic reason.

                                                                                      2. 1

                                                                                        FastCGI’s benefit is allowing de-coupling of the web server from the rest of the application, instead of building another monolith.

                                                                                        This is why it remains relevant.

                                                                                        1. 1

                                                                                          It’s rather tightly coupled, don’t you think?

                                                                                          1. 1

                                                                                            To what?

                                                                                            Certainly not to a particular web server…

                                                                                            1. 1

                                                                                              To having a web server as part of the monolith

                                                                                      1. 3

                                                                                        Failed almost immediately on Mint:

                                                                                        $ build/dev.sh minimal
                                                                                        ...
                                                                                        build/setup.py -> libc.so
                                                                                        native/libc.c:22:10: fatal error: Python.h: No such file or directory
                                                                                           22 | #include <Python.h>
                                                                                              |          ^~~~~~~~~~
                                                                                        compilation terminated.
                                                                                        error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
                                                                                        

                                                                                        Installed python2-dev package. Failed next with:

                                                                                        build/setup_line_input.py -> line_input.so
                                                                                        native/line_input.c:64:10: fatal error: readline/readline.h: No such file or directory
                                                                                           64 | #include <readline/readline.h>
                                                                                              |          ^~~~~~~~~~~~~~~~~~~~~
                                                                                        compilation terminated.
                                                                                        error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
                                                                                        

                                                                                        Installed libreadline-dev. Next failure:

                                                                                        $ build/dev.sh minimal
                                                                                        Removing _devbuild/gen/*
                                                                                        ...
                                                                                        build/setup.py -> libc.so
                                                                                        ...
                                                                                        Ran 13 tests in 0.001s
                                                                                        OK
                                                                                        build/setup_line_input.py -> line_input.so
                                                                                        Ran 1 test in 0.000s
                                                                                        OK
                                                                                        build/setup_posix.py -> posix_.so
                                                                                        Ran 8 tests in 0.000s
                                                                                        OK
                                                                                        py-yajl/setup.py -> yajl.so
                                                                                        python2: can't open file 'setup.py': [Errno 2] No such file or directory
                                                                                        

                                                                                        Stopped.

                                                                                        1. 3

                                                                                          Thanks for trying it and for the report!

                                                                                                    Yeah I guess the “quick start” isn’t really that great if you don’t have those 2 packages. The longer Contributing page has the ubuntu-deps shell function, but it’s unclear if everyone can use that. I think it would work on Mint because it’s Ubuntu-derived?

                                                                                          https://github.com/oilshell/oil/wiki/Contributing

                                                                                          I’m not sure what the next error is. Maybe py-yajl/setup.py doesn’t exist because the submodule isn’t there? The

                                                                                                    git submodule update --init --recursive

                                                                                          command should do that.

                                                                                          In any case, this is useful, I will think about how to smooth over the process … Ironically Oil is running into the “shell script portability” problem, which “Linux Standard Base” was trying to fix (but failed)! I think we need something like that …

                                                                                          1. 1

                                                                                            My first try (above) was using your original instructions given here, but without the git clone (just downloading the git .zip file). My next try below was using the instructions at your Contributing page. That fared better, but there were issues:

                                                                                            1. I noticed that the packages python-dev-is-python2 and python-is-python2 were required by build/dev.sh ubuntu-deps. This sounds like a no-no (everything should be explicitly labelled as either python2 or python3).
                                                                                            2. The build seemed to go OK (with warnings) and bin/osh ran without error, but many of the tests are failing.
                                                                                            1. 1

                                                                                                        Yeah unfortunately both I and our Travis build are on Ubuntu 16.04 (the default for Travis), and it doesn’t have a python2-dev package. Looks like that came about in 20.04:

                                                                                              https://packages.ubuntu.com/focal/python2-dev

                                                                                              I have a TODO to upgrade the Ubuntu version …

                                                                                              https://github.com/oilshell/oil/issues/833

                                                                                                        I just fixed a couple of shebang lines that weren’t explicitly python2, and there is now a lint check for it in the continuous build.

                                                                                        1. 1

                                                                                          If you liked this post, do me a favor and tell me if you can run the dev build of Oil, which I believe takes less than 2 minutes, although people appear to have problems with it:

                                                                                          https://github.com/oilshell/oil/blob/master/README.md

                                                                                          It’s basically:

                                                                                          1. A Linux machine – Debian/Ubuntu-ish is most likely to work – bash, coreutils, etc. (Other distros should also work, but Debian/Ubuntu are frequently tested)
                                                                                          2. git clone (with a submodule)
                                                                                          3. build/dev.sh minimal
                                                                                          4. bin/osh.
                                                                                          5. Now you have a shell and you can even modify it with pure Python code!

                                                                                                          I think that is easy, but I’m interested in feedback otherwise. And let me know if the instructions are unclear.

                                                                                          If you have a Linux machine, and are not able to do that in 2 minutes, let me know where you are at the 2 minute mark … did something not work?

                                                                                          1. 4

                                                                                            I gave this a go, but I don’t have a /bin/bash and got stuck at:

                                                                                            Removing _devbuild/gen/*
                                                                                            asdl/hnode.asdl -> (asdl/tool) -> _devbuild/gen/hnode_asdl.py
                                                                                            frontend/types.asdl -> (asdl/tool) -> _devbuild/gen/types_asdl.py
                                                                                            core/runtime.asdl -> (asdl/tool) -> _devbuild/gen/runtime_asdl.py
                                                                                            tools/find/find.asdl -> (asdl/tool) -> _devbuild/gen/find_asdl.py
                                                                                            ./build/dev.sh: build/codegen.sh: /bin/bash: bad interpreter: No such file or directory
                                                                                            

                                                                                            I do have a /bin/sh (which is dash) and I do have bash in my PATH and /usr/bin/env, so #!/usr/bin/env bash would work.

                                                                                                            edit: I gave it more than two minutes. I ran:

                                                                                            rg -l '#!/bin/bash' | xargs sed -i -e 's%#!/bin/bash%#!/usr/bin/env bash%'
                                                                                            

                                                                                            and got further, but found gcc was needed:

                                                                                            build/setup.py -> libc.so
                                                                                            unable to execute 'gcc': No such file or directory
                                                                                            error: command 'gcc' failed with exit status 1
                                                                                            

                                                                                            I added gcc and readline to my environment, and ran:

                                                                                            (export PATH=$(dirname $(which zcat)):$(dirname $(which git)):$(dirname $(which gcc)):$(dirname $(which bash)):$(dirname $(which coreutils)):$(dirname $(which python2)); ./build/dev.sh minimal)
                                                                                            

                                                                                            and it seems to work, and the build is super fast. Really nice!

                                                                                            1. 3

                                                                                              Excellent, the shebangs are indeed a mistake and I appreciate the followups! Thanks for trying it.

                                                                                            2. 2

                                                                                              On Arch it works as advertised.

                                                                                              1. 1

                                                                                                                Removing _devbuild/gen/*
                                                                                                                /usr/bin/env: ‘python2’: No such file or directory

                                                                                                                Fwiw, it’s issues like this that steer me away from Python projects. I’ve been burnt out from maintaining Python dependencies, and even though Py2 is now deprecated, it still exists. If I really need a Python project, I’ll typically try to encapsulate it within a Docker container or go out of my way to make sure it only supports Py3.

                                                                                                                I’ve been following Oil since what feels like its inception, but Python has always kept me from trying it, and still does apparently.

                                                                                                1. 3

                                                                                                  OK thanks for trying it.

                                                                                                  BTW you don’t need Python (either 2 or 3) to use Oil. You only need it to change the code.

                                                                                                  The tarballs don’t depend on Python: https://www.oilshell.org/release/0.8.1/ (and that’s been true since the very first release, over 3 years ago)

                                                                                                  Or it is also packaged in a number of places: https://github.com/oilshell/oil/wiki/Oil-Deployments

                                                                                                  Also, for developers, there is a nascent shell.nix that declares the Python dep: https://github.com/oilshell/oil/blob/master/shell.nix

                                                                                                  (I don’t use Nix personally, but others may want to run with it.)

                                                                                                  Filed an issue for this: https://github.com/oilshell/oil/issues/833


                                                                                                  Also FAQ about Python: http://www.oilshell.org/blog/2018/03/04.html#faq

                                                                                              1. 3

                                                                                                Wow, I like this part:

                                                                                                There’s a famous site called Project Euler, where users write code to solve mathy problems such as “In a modified version of the board game Monopoly, on what three squares is a player most likely to land?” My former programming-contest coach advocated against using it to practice, because “It’s not really programming, and it’s not really math.”

                                                                                                I’m not going to begrudge anyone if they want to do it, but I took a look at Project Euler after seeing so many people do it, and was uninterested. Maybe because I did a lot of discrete math in high school or something …

                                                                                                Likewise I own “Elements of Programming” and also found little use for it. It’s not math and it’s not programming – great description.


                                                                                                However I definitely respect the STL and C++, and I happen to be doing more template programming than I’ve done in 20+ years right now – to write a pretty type safe garbage collector! This is the kind of thing where C++ really shines – where you need low level control, and you also want types and abstraction.

                                                                                                To me it looks like C++ and Rust are the only languages where you can do such a thing, and C++ came 25-30 years earlier. Alexandrescu proposed that “writing a garbage collector” in a language is a critical test for whether a language is a “systems language” or not, and there is some truth to that.

                                                                                                1. 2

                                                                                                  To me it looks like C++ and Rust are the only languages where you can do such a thing, and C++ came 25-30 years earlier. Alexandrescu proposed that “writing a garbage collector” in a language is a critical test for whether a language is a “systems language” or not, and there is some truth to that.

                                                                                                  I’d add Ada and Common Lisp to the list, also.

                                                                                                  In any case, I agree with you and the article author about “Elements of Programming.” I skimmed through it several years ago, and didn’t understand what the author was going for. There’s been a strong tie between math and computer science going back to before electronic computers even existed, but it seemed the author ignored all of it and came up with his own version using C++ template meta-programming to illustrate everything.

                                                                                                  1. 1

                                                                                                    Citations?

                                                                                                    1. 4

                                                                                                                          Ada has generics, classes, and equivalents for the low-level functionality of C++. Address arithmetic, for example, can be done with System.Address. Ada even has some low-level functionality that is absent from C and C++; for example, you can specify the bit order within bit fields. Some of Stepanov’s early work on generic programming was done with Ada.

                                                                                                      Edited to add the following text.
                                                                                                      Henry Baker wrote a 1993 paper on an Ada 83 garbage collector.

                                                                                                      History of T discusses a garbage collector written in Pre-Scheme, a Scheme subset.

                                                                                                      Edited again to add a Lisp example.

                                                                                                      How to define new intrinsics in SBCL is an example of low-level support in a recent Lisp.

                                                                                                      1. 1

                                                                                                                            Thanks, although I probably didn’t make clear what I was talking about. I’m not just talking about low-level control in high-level languages, although that is relevant – I’m talking about static types and metaprogramming and how they help you write a garbage collector.

                                                                                                                            I should have linked this previous comment, where I linked to a paper by the authors/inventors of the Immix garbage collection algorithm. They implemented Immix in Rust and (surprisingly) found that its type system actually helped.

                                                                                                        https://lobste.rs/s/9duro8/rune_programming_language#c_2f90p3

                                                                                                        This is kind of a paradox because the essential nature of the garbage collector is to enforce a homogeneous view of memory on top of a heterogeneously-typed heap.

                                                                                                        If you want a quick demo, look at figure 1. It’s an abstraction over a pointer.


                                                                                                        I am finding the same thing in C++. I can implement an abstract type with storage that is a single pointer, and that is extremely helpful. v8 and SpiderMonkey do the same thing with Rooted<T>, Handle<T>, etc.

                                                                                                        https://old.reddit.com/r/ProgrammingLanguages/comments/ivyaw8/clawing_our_way_back_to_precision_spidermonkey/

                                                                                                        Apparently Rust goes even further, and you can even (paradoxically) use lifetimes to help with a GC, although I don’t have experience with that.


                                                                                                                            I looked at the Cheney algorithm in es shell and in femtolisp, both in C, and if you compare it with mine in C++, it looks VERY different. For one, you don’t have to push and pop variables to maintain precise stack roots – I just use a pointer-like type Local<T>. (I don’t think Ada can do this – it looks like it has basic C-style address arithmetic.) Again, this is what all the JS VMs that we use do.

                                                                                                        And I also like these types:

                                                                                                        https://github.com/oilshell/oil/blob/master/mycpp/gc_heap.cc#L13

                                                                                                        You can sorta do that in C, although the inheritance helps too. The explicit reinterpret_cast vs. static_cast in C++ helps a lot too.


                                                                                                        Cool trick I just learned with offsetof and constexpr: https://old.reddit.com/r/cpp_questions/comments/iz84pb/how_to_portably_calculate_field_offset_bitmasks/

                                                                                                        Key point: the layout depends on compiler flags, so you can’t just generate field bitmasks. Introspection with the compiler is useful!


                                                                                                        I should probably write a blog post like “C++ features that help you write a garbage collector”:

                                                                                                        • pointer-like types, and templates
                                                                                                        • inheritance (e.g. from Obj, for object headers) – although I needed to use C-style macros for a different reason
                                                                                                        • more precise casting
                                                                                                        • offsetof and constexpr

                                                                                                                            Though the part in the History of T about the locality of copying collection is pretty interesting (BFS vs DFS)! I’ll have to look into that more, as it’s an issue I’ve been wondering about.

                                                                                                      2. 1

                                                                                                        I’m not sure what you want a citation for? Are you asking for garbage collectors written in Ada and Common Lisp?

                                                                                                        I don’t know of any off hand, but my intention was to point out that Ada and Common Lisp are two other languages offering high level abstractions but also low level control. I suppose I should have quoted the sentence above that paragraph.

                                                                                                        1. 1

                                                                                                          See my sibling response: https://lobste.rs/s/bqnhbo/book_review_elements_programmnig#c_l7fnoh

                                                                                                          The Rust paper is basically what I was talking about; the same is true in C++ but not other languages

                                                                                                  1. 2

                                                                                                    The title is a little misleading as Oil is yet another shell, so it’s not so much “improving a Bash script using Oil” as it is “replacing a Bash script with an Oil script”.

                                                                                                    1. 2

                                                                                                      That’s true, but you can also use Oil just to improve a bash script… just parsing or running it with Oil may make the script more comprehensible, and less reliant on the vagaries of bash (which do change from version to version). Example with a Lisp in bash:

                                                                                                      http://www.oilshell.org/blog/2020/06/release-0.8.pre6.html#patch-to-run-the-mal-lisp

                                                                                                    1. 6

                                                                                                      BTW I’ve added Ruby-like blocks to Oil, which I think can be used to replace YAML in many circumstances. Shell is a natural place to put configuration because shell starts processes.

                                                                                                      I didn’t document it fully but there’s a snippet here:

                                                                                                      https://www.oilshell.org/release/0.8.1/doc/oil-proc-func-block.html#configuration-files

                                                                                                      It looks a lot like HCL. Just like you have a block like

                                                                                                      cd / {
                                                                                                        echo $PWD
                                                                                                        ls
                                                                                                      }
                                                                                                      

                                                                                                      in Oil, you can also have

                                                                                                      server foo {
                                                                                                        port = 80
                                                                                                        root = '/'
                                                                                                        section bar {
                                                                                                          ...
                                                                                                        }
                                                                                                      }
                                                                                                      

                                                                                                      It still needs a user-facing API and I’m looking for help/feedback on that!

                                                                                                      1. 2

                                                                                                                                      This looks like Nix’s attribute set syntax as well, which is super nice to work with! Nice.