1. 24
  1. 2

    I can’t remember if this is the one I submitted a while back that @andyc had good commentary on. Lobsters, DDG, and Startpage are giving me nothing for some reason. Andy, do you have a link to that Lobsters thread?

      1. 6

        Yup, that’s it. There are currently two different academic projects about shell:

        1. This submission, Smoosh, by Greenberg et al. I’ve been having some nice discussions with Greenberg about shell. He reported a good bug in OSH, and got me interested in running some more test suites.
        2. That one from a group in France. Morbig is a static POSIX shell parser (using a grammar and Menhir), and Colis is a shell-like language that you can translate certain shell scripts to.
        • In contrast, Smoosh uses the dash parser via “libdash”, i.e. its syntax is abstract rather than concrete.
        • In contrast, OSH’s parser is hand-written (although the lexer is generated and does a lot of the heavy lifting).

        I haven’t been in contact with group #2, but I gave feedback on this paper (#1) last week. I was impressed by the empirical evaluation, and it got me interested in the other test suites, as mentioned. I also found the written POSIX spec to be somewhat incomplete / underspecified, so it’s nice to have this executable semantics to use as an oracle.

        My main quibble was that I believe “word expansion” (string-based rewriting in stages) is essentially an implementation detail, and not a fundamental feature of the shell language, or what makes it good for anything in particular.

        I prefer to think of shell as a “normal” programming language, except that it has several sublanguages. I wrote this wiki page after reading the paper, and sent it to Greenberg:


        Basically, in OSH, command / word / arith / bool are mutually recursive sublanguages, each with their own parser and evaluator. I don’t think there needs to be a notion of “expansion” separate from evaluation. For example, in the Oil language, there won’t be a splitting stage (use arrays instead), and globs will be statically parsed.

        Basically, I think splitting, dynamic globbing, and dynamic arithmetic are all mistakes. They’re the source of all the advice to quote everything that we’ve had to drill into every new shell programmer’s head for decades. Quoting inhibits some parts of the expansion pipeline.

        Try !qefs in #bash on Freenode. That gives you:

        "$Quote" "$Every" "$Fucking" "$Substitution"

        (I googled that and got one of my own blog posts back :) https://www.oilshell.org/blog/2017/02/26.html )
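A minimal sketch of the splitting behavior that the quoting advice guards against. The helper `count_args` and the variable `file` are hypothetical names used only for illustration, and the default `IFS` is assumed:

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

file='my notes.txt'

# Unquoted: the expansion is split on the space into two fields.
unquoted=$(count_args $file)

# Quoted: splitting (and globbing) is suppressed, so one field survives.
quoted=$(count_args "$file")

echo "unquoted: $unquoted, quoted: $quoted"   # prints "unquoted: 2, quoted: 1"
```

The same unquoted expansion would also be subject to globbing if the value contained characters such as `*`, which is why the advice covers every substitution.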

        1. 1

          I’m really glad that folks are interested in this work! :)

          Both CoLiS and Smoosh use abstract syntax, ignoring, e.g., unquoted whitespace between tokens and other vagaries of their concrete inputs. CoLiS’s abstract syntax is much smaller than Smoosh’s. CoLiS really has two levels of abstract syntax tree, too: they get one out of the grammar (which is more or less exactly the grammar in the spec) and then another as the target of their translation. And if we’re going to be honest, Smoosh has two grammars as well: there’s the AST that the dash parser produces and then a separate AST (in Lem/OCaml) that I translate it to.

          In any case, I’m hopeful that Smoosh can be a baseline for, e.g., a proof of CoLiS’s soundness.

          I think Andy and I are in agreement that the terminology “expansion” vs. “evaluation” is kind of moot… there’s some process that takes words and produces a list of strings (a/k/a fields). Shells written in C tend to write that process as an expansion: given an array of characters and control codes, expand that array, updating it left-to-right. That particular implementation strategy is efficient, but I agree with Andy that focusing on that strategy is a distraction. (By contrast, Smoosh does multiple passes over an AST representing words.) OSH, Modernish, and other opinions I’ve heard all seem to point to folks being very suspicious of field splitting and globbing. One test I’m interested in running is: how many existing scripts break if you only do field splitting inside of the iteratees of a for loop?
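For concreteness, this is the kind of script that relies on field splitting in a for loop's iteratees, and so would presumably keep working under such a restriction. It is only a sketch: `files`, `count`, and `quoted_count` are hypothetical names, and the default `IFS` is assumed.

```shell
# Hypothetical data: a space-separated list held in a single string.
files='a.txt b.txt c.txt'

count=0
for f in $files; do          # unquoted: split into three iteratees
  count=$((count + 1))
done

quoted_count=0
for f in "$files"; do        # quoted: the whole string is one iteratee
  quoted_count=$((quoted_count + 1))
done

echo "$count $quoted_count"  # prints "3 1"
```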

          1. 2

            Yes, I remember that one of their goals was to use the POSIX grammar literally, despite it missing a bunch of information, which was interesting.

            Another common pattern for splitting is something like:

            if test -n "$have_readline"; then
              FLAGS='-l readline'  # 2 args
            fi
            cc ... $FLAGS ...   # invoke compiler with optional args

            I used that somewhere when I wanted POSIX compatibility. (I thought it was in the configure script, but I can’t find it now). Arrays are a pain in the butt to use correctly, and in bash 4.3 and below (quite recent) there’s even a bug that confused empty arrays with unset arrays.


            set -u
            cc foo "${empty[@]}" bar    # would fail, but doesn't anymore!
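One POSIX-only way to pass optional arguments without relying on splitting, and without bash arrays, is to build them up in the positional parameters. This is a sketch under assumptions, not what any configure script actually does: `count_args` is a hypothetical stand-in for `cc`, and `have_readline` is set by hand.

```shell
set -u

# Hypothetical stand-in for the compiler: prints how many arguments it got.
count_args() { echo "$#"; }

have_readline=yes   # assumed to come from an earlier configure-style check

set --              # start with an empty argument list
if test -n "$have_readline"; then
  set -- "$@" -l readline   # append two separate arguments
fi
with_flags=$(count_args foo.c "$@" bar.c)

set --              # the "no readline" case: nothing is appended
without_flags=$(count_args foo.c "$@" bar.c)

echo "with: $with_flags, without: $without_flags"   # prints "with: 4, without: 2"
```

The trade-off is that a script has only one set of positional parameters per function or script scope, so this does not generalize to several independent lists the way arrays do.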

            I think the strategy of OSH is going to be to define new options, e.g. shopt -s strict-* and shopt -s oil-*, to gradually transform shell into another language.

            The abstractions should be more appropriate for the domain, and it should have types appropriate to the domain. That is, I think the emphasis on strings is mostly a mistake / issue of expediency. R, Matlab/Julia, and Python are used interactively and have rich (dynamic) types.

            I think it can be done incrementally too! I sort of changed strategies from what I wrote a couple years ago. Originally I was envisioning an entirely parallel and clean Oil language that could be translated to. But now I realize that that strategy is probably too much work. So having these global execution options is an easy way to do it gradually. (Although I should come up with a way to make these scoped or limited to a given file, as you mentioned Modernish does).

            I started these very rough docs. At some point I will circulate them more widely for feedback!



            One issue I didn’t mention is “word elision”, which Modernish makes some good points about. You can turn off splitting with IFS='', but you can’t turn off elision of $empty.
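A small sketch of the difference between splitting and elision. The names `count_args`, `spaced`, and `empty` are hypothetical and chosen only for this example:

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

old_ifs=$IFS
IFS=''              # turn off field splitting

spaced='a b'
empty=''

n_split=$(count_args $spaced)             # 'a b' stays a single field
n_elided=$(count_args foo $empty bar)     # the empty expansion disappears
n_quoted=$(count_args foo "$empty" bar)   # quoting keeps the empty argument

IFS=$old_ifs        # restore the previous IFS

echo "$n_split $n_elided $n_quoted"       # prints "1 2 3"
```

So even with splitting disabled, an unquoted empty value still changes the argument count, and quoting remains the only way to preserve it.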

        2. 2

          Yup! Thanks! I’ll save it so it doesn’t happen again.