1. 15
  1. 6

    Wow I just started reading this, and just from the first few pages, I see a lot of useful info and citations, and very similar motivations to http://www.oilshell.org/ .

    1. They are focused on translating shell to another language. That is also my plan [1] (although there is some nuance that I haven’t published on the blog; in short, I may merge the two parsers for OSH and Oil)

    2. They are focused on the Debian maintainer scripts. One of my main goals for Oil is also the scripts to build various Linux distributions [2].

    One difference here is that the Debian policy manual limits their maintainer scripts to POSIX + a handful of things like local. Basically, that’s dash. Oil is solving a harder problem in understanding all of bash.

    1. They statically parsed this Debian corpus of 31K shell scripts


    I am also statically parsing a large corpus of shell scripts, not limited to the Debian dialect:


    Mine is 11,000+ shell scripts of 1.1M lines: http://www.oilshell.org/release/0.2.0/test/wild.wwz/

    (I planned to also parse shell scripts inside packages, but I reached the point of diminishing returns as far as dredging up bugs.)

    Key point: in order to meet the first goal, automatic translation, you need static parsing!

    1. Formal verification isn’t a goal of Oil (at least not any time in the near future). However I have produced a compact description of the abstract syntax of shell, as well as a compact interpreter for it in a high level language.

    osh.asdl is around 200 lines of ML-style type definitions, and the _Dispatch loop in the interpreter is around 400 lines of very high level code.



    This is what makes the shell very slow right now. I’m pretty happy with this as a high level description of the language, but the immediate plan is to optimize it (add more detail to the code), not prove anything about it.

    As I mentioned to Nick, shell is all about string manipulation. And using Python-style strings-as-values is a lot closer to mathematics than C-style strings-as-buffers. And I’m using ML-style data structures, which are basically the structures that people use to prove things. I haven’t done much with formal methods honestly, but there is a similarity there.

    I’ll have to read the rest of the paper to understand what properties they are looking to prove. I have thought about some properties I can guarantee via a bytecode VM (e.g. does a function do I/O), but that is a long topic, and still vaporware, so I won’t talk about it too much.

    Thanks a lot for this link! I will be reading more of it in the near future.

    [1] Translating Shell to Oil – http://www.oilshell.org/blog/tags.html?tag=osh-to-oil#osh-to-oil

    [2] Roadmap #5 – http://www.oilshell.org/blog/2017/10/06.html

    1. 3

      Glad you liked your late Christmas present. :) I’ll read some more of this to try to answer other stuff when I have time. About to be off to work. Also, I only skimmed the paper since it was one of about 50+ I went through last month. I planned to read it fully this January. One thing I can guess at is original language vs formal semantics of new one since reasons these are done are pretty consistent across research groups.

      One problem with languages like Shell or C is they’re not designed with formal semantics. They have no precise meaning. These ambiguities do at least two things: make the same program behave differently in different implementations; make program analysis or optimization more difficult. A formal semantics gives you predictability with potentially better safety and optimization. Optimization is hit or miss but you can always do better static analysis when there’s a precise, static meaning of the program. That can also facilitate better dynamic analysis, spec-based testing, SPARK-style proving, transformation/refactoring… everything… just by increasing precision plus reducing exploration of paths or states that clearly don’t matter.

      Now, shell language is pretty hard to formalize. I had a previous submission I can’t find right now where people tried to do that to the real language (maybe bash). They got… a little of the way there. Like with C similarly not designed for analysis, a common technique is to dodge the problem by making either a usable subset or a similar language amendable to verification. For C, it was languages like C0 used in Verisoft or Simpl/C in seL4. In this case, they’re similarly trying to make a better language to work with which will allow translations from shell to their language. You were to but just not for verification part.

      This brings up yet more issues. The first is making sure they’re correctly modeling shell for that translation. That’s usually done heuristically with lots of testing as you were describing. Note that it’s not worthless duplication because we have two jobs here: modeling good enough for a straight-forward translation; modeling good enough for static/formal analysis, optimization, or compilation/execution. The latter might need a good semantics on the kinds of tools people do those jobs with. The WhyML they’re using is a good example since it’s what things like Frama-C use for safety analysis. It discharges to multiple provers that automate as much as possible. The first semantics or translation rules could be in a totally different tool, be less formal, leverage a person eyeballing the output just in case, use fuzzers for equivalence tests, and so on. You’re correct to mainly focus on corner cases since the obvious stuff is easy to write translation rules for.

      So, these attempts are a one-two punch where the first punch is getting some analysis done on shell scripts by improving their language. The second punch is getting it all done either with legacy language itself formalized or verified translation from it to similar language. Many benefits, not just verification, can follow from the new, precise scripts/apps.

      1. 3

        Here is a followup comment I wrote after reading the paper:


        In short, the companion paper about parsing shell is very very close to what I’m doing with Oil.

        This paper is about what you do with the parsed representation. In their case, it’s about the formal semantics of an intermediate language. In contrast, what I’m doing with the parsed representation is executing it and testing the semantics against existing shells with spec tests [1].

        Honestly, I don’t understand what the soundness and completeness proofs really get you. What does that mean? That every program has a meaning? There seems to be some element of begging the question, because the language you’re proving this about is not shell. And they admit that you somehow need to verify the “similarity” or correspondence of this language to shell, but that is not done.

        In general I am a fan of this line of research. It has practical goals and is data-driven. (For example, the Mancoosi project cited, which I read about a few years ago.)

        This paper is not very practical, but the parsing paper is very good. I’ll have to dig into their parsing algorithms a bit more. They’re not using the same algorithms as Oil.

        [1] How I Use Tests – http://www.oilshell.org/blog/2017/06/22.html

        1. 3

          Just as a note to myself, the thing that left my scratching my head was the testing methodology. To me the formalism isn’t interesting if it doesn’t correspond with reality. So I would like to see something like this:

          Python: The Full Monty A Tested Semantics for the Python Programming Language


          Python comes with an extensive test suite. Unfortunately, this suite depends on numerous advanced features, and as such was useless as we were building up the semantics. We therefore went through the test suite files included with CPython, April 2012,9 and ported a representative suite of 205 tests (2600 LOC). In our selection of tests, we focused on orthogonality and subtle corner-cases. The distribution of those tests across features is reported in figure 12. On all these tests we obtain the same results as CPython.

          It would be more convincing to eventually handle all of Python’s own unittest infrastructure to run CPython’s test suite unchanged. The unittest framework of CPython unfortunately relies on a number of reflective features on modules, and on native libraries, that we don’t yet cover. For now, we manually move the assertions to simpler if-based tests, which also run under CPython, to check conformance

          But I think this paper is a stepping stone and the future work looks very promising.