1. 50
  1.  

  2. 6

    @andyc Congratulations! One of the main reasons I stopped using osh was because of the performance (especially for tab-completion), so it’s great to hear that’s mostly been solved. I might try osh again soon :)

    I do hope that OSH gets better support for job control. When I stopped contributing, it seemed like I was the only one using it, so no one noticed when it broke. There are also quite a few issues that have been open for a while. I know it’s hairy, but I do find it really useful.

    RE “writing it in Rust would be too much boilerplate” - I’m actually currently in the process of rewriting my parser in Rust, so I’ll let you know how it goes. I plan to use Pratt parsing, not recursive descent, which should cut down on the amount of code. So far the most boilerplate by far has been the pretty printing (about 200 lines of code that could probably have been autogenerated). I think this would have been similarly long in any language, although I challenge others to prove me wrong. This of course will change as I make more progress with the parser. Right now I’ve only implemented binary expressions and postfix operators, but the hardest bit is parsing typedefs and function pointer declarations.
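For readers who have not seen the technique, here is a minimal sketch of Pratt parsing limited to binary expressions and postfix operators, the two pieces mentioned above. It is written in Python rather than Rust purely for brevity. The binding-power tables, the token set, and the tuple-shaped tree are all assumptions made for illustration; none of it is taken from the parser being discussed.

```python
import re

# Hypothetical binding powers: (left, right). A right power higher than the
# left power makes the operator left-associative.
INFIX_BP = {"+": (10, 11), "-": (10, 11), "*": (20, 21), "/": (20, 21)}
# Postfix operators bind tighter than every infix operator in this sketch.
POSTFIX_BP = {"++": 30, "--": 30}

# Two-character operators are listed before single characters so they win.
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|\+\+|--|[-+*/()])")


def tokenize(src):
    """Split src into number, identifier, operator, and paren tokens."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expr(self, min_bp=0):
        """Parse an expression whose operators all bind at least min_bp."""
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            left = self.parse_expr(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok in INFIX_BP or tok in POSTFIX_BP or tok == ")":
            raise SyntaxError(f"unexpected token {tok!r}")
        else:
            left = tok  # a number or identifier

        while True:
            op = self.peek()
            if op in POSTFIX_BP:
                if POSTFIX_BP[op] < min_bp:
                    break
                self.advance()
                left = ("post" + op, left)
            elif op in INFIX_BP:
                left_bp, right_bp = INFIX_BP[op]
                if left_bp < min_bp:
                    break
                self.advance()
                left = (op, left, self.parse_expr(right_bp))
            else:
                break  # end of input or a closing paren
        return left


def parse(src):
    parser = Parser(tokenize(src))
    tree = parser.parse_expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

With these tables, `parse("1 + 2 * 3")` returns `('+', '1', ('*', '2', '3'))` and `parse("a * b++")` returns `('*', 'a', ('post++', 'b'))`. The whole precedence scheme lives in the two dictionaries, which is where the savings over one recursive descent function per precedence level would come from.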

    1. 2

      “I plan to use Pratt parsing”

      I agree that Pratt parsing is nice for expressions, but is it any better than plain recursive descent for statements?

      “but the hardest bit is parsing typedefs and function pointer declarations”

      I struggled with that, too. The Right Left Rule might be useful for you: http://cseweb.ucsd.edu/~ricko/rt_lt.rule.html

      1. 1

        Yes, I’m optimistic Oil will be fast. So far we’ve translated 16K lines of code, but that doesn’t include tab completion at the moment. For a variety of reasons, like its use of yield, that might be the last thing translated, but we can talk about it.

        I remember you had a problem with job control but I can’t find the bug right now. I know there are some other unresolved bugs like:

        https://github.com/oilshell/oil/issues/500

        Basic job control works for me – I just tested the latest 0.8.pre2 release with vim -> Ctrl-Z -> fg. But there are other hairy parts that aren’t implemented, and probably won’t be without help because I’m a tmux user :-/ But I also may not have encouraged help there because I knew we were going to translate to C++. The code right now is best viewed as a prototype for a production quality shell. I expect it will be 3-5K lines of hand-written C++ and 30-50K lines of translated C++.


        We can talk about it on Zulip maybe, but I don’t think Pratt parsing is great for most full languages like Python or JS, only for the “operator grammar” subset with precedence. And that’s despite writing a whole series on it!

        http://www.oilshell.org/blog/2017/03/31.html

        If the language is “normal”, I don’t think Rust is a bad idea – after all, plenty of parsers are written in Rust. Shell is an especially large language syntactically. It’s a much smaller language than Python or C++ in general, but its syntax is much larger and requires a big parser.

      2. 5

        What was the reason for choosing C++ over a fast memory-safe language, be it a traditional language like Go or something completely different like Rust?

        1. 35

          Oil is memory safe by construction … that was one of the main points of writing it in a collection of high level DSLs!

          Manually written C++ is “kinda” memory safe but not really, but that’s not what Oil is.

          http://www.oilshell.org/blog/2019/06/17.html#why-is-it-written-in-python (I should change this to “DSLs”, because it’s written in Python in one important sense, but that description is misleading in another important sense)

          For example, it doesn’t use strings-as-buffers as in C; it uses strings-as-values as in Python. (And it doesn’t use STL’s string class.) That’s a huge difference that you’ll see all over the code.

          Despite that it’s actually faster than bash and zsh which are written in very low level C! I call bash and zsh “grovelling through backslashes and braces one by one”, and Oil doesn’t do that.

          Oil is partly a shell and partly a software engineering experiment in “metaprogramming” – i.e. can you implement code with domain-specific, high level abstractions and still get good performance?

          This post is probably 20-40% of the victory lap. I hope to have 80% by the end of the year – i.e. a working shell in pure native code, that’s measurably better than existing shells along several important dimensions.

          If the project becomes self-sustaining (which is a big if), I will have time to optimize it in ways that are impossible if it were written by hand in C++, Rust, or Go. The abstractions are “right” – they don’t express irrelevant detail and give you more leeway in implementing them.

          Honestly I think Rust (and manually written C++) are at the wrong abstraction level for writing a 10K line recursive descent parser. They would end up at more like 20 - 30K lines due to all the irrelevant details you need to take care of. (note that bash is a 140K line C program.)

          However I encourage others to pursue that direction:

          http://www.oilshell.org/blog/2020/02/recap.html#posts-i-still-want-to-write


          If you look at this listing it may give you a sense of how Oil is architected now. The three biggest files are the 3 biggest DSLs.

          https://www.oilshell.org/release/0.8.pre2/metrics.wwz/line-counts/oil-cpp.txt

          Oil isn’t written in Python or C++, or Rust or Go. It’s written in “OPy/mycpp”, ASDL, and an abstract regex dialect (actually “Eggex” if I self host.)

          • the 26K line osh-lex file is all DFAs generated from regexes.
          • the 16K line osh_eval file is the parsers and evaluators translated from statically typed Python. This will be more like 25K when the translation process is done.
          • the *_asdl files are language-independent type definitions expanded to C++ (and the types can be pretty printed)
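To make the first bullet more concrete, here is a rough sketch of what a declarative lexer spec can look like: a table of token names and regexes, with the matching code derived from the table instead of written by hand. The token names and patterns are invented for illustration and are not Oil’s actual lexer definitions. Note also that Python’s `re` module is a backtracking matcher, so this only illustrates the “spec as data” side; it does not show the DFA code that a generator like re2c would emit.

```python
import re

# Hypothetical token table: (name, regex). Order matters, since the first
# alternative that matches at the current position wins.
LEXER_SPEC = [
    ("Space", r"[ \t]+"),
    ("Var", r"\$[A-Za-z_][A-Za-z0-9_]*"),
    ("Pipe", r"\|"),
    ("Semi", r";"),
    ("Word", r"[^ \t$|;]+"),
]

# Derive one combined pattern from the table, with a named group per token.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in LEXER_SPEC))


def lex(line):
    """Return a list of (token_name, text) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(line):
        m = MASTER.match(line, pos)
        if m is None:
            raise ValueError(f"no token matches at offset {pos}")
        if m.lastgroup != "Space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Adding a token here means adding one row to `LEXER_SPEC`; the scanning loop never changes. That separation is the property the generated lexer relies on, whatever engine ends up executing the patterns.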

          Some more color: Oil is made of ideas; it’s not a big pile of code in a particular language

          https://oilshell.zulipchat.com/#narrow/stream/121540-oil-discuss/topic/Oil.20is.20made.20of.20ideas.3B.20it's.20not.20a.20pile.20of.20code.20.2E.2E.2E (requires login)


          edit: also the reason to generate C++ and not Rust is basically to have fewer build deps and runtime deps (and that there’s no safety advantage to the latter). Note that libc is a required runtime dep of a shell; it’s closely tied to libc and the kernel.

          Embedded systems are often Linux systems these days, which need a shell. And I want Oil to be usable on all CPU architectures, not just “first class” Rust or Go ones, etc.

          1. 15

            Here’s one example where using domain-specific abstractions is more expressive than Rust. Rust doesn’t have “first class variants” as I call them:

            https://github.com/rust-lang/rfcs/pull/2593

            “This RFC is not on the roadmap for the language team for the moment. We will consider it at earliest in 2020.”

            But I added them to ASDL with ease. I use them all over the AST and it’s a more natural expression of the language, and generates more compact C++ types:

            https://oilshell.zulipchat.com/#narrow/stream/121539-oil-dev/topic/Shared.20Variant.20Refactoring (requires login)
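As a loose illustration of the idea (not Oil’s actual ASDL schema, and not the generated C++), the sketch below uses Python dataclasses and `Union` to show a variant that is a type in its own right and is shared by two different sum types. All of the type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Token:
    """A standalone type that also serves as a variant of two sum types."""
    kind: str
    text: str


@dataclass
class DoubleQuoted:
    parts: List["WordPart"]


@dataclass
class Binary:
    op: str
    left: "ArithExpr"
    right: "ArithExpr"


# Token appears directly in both sums; no wrapper variant is needed.
WordPart = Union[Token, DoubleQuoted]
ArithExpr = Union[Token, Binary]


def word_part_text(part: WordPart) -> str:
    """Render a word part back to source-like text."""
    if isinstance(part, Token):
        return part.text
    if isinstance(part, DoubleQuoted):
        return '"' + "".join(word_part_text(p) for p in part.parts) + '"'
    raise TypeError(f"not a WordPart: {part!r}")


def arith_eval(expr: ArithExpr) -> int:
    """Evaluate a tiny arithmetic tree built from Token and Binary nodes."""
    if isinstance(expr, Token):
        return int(expr.text)
    if isinstance(expr, Binary):
        left, right = arith_eval(expr.left), arith_eval(expr.right)
        if expr.op == "+":
            return left + right
        if expr.op == "*":
            return left * right
        raise ValueError(f"unknown operator {expr.op!r}")
    raise TypeError(f"not an ArithExpr: {expr!r}")
```

The same `Token` object can be handed to `word_part_text` or `arith_eval` without conversion. With Rust enums you would typically wrap the shared struct in a separate variant of each enum, which is the extra layer this style avoids.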

            One thing I’ve learned from this project is that algebraic data types have a lot of nontrivial design decisions. And you pretty often run into limitations of them in bigger programs.

            Another example from Haskell (and I think Scala has similar things, etc.): https://stackoverflow.com/questions/28244136/what-is-scrap-your-boilerplate

            That’s a conflict between types and metaprogramming as I view it. Oil prefers to leverage metaprogramming.

            http://www.oilshell.org/blog/tags.html?tag=metaprogramming#metaprogramming

            1. 12

              Also on Go, you can’t write a POSIX shell in portable Go, basically due to its threaded runtime:

              https://lobste.rs/s/hj3np3/mvdan_sh_posix_shell_go#c_qszuer

              related: https://lobste.rs/s/6a6zne/some_reasons_for_go_not_make_system_calls

              In contrast, generating C++ means that the code is bog-standard and essentially 100% portable. What a shell expects from the operating system is extremely old and standardized by POSIX.
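For context on what “expects from the operating system” means in practice, the core of running an external command is the classic fork/exec/wait cycle. The sketch below shows that cycle using Python’s thin wrappers over the POSIX calls. It is an illustration only, it runs on POSIX systems only, and `run_and_wait` is a hypothetical helper, not a function from Oil.

```python
import os


def run_and_wait(argv):
    """Run argv in a child process and return its exit status (POSIX only)."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            # exec failed; 127 is the conventional "command not found" status.
            os._exit(127)
    # Parent: block until the child exits or is killed.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        # Shells conventionally report death by signal as 128 + signal number.
        return 128 + os.WTERMSIG(status)
    return status
```

A real shell does much more between `fork` and `exec` (redirects, process groups, signal dispositions), but every step is built on these same long-standing calls.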

              Although this doesn’t mean nobody should try writing Oil in Go, if you’re willing to deal with portability issues.

              (Sorry for the deluge of replies, but I expect these questions to come up more in the future, so this is a good place to draft answers to them all :-) )

            2. 2

              This is impressive work. I’d love to find out more about how you approached using Python to metaprogram C++. If you talk about that anywhere, links would be much appreciated. And if not, just know you have an eager audience should you decide to!

              1. 2

                I was writing a lot about this early in the project, but I dropped it for lack of time. The three main code generators are ASDL, re2c, and now mycpp, but I haven’t written anything about mycpp.

                Here are some portions:

                http://www.oilshell.org/blog/2019/12/22.html#appendix-a-oils-lexer-uses-two-stages-of-code-generation

                http://www.oilshell.org/blog/tags.html?tag=ASDL#ASDL

                Your best bet is to read Zulip; I post details there, so feel free to ask questions. There are a ton of threads about mycpp and the translation process, which I may link in the next post.

                https://oilshell.zulipchat.com/#narrow/stream/121539-oil-dev/topic/Naming.20convention.20for.20metaprogramming (requires login)

                As mentioned in the January blog posts, I’m cutting a lot out of the project because it got too big. But hopefully it will get reasonably done and then I can write more about it.

                1. 1

                  BTW I wrote a blog post that links some relevant threads. Feel free to ask questions on Zulip!

                  http://www.oilshell.org/blog/2020/03/recap.html#mycpp-the-good-the-bad-and-the-ugly