1. 44
  1.  

  2. 9

    The oilshell description says:

    It’s written in Python, so the code is short and easy to change.

    But when I count the lines of code, I get: 480 654 of Python + 286 729 of C + 69 666 C/C++ headers + other languages (1 055 154 in total, everything without comments and blank lines, just code).

    Numbers for Bash: 105 729 of C + 7 617 C/C++ headers + other languages (363 044 in total, just the code).

    I have not investigated why it is so different. Maybe you have some generated code, or a lot of tests, or you „vendored“ some dependencies (and their counterparts are not counted in the Bash numbers)…

    So, what are the actual numbers?

    1. 13

      This is published at the bottom of most release announcements.

      This is the stuff you change! It’s important that this is small – the generated code stats below are less important.


      CPython is going away in favor of oil-native:

      This is basically the 17K/32K lines translated, plus a bunch of generated code.


      I measured bash at 101K significant lines and 142K physical lines (even removing their generated yacc code, etc.), so Oil is multiple times smaller: http://www.oilshell.org/blog/2019/06/17.html#why-is-it-written-in-python

      Even if it were not smaller, I challenge anyone to produce a patch for command_sub_errexit or process_sub_fail for bash, described in the blog post :) Not saying nobody can do it, since I noticed dgsh forked bash, but the code is very difficult.

      Those two features would be great additions to bash. Not only does bash not fail if process sub fails, it doesn’t even wait() on the process! Example:

      diff <(sort left.txt) <(sort /oops/error)
      echo 'I do not want to get here in bash'
      

      Or even simpler is:

      echo $(date %x)
      local d=$(date %x)
      echo 'I do not want to get here'
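
      A minimal sketch of both cases, assuming bash is installed and /dev/fd is available. Here false stands in for any failing command such as date %x, and the out_* variable names are only for illustration:

      ```shell
      # Each snippet runs in its own bash process with errexit (-e) turned on.

      # Failing process sub: cat just reads an empty stream, and the failure is not seen.
      out_procsub=$(bash -ec 'cat <(false); echo reached')

      # Failing command sub as an argument: echo succeeds, so errexit does not fire.
      out_arg=$(bash -ec 'echo $(false) >/dev/null; echo reached')

      # In a function, local d=$(...) reports the status of local itself, which is 0.
      out_local=$(bash -ec 'f() { local d=$(false); echo reached; }; f')

      # A bare assignment does keep the status of the command sub, so errexit fires here.
      out_plain=$(bash -ec 'd=$(false); echo reached') || true

      printf 'procsub=%s arg=%s local=%s plain=%s\n' \
        "$out_procsub" "$out_arg" "$out_local" "$out_plain"
      ```

      As far as I know, the first three snippets reach the final echo and only the bare assignment stops early.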
      
      1. 3

        FWIW the code is clean and well commented (the good kinds of comments). Even the bash code is prettier than I thought possible!

        1. 3

          First, I must say that I really appreciate and support attempts to write new simpler and cleaner implementations. The complexity is one of biggest issues in our industry and it should be reduced. Using safer languages is also important.

          ~17K significant lines in the Oil interpreter

          I see that you count only some files and omit others. Yes, as the author who wrote that, you can do this, because you know which files are important and which can be skipped. But from someone else’s point of view, every file in your repository is important and should be counted. If someone e.g. wants to do a security audit of your code, he must check all files in your repository – not just the „important“ ones from your list.

          The oil/Python-2.7.13/ directory looks a bit scary. If you removed it and depended on the standard Python provided by the user (or his operating system), it would improve the results a lot. Then it is just 230 933 LOC in Oil vs. 363 044 LOC in Bash (however, Bash has 165 827 LOC of PO files – localization).
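
          A quick check of that comparison with the localization files taken out, using only the figures quoted above. Whether the Oil figure contains anything comparable to the PO files is not something I checked, so treat this as rough:

          ```shell
          # Figures copied from this thread (lines of code, comments and blanks excluded).
          oil_without_python=230933   # Oil with oil/Python-2.7.13/ removed
          bash_total=363044           # Bash, everything
          bash_po=165827              # Bash PO (localization) files

          bash_without_po=$((bash_total - bash_po))
          difference=$((oil_without_python - bash_without_po))

          echo "Bash without PO files: $bash_without_po"
          echo "Oil minus Bash: $difference"
          ```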

          If you had a repository that contained just those ~17 000 LOC, with well-defined dependencies and builds, then it would be really interesting.

          1. 1

            BTW the README describes the repository layout, in order to help contributors figure out what’s important:

            https://github.com/oilshell/oil/blob/master/README.md

            It sounds like you’re approaching this from a very theoretical angle. If you’re interested in changing either Oil, bash, or any other project, I would:

            1. Figure out how to build them (which I try to make easy in Oil’s case, and which I will help people with)
            2. Figure out the places in the code that are relevant (I use grep, but ask me for tips).
            3. Write a failing test for what you want to add, change the code, and see what happens.
            4. Repeat

            That’s how any substantive project that anybody cares about is developed. The tests may or may not be automated, but Oil’s tests are all automated, which will help you.

            Nonetheless I strive to make the whole project globally coherent, which will reveal itself over time to active contributors. There’s no hope of understanding, say, bash, LLVM, Apache, Linux, git, Oil, etc. all at once. You have to live in the code for a while.

            That is, figure out something you want to do first. If you don’t like the code, that’s fine, but the point is you’re going to find a different set of problems in any other project.

            Oil’s code is definitely unusual, toward the goal of being short, globally coherent, and having fewer bugs. It’s more like an executable spec (that contains enough information to be fast). Other projects have simpler code that is more repetitive.

          2. [Comment removed by author]

            1. 3

              I think GP’s point is that bash should abort before those lines get printed, but doesn’t.

              1. 3

                This is questionable. Whether such behavior is „right“ or „a bug“ depends on whether it fulfills the contract described in the specification/documentation.

                Does the Bash implementation deviate from the specification/documentation? If yes, then it is a bug and should be fixed. If not, then it is correct or undefined behavior and different people can have different opinions or wishes how it should work – but it is not a bug.

                The <(…) opens a file descriptor and passes a string /dev/fd/… to the child process as a CLI argument at the given position (try echo abc <(uname) def). It is just a pipe between two processes. The fact that one has ended with a non-zero status does not, IMHO, automatically mean that everything is wrong and should fail. If you e.g. do <(grep something somewhere.txt), the grep process might end with a non-zero status, but is that a failure that should stop the whole script? Or is it just an empty result (a stream of no lines) and a legitimate state?
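
                A small sketch of that distinction, assuming a grep that follows the POSIX exit status convention (0 for a match, 1 for no match, greater than 1 for an error). The variable names and the /nonexistent/file path are made up for the example:

                ```shell
                # Start each status at 0 and overwrite it only if grep exits non-zero.
                match=0;   echo 'alpha' | grep -q alpha || match=$?
                nomatch=0; echo 'alpha' | grep -q gamma || nomatch=$?
                error=0;   grep -q alpha /nonexistent/file 2>/dev/null || error=$?

                echo "match=$match nomatch=$nomatch error=$error"

                # A caller could treat 1 as an empty result and only values above 1 as failure.
                if [ "$nomatch" -eq 1 ]; then echo 'no lines matched, not an error'; fi
                ```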

          3. 1

            Numbers for Bash: 105 729 of C + 7 617 C/C++ headers + other languages (363 044 in total, just the code).

            And bash itself is bloated. Refer to mksh.

          4. 4

            Keep up the great work!

            1. 3

              I know Oil is not POSIX-compliant, and it doesn’t have to be, but I think a nice feature of such shells is a mode to parse POSIX shell commands. That will be helpful for utilities that require an init script in .bashrc, e.g. “conda init”, since developers typically assume a POSIX shell.

              1. 10

                Oil absolutely is POSIX compliant! Everything is opt in.

                In fact it’s measurably more POSIX compliant than dash, and it should beat bash pretty soon: https://www.oilshell.org/release/0.8.2/test/spec.wwz/survey/smoosh.html


                You can run bin/osh on your POSIX shell scripts right now. I have tried to make the opt in nature very clear in the docs like: https://www.oilshell.org/release/0.8.2/doc/deprecations.html

                If there are any places in the docs where this needs to be made clearer, let me know.


                Oil runs the shell output by virtualenv (and has since the beginning of 2019). conda init would be a great thing to test! Let me know what happens :)

                1. 4

                  Oh, it is? I must have confused OSH with the Oil language. XD

                  1. 2

                    Yeah maybe it’s because the slogan is “our upgrade path from bash” ? I could change it to “our upgrade path from bash and POSIX shell”, and that would still be accurate. Although a little more clunky.

              2. 1

                It will never get traction because it’s not written in rust and the kids these days want either Javascript or rust. I mean how insecure, dangerous and hard is manual memory management, how dare you use a pointer? Can’t have that folks, and we can’t npm install it in our container enabled blockchain cloud.

                (That was a joke, I’m becoming an old curmudgeon. I love to look at the c++ code for oil, can learn a thing or two from it).

                1. 5

                  The nice thing is that Oil is memory safe by construction :) That’s one reason it’s written in high level DSLs.

                  That is, you can’t even express memory unsafe operations in Python!

                  So there’s about 83K lines of C++ code in the tarball, and of that, maybe 2K lines are hand-written. (And I run the unit tests of those 2K lines with ASAN, which has been true since the very beginning.)

                  The important thing is that the 2K-5K lines is basically constant, no matter how big Oil gets. It’s probably 100x easier to audit for any purpose than say bash or zsh!

                  (And not that I think it really matters, but I would be interested if the amount of unsafe code linked into an average Rust program of comparable size is less than 2K or 5K lines. Back when Rust used dlmalloc(), I think that was false. I think there is very little runtime today, except maybe when you use async, etc.)


                  I should add that I think it would be very cool to have a version of OSH in Rust, which would avoid the garbage collector (which I haven’t integrated yet). I don’t necessarily think GC is a problem, since essentially every language uses it, but it would be technically possible.

                  One thing I glossed over is that the OSH language doesn’t require a garbage collector, but the Oil language does! That’s because Oil has recursive data structures.

                  sh, bash and awk do not have recursive data structures, and you cannot pass or return compound data structures to or from functions. So they do not have garbage collectors.

                  Remember the logic has been “compressed” to 17K significant lines, so this is not impossible :) Also as mentioned, someone is already translating the code to Nim.

                  http://www.oilshell.org/blog/2020/07/blog-roadmap.html#how-to-rewrite-oil-in-nim-c-d-or-rust-or-c

                  1. 2

                    That is, you can’t even express memory unsafe operations in Python

                    This is… Not true. There are the more well known escape hatches like ctypes, but also you can express unsafe operations involving codecs, by manually constructing bytecode, …

                    1. 3

                      Sure, well I just meant that Oil uses plain Python functions, classes, dicts, lists, and strings to express algorithms. There is no unsafety there.

                      But yes CPython has a bunch of holes like that… in fact I remember Guido’s first job on the App Engine team circa ~2007 was to patch the regex hole. You could bypass the regex parser/compiler, construct your own regex IR, and crash CPython :)


                      edit: I guess another way to say it is that Jython and IronPython are both “Python”, but they don’t need to contain memory unsafety… (and don’t contain any AFAIK). Those are CPython bugs/details essentially. You can implement something reasonably called “Python” without those unsafe operations.

                      I could have said “OPy” too, which is the subset of Python/MyPy that Oil’s written in.

                      1. 1

                        Yep, that’s fair. But you’ve used a big chunk of the CPython implementation :)

                        1. 2

                          CPython is going away! See this comment and the full listing of the oil-native tarball there: https://lobste.rs/s/qfiki1/four_features_justify_new_unix_shell#c_ncyhoh

                          1. 1

                            Oh! Does this mean that you managed to translate all the Python to C++ in a way that you were happy with?

                            1. 3

                              Yup, I wrote a tool called mycpp to do that:

                              https://news.ycombinator.com/item?id=24845983

                              It’s very good from an engineering perspective, as the speed is better than I expected.

                              From an aesthetic perspective, I definitely abused the heck out of Python syntax … Hence proposing the “Tea Language” in some of those links (which is basically statically typed Python with sum types and metaprogramming, which someone else asked for)

                              Another related comment: https://lobste.rs/s/4hx42h/assorted_thoughts_on_zig_rust#c_mqpg6e

                              1. 2

                                I got bored this evening and wrote a crappy not even a little bit baked version of that: https://twitter.com/tekknolagi/status/1319837940126679045

                                1. 2

                                  Ha awesome! Yes that’s basically what it is. Although are you using a typed AST, or I think Python 3’s stdlib has an AST with the unchecked expressions?

                                  Though the funny thing is that I’m going back to curly braces for Tea :)

                                  https://github.com/oilshell/oil/blob/master/oil_lang/testdata/data-enum.tea

                                  I do like Python’s indented syntax, but interpreters/compilers have huge switch statements and huge functions, and I think the curly brace syntax works better there. And for type declarations like data/enum.

                                  Basically it’s a language that can be pretty-printed directly to C++. I step through that C++ in the debugger, and use C++ profilers on it. It looks like a cross between Rust, Go, and Swift.

                                  I even wrote a GDB plugin to print the algebraic data types, which automates the act of looking at the runtime type tag, and then “casting” it in the debugger to the appropriate variant.

                                  https://github.com/oilshell/oil/blob/master/devtools/oil_gdb.py


                                  I noticed your bio says that you’re looking for a language that doesn’t suck. Statically typed Python with sum types, metaprogramming, (and curly braces) is a pretty good language :)

                                  Sales pitch:

                                  • high level (an application language, not a system language)
                                  • low dependency (spits out a single C++ translation unit)
                                  • high performance (leveraging C++)
                                  • situated – the Rich Hickey argument that Clojure leverages the JVM. Tea leverages the entire C and C++ ecosystems.

                                  I guess it’s similar to Haxe and Vala, two fairly obscure languages. But I would say the initial use cases would probably take advantage of the “low dependency” part – package managers, build systems, dev tools. And all of those apps have some distributed systems aspect, which I think makes it interesting.

                                  Oil is still supposed to be a language for distributed systems, though that part is long delayed [1], and we need a high performance language as well as a shell. Oil + Tea is like shell + C :)

                                  [1] http://www.oilshell.org/blog/2017/01/19.html

                                  1. 1

                                    I am using ast and only using the types for textual annotations, that get converted literally to C types.

                                    Though the funny thing is that I’m going back to curly braces for Tea

                                    Probably for the best. I like my braces.

                                    I noticed your bio says that you’re looking for a language that doesn’t suck.

                                    It’s a little misleading because I am looking for a language, runtime, and ecosystem that all don’t suck! I am interested in trying out your language Tea, sure.

                                    The closest thing I have found to a language that I like is Rust, though it suffers from high compile times (partially due to throwing a lot of code at LLVM, partially due to using LLVM, and probably some other stuff too), among a couple of other ergonomic complaints I have. Next is either C++ or Java. Zig might win me over if I look at it longer.

                                    I’m just… curmudgeonly.

                                    1. 2

                                      If you want an ecosystem that doesn’t suck, I think something like Tea is the best way to get there! That’s one of the main rationales: C and C++ still have the best ecosystems.

                                      If you want to use new Linux system calls (io_uring?), connect with GPUs, write auto-vectorized code, use intrinsics, etc., C and C++ are still the best places. They also have time-tested libraries like Cairo graphics, libcurl, WebKit, multiple JS interpreters, etc.

                                      I want to solve / skirt the “binding problem” in other words. A Tea program produces a C++ program. Some of the stuff Rust is doing with bindings is a little gross.

                                      Rich Hickey’s HOPL IV paper is gold on that matter, describing how Clojure leverages the JVM.

                                      https://download.clojure.org/papers/clojure-hopl-iv-final.pdf

                                      Clojure was designed to get work done! Not to rebuild shit for like 20 years. Likewise for Tea.


                                      That said, the Tea runtime / GC could definitely be called impoverished, but that can change over time.

                                      Tea doesn’t really exist except for the grammar, I’m looking for help … right now to convert the parse tree to an AST, and later to write a typechecker!

                                      The last blog post says I’m cutting the Oil language for now, so honestly the Tea language should be cut before Oil :) It’s definitely a case of scope creep, although I think it is a good idea nonetheless. (And it’s not a 10 year project; it has a concrete feature set.)