  1.

    I don’t care about Oil or OVM since I haven’t needed to learn them. I still enjoy skimming these blog posts since you’re so analytical and thorough, and you document the project so well. Keep up the interesting work. :)

    1.

      Well nobody has learned them, because I haven’t released anything yet :)

      The idea is to trick you into learning it… OSH is just like bash, so for many there’s nothing to learn. When you inevitably get pissed off by bash programming, you can auto-translate the whole thing to Oil. Then you learn Oil.

      The shell parts of Oil will be trivial to learn – it’s a nicer syntax with the same semantics. The extensions, like the Awk and Make functionality, might take some learning, but they’ll be worth it.

      That said, OSH will be more for scripting than the interactive shell at first. Probably only 10% of bash users do any scripting – the rest use it only interactively.

    2.

      I don’t quite understand what the issue with Python 3’s unicode was. My impression is that there’s an issue with Py3’s I/O handling (specifically, trying to get a byte stream for stdout and co.), but the underlying “string”/“byte” distinction feels right.
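
      For instance, just a sketch of what I mean by getting a byte stream – sys.stdout.buffer is the actual Python 3 escape hatch, and the byte string is made up:

      ```python
      import sys

      data = b'\xff\xfe not valid UTF-8\n'   # raw bytes, e.g. read from a pipe

      # sys.stdout is a text stream: write() only accepts str, and print()
      # on bytes just shows the repr.  To emit the bytes verbatim you have
      # to reach below the text wrapper to the underlying binary buffer:
      sys.stdout.buffer.write(data)
      sys.stdout.buffer.flush()
      ```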

      OSH should also be smaller than bash, which is ~150K lines. Right now my slice of CPython has ~135K lines of C. However, the ~12K lines of Python in OSH, combined with their standard library dependencies, unfortunately make it bigger than bash.

      Kind of amazing that this slice of CPython + stdlib roughly equals bash in LOC, which does… like 3 things?

      1.

        Bash is indeed big, but it does do a fair amount – there’s a big, full-featured, but messy language in there. The parser alone is more than 5K lines of code.

        There is a lot of compatibility cruft in bash though – for example, options like shopt -s compat42.

        As for unicode, I put some thoughts here:

        https://www.reddit.com/r/oilshell/comments/67mmcx/ovm_will_be_a_slice_of_the_cpython_vm/dgsel1b/

        I think I can do what all shells do, which is use char* all over the place and delegate unicode string manipulation for ${#s} and so forth to libc. (That has to count characters, not bytes.)
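
        A rough sketch of that delegation – here from Python with ctypes purely for illustration, since the real thing would call libc directly from C; char_length is a hypothetical helper, and the fallback for invalid input is just my assumption:

        ```python
        import ctypes
        import locale

        # Respect the user's locale, like a shell does, so libc knows the encoding.
        locale.setlocale(locale.LC_ALL, '')

        libc = ctypes.CDLL(None)   # POSIX: the running process, which includes libc
        libc.mbstowcs.restype = ctypes.c_size_t
        libc.mbstowcs.argtypes = [ctypes.c_wchar_p, ctypes.c_char_p, ctypes.c_size_t]

        def char_length(s: bytes) -> int:
            """What ${#s} needs: the number of characters in s, not bytes."""
            n = libc.mbstowcs(None, s, 0)       # dest=NULL: just count, don't convert
            if n == ctypes.c_size_t(-1).value:  # invalid multibyte sequence
                return len(s)                   # fall back to a byte count (assumption)
            return n

        print(char_length('héllo'.encode('utf-8')))  # 5 characters, even though 6 bytes
        ```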

        The vast majority of the time you don’t need to encode/decode. That is the implicit assumption in Python 3 programming: the parsing libraries return unicode strings (arrays of code points) rather than bytes.
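
        For instance, with the json module standing in for “parsing libraries” (the byte string is made up):

        ```python
        import json

        raw = b'{"name": "caf\xc3\xa9"}'       # bytes, as they arrive from a file or pipe
        obj = json.loads(raw.decode('utf-8'))  # decode once at the boundary...
        name = obj['name']
        print(type(name).__name__, len(name))  # str 4 -- code points, not the 5 UTF-8 bytes
        ```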

        So if you use libc’s unicode, then you don’t necessarily need Python’s unicode handling. That would probably save a lot of code size, because there are some big unicode tables inside the Python binary.

        If I use libc, then it might not be UTF-8 centric… like I said I’m sort of thinking aloud here. It might change by the time I sit down and implement it.

      2.

        I admire the regular blog posts – keep up the good work.