1. 26
  1.  

    1. 6

      By the way, the last major issue that will get us to 100% native code is to replace our JSON library. This will completely break our dependence on CPython (for YSH – it’s already done for OSH). In other words, the 77 test spec-cpp difference mentioned should go down to approximately zero.

      JSON and UTF-8 is a big (and arguably fun :-) ) subproject that we could use help with, and it should end up as roughly 1000 lines of very high-level, spec-driven code [1].

      So if you have the interest and time to dive pretty deep into both JSON and UTF-8, let me know! We’re in between grants, but you can be paid. We’ve paid a total of 100K euros to contributors in the last ~15 months.


      Specifically, we want to:

      • Remove our use of the yajl JSON library
      • Replace it with our own fancy parser and fancy printer, written from scratch in typed Python
        • We’re addressing the “JSON-Unix Mismatch”, which I discussed in recent posts about our design: How to Create a UTF-16 Surrogate Pair by Hand, with Python
        • The mismatch is that Unix APIs return arbitrary bytes, while JSON can represent all valid Unicode strings, plus an assortment of invalid strings due to its Windows/UTF-16 legacy
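To make the mismatch concrete, here is a minimal sketch in plain Python. It is not code from Oils: the helper names (`surrogate_pair`, `escape_as_pair`, `is_valid_unicode`, `is_valid_utf8`) are made up for illustration, and it relies on the behaviour of CPython's standard `json` module. It builds a UTF-16 surrogate pair by hand, then shows one string that JSON accepts but Unicode rejects, and one byte string that Unix accepts but UTF-8 rejects.

```python
import json
from typing import Tuple


def surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 (high, low) pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError('only code points above U+FFFF need a surrogate pair')
    offset = code_point - 0x10000      # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def escape_as_pair(code_point: int) -> str:
    """Render the code point as a JSON string made of two \\uXXXX escapes."""
    high, low = surrogate_pair(code_point)
    return '"\\u%04x\\u%04x"' % (high, low)


def is_valid_unicode(s: str) -> bool:
    """False if the string contains a lone surrogate (not encodable as UTF-8)."""
    try:
        s.encode('utf-8')
        return True
    except UnicodeEncodeError:
        return False


def is_valid_utf8(b: bytes) -> bool:
    """False if the bytes, e.g. a filename from the kernel, are not UTF-8."""
    try:
        b.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


if __name__ == '__main__':
    # The two escapes decode to the single code point U+1F600.
    print(json.loads(escape_as_pair(0x1F600)) == '\U0001F600')
    # JSON syntax allows half of a pair on its own; the result is not valid Unicode.
    print(is_valid_unicode(json.loads('"\\ud83d"')))
    # Unix APIs can hand back bytes that are not valid UTF-8 at all.
    print(is_valid_utf8(b'\xff'))
```

A parser and printer written from scratch get to decide what happens in both of the failing cases, instead of inheriting whatever an existing library does.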

      I plan to use this test suite, with 300 test cases very similar in spirit to our own spec tests:

      In other words, we’re treating the data languages just like the shell languages.


      Why write it in typed Python?

      • JSON/J8 Notation is inherently coupled to the interpreter data structures, i.e. our value_t, which is garbage collected. The yajl library has a similar binding to CPython’s data structures.
      • With our mycpp tool, typed Python gets us performance in the realm of Java/OCaml. The main issue is avoiding the allocation of intermediate string objects, and there are straightforward ways to avoid it in Python, with the help of our runtime libraries.
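One way to read "not allocating intermediate string objects" is sketched below in typed Python. This is an assumption about the general technique, not the Oils implementation: a plain `List[str]` stands in for whatever buffer type the runtime libraries provide, and the function names are hypothetical. Each printer appends to one shared buffer instead of returning a string for its caller to concatenate.

```python
from typing import Dict, List

# Short escapes defined by the JSON grammar.
_ESCAPES = {
    '"': '\\"',
    '\\': '\\\\',
    '\n': '\\n',
    '\r': '\\r',
    '\t': '\\t',
    '\b': '\\b',
    '\f': '\\f',
}  # type: Dict[str, str]


def write_json_string(s: str, out: List[str]) -> None:
    """Append the JSON encoding of s to out, copying unescaped runs whole."""
    out.append('"')
    start = 0  # start of the current run that needs no escaping
    for i, ch in enumerate(s):
        esc = _ESCAPES.get(ch)
        if esc is None and ord(ch) < 0x20:
            esc = '\\u%04x' % ord(ch)  # remaining control characters
        if esc is not None:
            if start < i:
                out.append(s[start:i])
            out.append(esc)
            start = i + 1
    if start < len(s):
        out.append(s[start:])
    out.append('"')


def write_json_list(items: List[str], out: List[str]) -> None:
    """Append a JSON array of strings to out, with no per-item result string."""
    out.append('[')
    for i, item in enumerate(items):
        if i != 0:
            out.append(',')
        write_json_string(item, out)
    out.append(']')


def to_json(items: List[str]) -> str:
    """Join the buffer once, at the very end."""
    out = []  # type: List[str]
    write_json_list(items, out)
    return ''.join(out)
```

In CPython the slices are still small allocations; what the pattern avoids is building and re-copying a full result string at every level of nesting.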

      Other links of interest:

      Let me know if you want to help!

      [1] OSH itself is still only ~21K significant lines of code, YSH brings it to ~25K probably

    2. 2

      compexport is a fantastic idea :)

    3. 1

      Is the is-main builtin available when running in POSIX sh mode?

      1. 1

        Yes, it works in bin/osh !

        1. 2

          For is-main and other extensions to POSIX sh, does it require some explicit opt-in or are they always available? My initial reaction is that all extensions should be opt-in but on second thought, POSIX doesn’t dictate what shell commands should not be available so scripts should already be ready to deal with extension commands.

          1. 1

            The clutter in the “first word namespace” sort of bothers me, but we do what all shells do – we add useful functionality to that namespace. It’s possible to add options, but I try to keep the number of global options down, when they don’t seem worth their weight.

            For example, bash 4 added readarray and mapfile and stuff like that. You just hope those don’t collide with existing names (and they generally don’t). And is-main is the same way.

            I want to add some features to manage that namespace, e.g. https://github.com/oilshell/oil/issues/588

            But it’s not clear how they will go. Ideas welcome :)


            BTW if people really want a POSIX shell, yash is a very high quality shell focused on POSIX compatibility - https://magicant.github.io/yash/

            But I think many people who say “POSIX” mean “I want a language I can understand, without all the bash-isms”. That is, I think they cling to POSIX because it’s “more sane” than bash.

            Well, 2 things are interesting about Oils:

            • You have a second implementation of bash, so it actually makes bash more “solid”. It’s like having 2 implementations of JavaScript. Although I won’t blame people who say it’s still very ugly.
            • YSH is designed to be the clean slate language that’s understandable and fits in your head. So I hope that when people start using YSH, they will feel less need to cling to POSIX. IMO POSIX is a little backward-looking.

            I’ve been reading a lot of big misconceptions about POSIX, like here - https://lobste.rs/s/b8icdy/dt_duck_tape_for_your_unix_pipes

            The misconception there is “we can’t use this command until POSIX tells us it’s OK” :-) That’s not what you were saying, but I am wary of putting too much authority in POSIX, because it’s not that well maintained.

            On the other hand, if they really DO want POSIX, then yash is high quality and has existed for ~14 years. (The similar names are somewhat unfortunate and unintentional, but I say YSH as Why-Ess-Aych, so it doesn’t sound similar to “yash” to me.)


            If they simply mean “I want something that runs my existing shell scripts”, then OSH does that too! (But it also runs bash scripts, whereas yash doesn’t aim for bash compatibility)

            1. 1

              I don’t disagree with any of the substantive points in your reply. As an anecdote, I exclusively write shell scripts for POSIX sh for portability between Linux and the BSDs, not necessarily because I dislike bash-isms as a matter of taste.

              1. 2

                As usual, after writing a long response, I thought of a shorter way to say what I wanted

                • I agree with the idea of coding to a spec NOT an implementation, and shell on Linux+BSD is an example of that. It is annoying when code gets extremely tied to Linux/bash
                • This is one reason why Oils has TWO implementations – one on CPython, and one in native C++. We are targeting a subset of those two platforms, which cleans up the implementation a lot
                • An interesting part about Oils is that bash is now a pretty well specified target – you have bash+OSH to sort out behavior differences

                So I agree with the idea in general, just not quite POSIX specifically …

                And also of course it’s often practical to target 2 platforms, though it’s extra work that many people don’t do.

                1. 1

                  The shell scripts I maintain that must be portable are bootstrap scripts, so they would have to kick off a build of a better command interpreter as part of that process if I wanted advanced features. In my case it’s easier to just use POSIX shell. It’s a matter of being practical, not something I necessarily do by choice. Once the system is bootstrapped enough, I jump into Python.

                  For Oils, the POSIX sh + Python dichotomy might be difficult to break into in the preceding case. I don’t think you’re competing with POSIX sh as much as you’re competing with Python. Maybe your niche is as a better bash in trendier or minimal Linux distros.

              2. 1

                Ah OK, that makes sense, but if the philosophy is to be portable to the system shells on Linux/BSDs, then Oils won’t help there either :-/

                Larry Wall has this saying “it’s easier to port a shell than a shell script”, i.e. easier to compile bash or yash or OSH for BSD, than to code for 2 shells simultaneously (and we’re intentionally making that easy – our build deps are a C++ compiler and shell, no make/ninja)

                But that’s just one way of using it. For people interested in strict POSIX, we have osh -n $myscript which exports the syntax tree. It’s not in a stable format, but it can be.

                We want to use that to write translators and formatters, but you can also use it to write a POSIX checker/linter
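As a rough sketch of what a checker built on that output could start from, the Python below counts node types in the text dump that `osh -n` prints and reports any that the caller wants flagged. It assumes only what the dump itself shows, namely that each node opens with `(` followed by its type name. The function names and the idea of a caller-supplied set of names are hypothetical, and since the format is not stable this is illustrative only.

```python
import re
from collections import Counter
from typing import List, Set

# A node in the dump opens with '(' followed by a dotted type name,
# e.g. '(command.ShAssignment' or '(Token'.
NODE_RE = re.compile(r'\(([A-Za-z_][A-Za-z0-9_.]*)')


def count_node_types(dump: str) -> Counter:
    """Count how often each node type appears in an `osh -n` text dump.

    Limitation: quoted source text inside the dump is not skipped, so a
    '(' followed by a word inside a string is counted too.
    """
    return Counter(NODE_RE.findall(dump))


def find_flagged(dump: str, flagged: Set[str]) -> List[str]:
    """Return the node types from `flagged` that occur in the dump, sorted.

    Which names count as non-POSIX is left to the caller; none are assumed.
    """
    seen = count_node_types(dump)
    return sorted(name for name in flagged if seen[name] > 0)


if __name__ == '__main__':
    sample = (
        '(command.CommandList\n'
        '  children: [\n'
        '    (command.ShAssignment\n'
        '      left: \n'
        '        (Token\n'
        '          id: Id.Lit_VarLike\n'
    )
    for name, n in sorted(count_node_types(sample).items()):
        print(name, n)
```

A real linter would want a structured, stable export rather than a regex over pretty-printed text, which is the part that does not exist yet.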

                I guess the bottom line is that there are dozens of ways to use shell, and we are agnostic about how people use it :) In general the goal is to move on from POSIX, while remaining compatible. But it’s meant to be a foundational tool that people can use any way they want!

                andy@hoover:~/git/oilshell/oil$ osh -n configure |head -n 30
                (command.CommandList
                  children: [
                    (command.ShAssignment
                      left: 
                        (Token
                          id: Id.Lit_VarLike
                          col: 0
                          length: 4
                          span_id: 40
                          line: 
                            (SourceLine
                              line_num: 15
                              content: 
                'TMP=${TMPDIR:-/tmp}  # Assume that any system has $TMPDIR set or /tmp exists\n'
                              src: (source.MainFile path:configure)
                            )
                          tval: 'TMP='
                        )
                      pairs: [
                        (AssignPair
                          left: 
                            (Token
                              id: Id.Lit_VarLike
                              col: 0
                              length: 4