1. 16
    1. 5

      Contrary to the title, it’s more than POSIX:

      A shell parser, formatter, and interpreter. Supports POSIX Shell, Bash, and mksh.

      I would say bash is at least 2x the size of POSIX. There are:

      • [[ and (( for boolean logic and arithmetic (distinct from POSIX $((...)) arithmetic expansion)
      • a bunch of stuff in ${} like ${x//pat/replace}
      • some rare redirect operators like &>, etc.
      • arrays, associative arrays
      • time (which is a keyword, not a builtin)
      • a whole bunch of builtins, new flags to existing builtins

      I don’t know of much in mksh that’s not in bash. There’s probably a little. bash seems to have subsumed most ksh features. A lot of “bash-isms” are really “ksh-isms”.

      This project as well as mrsh and others are linked here: https://github.com/oilshell/oil/wiki/ExternalResources (I should probably rename this wiki page at some point)

      1. 2

        The big difference, though, is that Go programs have native Windows support and Oil doesn’t:

        https://github.com/oilshell/oil/issues/122

        1. 2

          How is that relevant? I’m saying the title is a bit misleading. mvdan/sh supports much more than POSIX, at least on the parsing side.

          edit: It looks like gosh is a separate shell project which is not expected to be stable.

          So yeah the title is misleading on a couple counts. It supports more parsing, probably less runtime.

      2. 2

        It’s unclear from your quote whether it supports bash or Bourne sh, but if bash, don’t forget network redirects:

        /dev/tcp/host/port

        If host is a valid hostname or Internet address, and port is an integer port number or service name, Bash attempts to open the corresponding TCP socket.

        /dev/udp/host/port

        If host is a valid hostname or Internet address, and port is an integer port number or service name, Bash attempts to open the corresponding UDP socket.

        https://www.gnu.org/software/bash/manual/html_node/Redirections.html#Redirections

    2. 5

      To build on what @andyc and @ddevault said: the main purpose of the project is to expose libraries in Go, and to build tooling with them. The prime example of this is shfmt, which formats shell code.

      The module includes an interpreter, but like others have said in this thread, full POSIX compatibility is near impossible. It’s essentially best-effort, for the purpose of being able to interpret 99% of shell code out in the wild with pure Go. This can give the developer tighter control on how arbitrary code is executed, or avoid hoops such as cgo or exec with external dependencies.

      And of course, if POSIX compatibility is a priority, a well established shell with that goal should be used instead. I assume this is where mrsh would be a better fit.

    3. 3

      See also @emersion’s mrsh:

      https://mrsh.sh

      A strict POSIX shell written in POSIX C99, which ships with a library which can be used for parsing & interpreting shell scripts, or writing your own interactive shells, from C.

      1. 2

        I thought mine was interesting because I haven’t seen a POSIX shell in Go before, and in my experience Go projects have had better Windows support (with the exception of color output). Contrast with C, where Windows support doesn’t exist by default unless the author explicitly codes it in.

        1. 12

          BTW you can’t implement a POSIX shell in portable Go, because Go doesn’t export fork() and exec(). It only exposes ForkExec() portably. Windows and Unix have very different process models, and the Unix shell uses a LOT of stuff that doesn’t make sense in Windows (in contrast to say the file system which I think is more or less a 1:1 translation.)

          There are of course Windows ports of Unix shells but I think they must be done at a “higher level” than libc or Go’s abstractions. They’re also pretty slow.

          https://golang.org/pkg/syscall/

          In a shell you have to fork and re-run the interpreter, without exec. Consider:

          f() { echo one; echo two; }
          g() { while read x; do echo [$x]; done; }
          
          f | g
          

          (there are much simpler examples but this illustrates the principle)
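
          A runnable sketch of the example above, assuming any POSIX-style sh. The only changes are assumptions added for safety: read -r, and quotes around [$x] so it is not treated as a glob pattern. Neither f nor g exists as a program on disk, so there is nothing for exec to load; each side of the pipe has to be a copy of the running shell.

          ```shell
          # f and g are shell functions, not external programs.
          f() { echo one; echo two; }
          g() { while read -r x; do echo "[$x]"; done; }

          # Each side of the pipe runs in a forked copy of this shell,
          # which still has both function definitions loaded.
          f | g
          ```

          This prints [one] and [two] on separate lines.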

          I’m not even sure how it’s done with non-portable extensions, given the runtime.


          The issue is that the Go runtime uses threads to implement goroutines. Threads and fork don’t mix. Especially when the go runtime uses a lot of concurrency primitives. Generally Go programs only do the simple ForkExec, never the more complex patterns that a shell requires – i.e. just setting up a pipeline requires a bunch of stuff between fork() and exec().

          There are probably better references, but here are a couple:

          https://rachelbythebay.com/w/2011/06/07/forked/

          https://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them

          And fork, threads, and signals – which a POSIX shell requires – are their own special brand of nightmare. All existing shells use no threads – just fork and signals (and file descriptor manipulation).


          Maybe you can set GOMAXPROCS=1 and the Go runtime will only use one thread? And then you can make the Linux syscalls for fork and exec. But I’m not sure if BSDs have stable syscalls for fork and exec – there was another thread here about OpenBSD validating libc calls which is very very relevant to writing a shell in Go.

          It’s not clear to me if that’s the case, or if shfmt uses it. I grepped the repo for GOMAXPROCS and didn’t find anything.

          https://stackoverflow.com/questions/37391009/is-lock-necessary-when-gomaxprocs-is-1

          1. 3

            The relevant link:

            https://lobste.rs/s/6a6zne/some_reasons_for_go_not_make_system_calls

            tl;dr I’m not sure how you can make a portable POSIX shell in Go. I have been wondering about this for a few years actually, because there are several alternative shells written in Go. However most of them don’t purport to support the f | g pattern above, but a POSIX shell must.

          2. 1

            You can emulate subshells by virtualizing the process environment (working directory, environment variables, etc.). There are obviously things you cannot virtualize, like the PID, but you can get pretty far in terms of compatibility with existing code that relies on subshells.

            1. 2

              I think the biggest thing is the file descriptor table. You can maybe hack around f | g above somehow, but how about this:

              f() { echo one; ls /; echo two; }
              
              f | g | g | g
              

              Well I guess you can implement pipes in user space, which is what I guess Elvish does, which makes sense since it has structured data anyway.

              I’m more talking about a POSIX- / bash- compatible shell in Go. I don’t see how you do it, but I never tried.

              Also I guess it should be very simple to try f | g in the shfmt interpreter (though I don’t keep a Go toolchain up to date.)

              I think it can be made to work on x86 Linux with some one-offs, but it would not port to BSDs or Solaris very well. You would have to do work for each platform, whereas with C everything is portable.

              1. 1

                It is much less tricky than you think; Go’s os.StartProcess gives you full control on what the file descriptor table of the child process should look like.

                To simplify things, let’s start with a pipe consisting entirely of external commands, like:

                cat a.txt | grep a

                To execute this pipeline, you make a pipe, tell StartProcess to use the write end as FD 1 of the cat process, and use the read end as FD 0 of the grep process. Still kernel-space pipes.
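
                The same plumbing can be sketched in shell terms, with a named pipe standing in for the pipe the shell would create and redirections standing in for the FD assignments handed to StartProcess. This is only an analogy: it is not Go code and not taken from Elvish. The function name manual_pipeline and the sample file contents are made up for illustration, and mktemp -d is assumed to be available.

                ```shell
                manual_pipeline() {
                  tmp=$(mktemp -d) || return 1
                  printf 'apple\nberry\nbanana\n' > "$tmp/a.txt"
                  mkfifo "$tmp/pipe" || return 1

                  # Writer: FD 1 of cat is the write end of the pipe.
                  cat "$tmp/a.txt" > "$tmp/pipe" &
                  # Reader: FD 0 of grep is the read end of the same pipe.
                  grep a < "$tmp/pipe"

                  # Reap the background writer, then clean up.
                  wait
                  rm -rf "$tmp"
                }

                manual_pipeline
                ```

                Both commands are external programs here, so no shell state has to cross the pipe, which is why this case is the easy one.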

                Now to bring builtin functions into the picture, what you need to do is to maintain some user-space FD tables, and program the builtin functions so that they never use the “real” FD table, only your user-space ones. So if we suppose echo is a builtin command, then in

                echo aaa | grep a

                You still make a pipe and the handling of grep is unchanged; but for echo, you make a “virtual” FD table where FD 1 is the write end of the pipe, and make your implementation of echo use that virtual FD table.

                The same technique can be used for environment variables and the working directory, although I have only done this for FD tables in Elvish.

                Some notes on Elvish’s implementation:

                1. 2

                  How does it preserve state? The function being put in the pipeline can make use of the whole state of the shell.

                  FOO=foo
                  BAR=bar
                  
                  g() { while read x; do echo $FOO $x $BAR; done; }
                  
                  f | g | g
                  

                  As far as I understand, StartProcess can only start a new sh image. It doesn’t fork the existing image. That’s what I assume based on taking argv. If it forked then it wouldn’t need to take argv. And that wouldn’t work with Go’s runtime (as long as there are multiple threads in Go’s runtime, which I thought you could disable, but I’m not sure.)


                  To summarize, I’m not saying you can’t set up a shell pipeline in portable Go. I’m saying you can’t put a shell function in a pipeline transparently in portable Go. Because you have to fork() and re-run the interpreter, without exec().

                  (FWIW this is not academic since I use this all the time, e.g. in my benchmarks/ dir, and I wrote a couple blog posts about it. It’s a powerful form of composition in shell.)
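
                  A small sketch of that point, assuming a POSIX-style sh and a made-up f that emits two lines. FOO and BAR are never exported, yet every stage of the pipeline sees them, while a freshly started sh normally does not.

                  ```shell
                  FOO=foo
                  BAR=bar

                  f() { echo one; echo two; }
                  g() { while read -r x; do echo "$FOO $x $BAR"; done; }

                  # Every stage is a fork of the running interpreter, so the
                  # unexported variables and the function g are all still there.
                  f | g | g

                  # Contrast: a new sh process image starts without that state
                  # (assuming FOO and BAR were not already in the environment).
                  f | sh -c 'while read -r x; do echo "[$FOO $x $BAR]"; done'
                  ```

                  The first pipeline prints "foo foo one bar bar" and "foo foo two bar bar", since each pass through g wraps the line once more.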

                  1. 2

                    I put it here to make sure the example is runnable:

                    https://github.com/oilshell/blog-code/blob/master/function-in-pipeline/demo.sh

                    I guess since we got this far in the conversation I will try to run it with the shfmt interpreter, if I can get it to build.

                    It may work on one platform if there is a way to disable threads in the Go runtime. And then it can use non-portable syscalls. However I will be surprised if it works on multiple platforms without non-portable code (i.e. I claim that’s impossible.)

                    ~/git/oilshell/blog-code/function-in-pipeline$ ./demo.sh 
                    [ [ BEGIN ] ]
                    [ [ alpine ] ]
                    [ [ bin ] ]
                    [ [ boot ] ]
                    [ [ END ] ]
                    

                    edit: I think the more likely way to do it is to not use os.StartProcess, but to manually copy the entire interpreter state (so the pipeline element can’t modify the parent shell state, and to avoid races), and communicate over a file descriptor connected to two threads/goroutines in the same process. But I don’t think anybody is doing that.

                    I think there may be some stuff in addition to the $PID that would make such an implementation non-POSIX though. You might be able to get pretty close.

                  2. 1

                    Maybe something isn’t clear: to implement a pipeline in a threaded runtime environment (like that of Go), you don’t fork a new process for each component of the pipeline, you just start a new thread [1]. Each of those threads has its own user-space FD table and variable table. When you do have to launch an external command, you let the external command inherit the FD table and the exported part of the variable table from the thread, not the process.

                    [1] In Go you actually start a goroutine, but that’s an irrelevant implementation detail.

                    1. 1

                      Right, that’s what I was thinking with the last edit. But to get POSIX semantics you will have to copy the interpreter state – at least the globals and the stack.

                      I tried with Elvish and it works nicely, but the semantics aren’t POSIX because f | sort will increment a.

                      fn f { echo ---; ls / | head; echo ---; a = (+ $a 1) }
                      

                      Of course the claim isn’t about anything in Elvish – it’s about writing a POSIX shell in portable Go.
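
                      For comparison, a rough sh counterpart of that Elvish function, as a sketch rather than a spec citation. It assumes a shell that runs the first stage of a pipeline in a subshell, which bash, dash, and the other common shells do.

                      ```shell
                      a=0
                      f() { echo ---; a=$((a + 1)); }

                      # In a pipeline, f runs in a subshell, so its increment is lost.
                      f | sort
                      echo "after pipeline: a=$a"

                      # Called directly, f runs in the current shell and the increment sticks.
                      f > /dev/null
                      echo "after direct call: a=$a"
                      ```

                      Under that assumption a is still 0 after the pipeline and becomes 1 only after the direct call.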

                      Though in Elvish it also seems like you need some kind of GIL to protect variables like a and compound data structures from races.

                      1. 1

                        You are right that Elvish does not copy the globals - Elvish’s semantics are different. But it seems we are in agreement that it is quite straightforward to copy the global state if you want, and get pretty far in emulating the semantics of subshells in portable Go code.

                        1. 1

                          I wouldn’t say it’s straightforward – as mentioned there is other state besides the FD table, shell globals, and the shell stack. I’ve spent a lot of time running shell scripts in the wild so I know how they depend on every nook and cranny of the shell.

                          This paper[1] is a great summary of all the difficulties with fork() from the other side – the kernel rather than the shell. And they claim that shell is the ONLY application that fork() is good for! (I’ve brainstormed making a fork()-less shell with many of these techniques to alleviate the kernel’s burden, but it’s not high priority.)

                          [1] https://lobste.rs/s/nfxsou/fork_road


                          I like a lot of things about Go, and I considered it for writing Oil (along with hand-written C++, Lua, femtolisp, and many others).

                          The string slicing is great and something I’m kind of doing by hand now. The GC would have saved me a significant amount of time too.

                          But I consider the runtime to be a dealbreaker for writing a POSIX shell, especially one that can be used as the default system shell. If you have control over the semantics like Elvish, then it’s a somewhat different story.

                          I’ve been working on a compatible shell for a long time so I have more of an appreciation for how constrained a problem it is. There are not many languages besides C or C++ that are appropriate for the task. Rust is probably the #2 contender if you’re not resorting to code gen like Oil.

                          I’ll make a concrete prediction: no Linux distro or major Unix system will ever use a shell written in Go as its default system shell.

                          Of course that’s probably not a goal of many projects, so writing in Go may be a fine choice. But it’s a goal of Oil to be usable as the default system shell, i.e. to replace bash.


                          Also, even if it’s not hard to program, I would much rather let the kernel copy the data in pages than recursively walk and copy data structures to set up a pipeline.

                          And of course it’s not a copy, the virtual memory manager enables COW. Shells don’t generally have large heaps, but the goal of Oil is to expand shell usage and maybe some programs will have large heaps.

                          I also think the issue with mutable variables is not a small one, from a design perspective. It’s a big issue in the semantics of a shell language. The easiest solution is the GIL – all the other ones are hard or impose some burden on the user.

                          I think the persistent data structures are an interesting angle on it, but I guess it doesn’t completely eliminate the problem.


                          Anyway thanks for the info about the virtual FD table. That did answer a bunch of my questions!

                          1. 2

                            Of course that’s probably not a goal of many projects, so writing in Go may be a fine choice. But it’s a goal of Oil to be usable as the default system shell, i.e. to replace bash.

                            That depends on your definition of “default system shell”. I agree that with Go it’s very hard (if not impossible) to implement a shell with semantics close enough to POSIX to be the default /bin/sh; that is, the default interpreter for all the shell scripts out there. But you can totally use a shell implemented in Go (like Elvish!) as your personal default shell and use it to write new scripts.

                            You will still run into POSIX shell stuff every now and then, either snippets on the web, or things like venv’s activation script and the output of ssh-agent -s. I’d like to revisit POSIX emulation in Elvish some day; but as far as these use cases go, they almost never involve advanced semantics that you cannot emulate in Go.

                            I also have a feeling that the “system shell” is becoming less important in the current trend of computing, at least for “ordinary” users and system administrators. As a user I used to write some initscripts; but now I use systemd and just write INI-style unit files. As a system administrator I used to write scripts to glue my services together, but today I configure containers. Of course you still need to sprinkle in some shell, telling systemd and the containers which commands to invoke, but this is almost always very “dumb” code that is just a sequence of command invocations, with some occasional variable references.

                            And of course it’s not a copy, the virtual memory manager enables COW. Shells don’t generally have large heaps, but the goal of Oil is to expand shell usage and maybe some programs will have large heaps.

                            Persistent data structures solve the “data forking” problem pretty well - there is zero cost in copying, and near constant cost in modifying “forked” data. As long as you can live with their overhead, of course.

                            I also think the issue with mutable variables is not a small one, from a design perspective. It’s a big issue in the semantics of a shell language. The easiest solution is the GIL – all the other ones are hard or impose some burden on the user.

                            I haven’t thought enough about what the “best” concurrency semantics for a shell language are, so I am deferring the decision to the future, when I have some experience programming something concurrent in Elvish.

        2. 2

          mvdan/sh is also very cool :)

    4. 1

      Very cool. I’ve always wanted to do this. Especially since the original sh was written with lots of preprocessing to resemble ALGOL instead of C.