1. 9
  1.  

  2. 5

    Hi Andy,

    We chatted briefly on Reddit a few months back. If you recall, I was interested in a bash-like implementation as a possible base for hacking on weird and wonderful new shell functionality. I’ve finally published the project I had in mind for use with Oil: http://github.com/dw/mitogen/

    As for a carrot, there is an entire rabbit warren of experimental fun that could be had with a shell that knew how to natively run parts of itself on multiple machines simultaneously. How about multi-host/multi-uid shell commands, pipelines and job control?

    1. 3

      Very nice! I’ve gotten a lot of feedback but it is nice to get it in the form of something executable :)

      I looked at your examples folder, so I guess you are saying that you would like a more shell-like syntax on top of these Python libraries?

      https://github.com/dw/mitogen/tree/master/examples

      I think that’s a good idea. I’d be interested in a sketch of what you have in mind (maybe check in a text file to the repo?) One thing I realized about distributed execution is that you often want a graph of processes, not just pipelines. (Most big data frameworks treat a program as a graph.) Shell can sort of handle that with its syntax, but there might be a better way to do it. I have a few ideas but I’m interested in other ideas.

      I definitely want to have a distributed shell functionality. I mentioned that a little here [1] and I elaborated on it in this thread [2]. The problem I see with bare Docker or ssh is that you don’t have any notion of scheduling? So you’re limited to a handful of hosts, and you can’t utilize them very effectively.

      Clusters have to be shared between multiple users and shells; it doesn’t make sense to have an ssh-able or Docker-able cluster sitting around for one user! So if I had to start now, I might build it around Kubernetes, which is a new cluster scheduler based on Google’s internal Borg system.

      The rationale I have for a distributed shell is:

      • Distributed systems are a collection of Unix processes, running on different machines, communicating through ports
      • Shell is a language for starting Unix processes and specifying their connections (pipes, files, etc.)
      • Therefore, shell is a language for specifying the architecture of a distributed system
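As a rough illustration of the second bullet (shell as a language for starting processes and specifying their connections), here is a minimal Python sketch of what a two-stage pipeline like `producer | consumer` amounts to on a single machine. The function name `run_pipeline` and the two inline child programs are made up for this example; nothing here comes from Oil or mitogen.

```python
import subprocess
import sys


def run_pipeline(text: str) -> str:
    """Start two processes and connect them with a pipe, like `producer | consumer`."""
    # Stage 1 (hypothetical producer): writes its argument to stdout.
    producer = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", text],
        stdout=subprocess.PIPE,
    )
    # Stage 2 (hypothetical consumer): upper-cases whatever arrives on stdin.
    consumer = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"],
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
    )
    # Close our copy of the pipe so the consumer sees EOF once the producer exits.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output.decode()


if __name__ == "__main__":
    print(run_pipeline("hello from a pipeline"))
```

The shell expresses the same wiring in a single line. The distributed version of the argument is that the two processes could just as well live on different machines, with the pipe replaced by a network connection.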

      [1] http://www.oilshell.org/blog/2017/01/19.html [2] https://www.reddit.com/r/oilshell/comments/5x8cb0/the_dwm_aesthetic/

      BTW I started a section here, but it’s far from complete:

      https://github.com/oilshell/oil/wiki/ExternalResources#distributed-shells

      Andy

    2. 2

      “OSH is an interpreter written in Python — that is, it’s doubly-interpreted. I should probably fix that before writing Oil.”

      No kidding. Your original plan of C++ sounds better since it will at least be efficient. C++ isn’t fun to develop in, though. Your resume also indicates experience in C, Java, and OCaml. These are all relevant: C and Java are in a similar camp to C++, while OCaml is something of a hybrid, since development can be pleasant with decent performance. I think the choice of how to do this implementation depends on whether you’re going for a lot of contributors (need a popular language), just performance (need a fast language), or personal pleasure writing it (need your language). Whichever goals you pick will dictate the best language, of those you know, for the task.

      Tentatively, I’d say implement it in one of the low-level languages (esp. C or C++ w/ safe constructs). That’s the final implementation. For development, you can use a trick I came up with: map a subset of a language such as Python (you’re already using it) to an equivalent subset of something fast such as C/C++ (you mentioned C++). You then use only that subset of Python to rapidly develop and test your project. Either at the end, or periodically for stabilized components, you port the Python code to C++. So you get the fun and fast development of Python with the efficiency of C/C++, the cost being a tedious port, which might not be that bad, especially if you end up with some Python-ish libraries for C/C++ that make other ports smoother.
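To make the "portable subset" idea concrete, here is a small hypothetical sketch of Python written in a deliberately restricted style: explicit types, index-based loops, and only strings, ints and lists. The function `split_words` is invented for illustration and is not taken from OSH; the point is only that each statement has an obvious counterpart in C++ (for instance `std::vector<std::string>` and `substr`).

```python
from typing import List


def split_words(line: str) -> List[str]:
    """Split a line on spaces and tabs, using only constructs that port easily to C++."""
    words: List[str] = []
    start: int = -1  # index where the current word began, or -1 if between words
    i: int = 0
    n: int = len(line)
    while i < n:
        ch: str = line[i]
        if ch == " " or ch == "\t":
            if start != -1:
                # End of a word: copy it out, as substr(start, i - start) would in C++.
                words.append(line[start:i])
                start = -1
        else:
            if start == -1:
                start = i
        i += 1
    if start != -1:
        # The line ended in the middle of a word.
        words.append(line[start:n])
    return words


if __name__ == "__main__":
    print(split_words("ls  -l\t/tmp"))
```

Nothing here relies on generators, decorators, reflection, or dynamic attribute access, which is roughly the discipline the comment describes.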

      1. 3

        It would be far from hard to beat the evaluation performance of bash using a Python implementation; bash really is that bad. I don’t think the primary concern people have with a shell is how efficient it is. It spends 99% of its life asleep. Meanwhile, systems languages introduce a huge penalty for iterating, and Oil very much seems to be in the iteration stage.

        1. 0

          Oh, I agree. Double-interpreted is just usually a no-no. So I just tossed out a solution giving some of the benefits, with an optional path to high performance or enhanced security.

        2. 2

          Well, the argument for doing Oil first is that it’s better to design a new language in a high-level language, not C. Python/Ruby/Perl/JS are notorious for implementation-defined semantics. I think Python is the best-designed out of the bunch, but even it has a lot of weirdness:

          https://github.com/satwikkansal/wtfpython

          You can also ask the PyPy developers how much of CPython they had to copy in the name of compatibility. Here is a good talk about these corners of Python. A great example is how nontrivial “a + b” is in Python:

          https://www.youtube.com/watch?v=IeSu_odkI5I

          Much of the hairiness of “a + b” was done in the name of performance.
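For readers who have not watched the talk, here is a simplified model of the dispatch behind `a + b`, written as a hedged sketch rather than a description of CPython's actual code. The helper `binary_add` and the `Meters` class are invented for this example. The sketch covers only the documented `__add__` / `__radd__` protocol and ignores the C-level slot machinery that the performance remark above refers to.

```python
def binary_add(a, b):
    """Simplified model of how Python evaluates `a + b` (illustrative, not CPython's code)."""
    type_a, type_b = type(a), type(b)
    lhs = getattr(type_a, "__add__", None)
    rhs = getattr(type_b, "__radd__", None)

    # If the right operand's type is a proper subclass of the left operand's type
    # and provides its own reflected method, that method gets the first try.
    if (rhs is not None and type_b is not type_a and issubclass(type_b, type_a)
            and getattr(type_a, "__radd__", None) is not rhs):
        result = rhs(b, a)
        if result is not NotImplemented:
            return result
        rhs = None  # already tried, do not call it again

    if lhs is not None:
        result = lhs(a, b)
        if result is not NotImplemented:
            return result

    # The reflected method is only consulted when the operand types differ.
    if rhs is not None and type_b is not type_a:
        result = rhs(b, a)
        if result is not NotImplemented:
            return result

    raise TypeError(
        f"unsupported operand type(s) for +: {type_a.__name__!r} and {type_b.__name__!r}"
    )


class Meters:
    """Hypothetical type that only knows how to be added from the right."""

    def __init__(self, value):
        self.value = value

    def __radd__(self, other):
        return Meters(other + self.value)


if __name__ == "__main__":
    print(binary_add(1, 2.5))
    print(binary_add(1, Meters(2)).value)
```

Even this stripped-down version has three separate lookup paths and a subclass special case, which gives a sense of why "a + b" is harder to specify than it looks.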

          I am taking a different approach and writing an abstract interpreter first, then trying to make it fast, rather than making a fast interpreter and not being sure what it does.

          And yes OPy [1] is meant to help with this – it could be translated to C++, compiled to bytecode, etc. The devil is in the details though… once you actually start doing it, the differences between all these seemingly viable approaches come out.

          [1] http://www.oilshell.org/blog/tags.html?tag=opy#opy

          1. 2

            Also I would be careful about saying “a trick you came up with”. That’s not a new technique – Knuth implemented TeX in an abstract subset of Pascal, which is translated to C on all modern systems. That is what people actually use right now.

            https://tex.stackexchange.com/questions/95369/what-language-is-tex-implemented-in

            Actually it’s a subset in one sense and a superset in another, because WEB adds the literate programming concepts.

            That’s how I think of the Python/OPy implementation of OSH/Oil. It’s an abstract subset / superset of a real language. I don’t use many Python quirks like decorators, static/class methods, or even unicode objects (preferring utf-8 strings), which should ease translation. At first I was avoiding exceptions too, but I decided I need those.

            1. 1

              The stuff in your other comment makes sense. Good luck with it. As for my trick, I invented it on my own while trying to balance the RAD benefits of a good BASIC against the portability & speed benefits of C source. I connected them in the way I recommended, but also turned it into a hacked-together 4GL. I later read that Per Brinch Hansen used a similar trick, exploring with ALGOL but hand-converting it to assembly. And now Knuth with TeX.

              This is independent invention at work. It happens a lot with me, in many orderings. Now, the question is: do I never take credit for my inventions since others might have invented them? Or do an exhaustive search that might rival the effort of the invention itself? Or continue taking credit for my work, with credit given to others who pulled off the same or better? I’ve been doing the latter (3) so far. Also, I doubt No. 1 or 2 would profit me reputationally, although No. 2 is our patent system, which I’d gain financially from.

          2. 2

            All the best! Your project always struck me as insanely ambitious in terms of the state space of possible inputs it was taking on. It’s fascinating to see all the bobbing and weaving as you learn about new issues and constraints.

            1. 2

              Thanks. On the one hand, replacing something from the 80’s shouldn’t be ambitious. I thought software was supposed to move fast! It’s super weird that bash and make are as popular as ever (there seem to be weekly discussions about them on Lobsters/HN/Reddit).

              On the other hand, there’s an unbroken chain back to sh in the early 1970’s. I guess it’s ambitious in the sense that I want to break that chain.

              And that isn’t even the ambitious part – the distributed shell stuff I mentioned elsewhere in the thread should be more ambitious. But I haven’t talked about it that much because it’s still vaporware.


              BTW I believe with Alpine Linux it’s feasible for me to create a working/usable distro with two languages: Oil and C. I don’t think I could do it either from scratch or starting from Debian.

              Alpine uses busybox ash everywhere, and it also uses a very restricted dialect of Make. It uses Lua in a few places (maybe only a few hundred lines). But there’s no Python or Perl.

              Right now there are several languages you could write a Lisp in in a typical distro: C, C++, sh, awk, make, m4 for autoconf, cmake for the LLVM build system, and sometimes Perl/Python. I guess I won’t be able to get rid of autoconf/cmake in general, but doing it on a basic system might be possible. It’s sort of turning Alpine Linux (a real system) into Aboriginal Linux (an educational system, now defunct).

              1. 2

                Ooh, interesting! Thanks for remembering me :)

                It wasn’t at all surprising to me that this is ambitious, considering the amount of stuff out there that runs on these tools. Compatibility is great, compatibility is a bitch.

                And I actually think what’s creative about what you’re doing is that you’re trying not to break the chain. Unlike say my Mu project, you want to give the entire installed base of Bash an upgrade path.

                Have you found any cases where upgrading a Bash script would require knowledge that’s not in the script itself? Like it’s doing something fundamentally unsafe, and you can use a safer way, but knowing the two are equivalent is likely to be AI-complete? I was thinking about this risk during your discussions of the right way to safely escape variables.

                1. 2

                  Yes that’s true, the way I think of it is as “declaring syntax bankruptcy on shell” but not breaking the chain. Bash is stuffed full of bad syntax and there’s no room to add anything else.

                  I think those are two slightly different questions. There is always the trivial conversion:

                  evalsh $entire_shell_program
                  

                  So I think everything you need to convert must be in the script. On the other hand there are certainly things in bash I won’t be able to understand. That is the difference between OSH and bash – OSH is the subset of bash that can be converted.

                  The main thing I have found is that understanding certain bash scripts is mathematically impossible, i.e. it would be equivalent to solving the halting problem:

                  “Parsing Bash is Undecidable” – http://www.oilshell.org/blog/2016/10/20.html

                  I think it’s out of scope to change the meaning of the program and fix unsafe programs though. If you have an escaping bug, the OSH-to-Oil converter will preserve it. It’s possible it could help you with specific cases, but I think there will be so many other things to do that I won’t get to that… If I can have say 90% to 99% of sh/bash scripts auto-converted and running with no changes, I’ll be happy!

                  Alpine’s APKBUILD shell scripts are a great use case for that. Nobody is ever going to rewrite the scripts for 5K or 10K packages manually! It has to be automatic. Likewise for Arch and PKGBUILD, etc. So an ambitious but I think reachable goal is to convert an entire distro, rebuild the whole thing, and run it! So I’d be testing build time and runtime. Probably less than 100 packages would need to work to start with.

            2. 2

              Are you aware of the name conflict over the osh binary provided by the OMake build system?

              etsh was also known as osh until a few months ago, and changed its name for the same reason.

              1. 1

                Hm thanks for letting me know… I will take care not to use the same config file names.

                I’m not sure about the binary name. I’m surprised that the etsh author changed it, because according to the OMake home page, there hasn’t been a release since 2010? I have actually read the OMake paper, but I wasn’t aware that it’s installed on many systems (and I didn’t know it had an osh binary).

                1. 3

                  Debian will first-come-first-served you on the binary name (the same thing happened to node.js, to much wailing and gnashing of teeth, though to be fair to you, “osh” is far less obviously generic than “node”).