1. 29
  2. 5

    So that ~ works by patching the ast; not entirely sure how I feel about that.

    There’s also no support for escaping anything in the shell command. Not sure how I feel about that either.

    It’s also a bit of a missed opportunity IMO: right now it’s just a syntax wrapper, but you could do so much more. For example, shell commands typically work on lines, so instead of just returning a string, returning a list (with a generator for bonus points) would be better, or even a subclass of list with some added methods for convenience, so you can do stuff like ~"git log".grep('foo'). This also means it wouldn’t have to use the shell, with all the problems that introduces, since you can start the git log process with its stdout connected to a new grep process’s stdin.
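
    A minimal sketch of that idea in plain Python, assuming nothing about zxpy’s internals: the hypothetical helper pipeline() below starts each command from an argv list, connects one process’s stdout to the next one’s stdin, and yields output lines lazily, so no shell is involved. The two Python one-liners are stand-ins for "git log" and "grep foo" so the example runs anywhere.

```python
import subprocess
import sys


def pipeline(*commands):
    """Run argv lists as a pipeline without a shell and yield output lines."""
    if not commands:
        raise ValueError("at least one command is required")
    procs = []
    upstream = None
    for argv in commands:
        proc = subprocess.Popen(
            argv, stdin=upstream, stdout=subprocess.PIPE, text=True
        )
        if upstream is not None:
            # The child owns the read end now; closing our copy lets the
            # upstream process see a broken pipe if the child exits early.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)
    try:
        for line in upstream:
            yield line.rstrip("\n")
    finally:
        upstream.close()
        for proc in procs:
            proc.wait()


# Hypothetical stand-ins for ["git", "log"] and ["grep", "foo"].
produce = [sys.executable, "-c", "print('foo'); print('bar'); print('food')"]
keep_foo = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.writelines(l for l in sys.stdin if 'foo' in l)",
]

for line in pipeline(produce, keep_foo):
    print(line)
```

    With git and grep installed, pipeline(["git", "log"], ["grep", "foo"]) would be the equivalent call; because every argument is its own list item, nothing needs escaping.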

    When I wrote my web-based music player years ago this is how converting audio files worked: the flac -d would decompress to stdout, which was connected to oggenc, and the stdout of that got streamed to Firefox. This meant you didn’t have to wait for the conversion to finish to start playing, and even my little Atom server was more than fast enough to convert this on the fly.

    1. 2

      Hey! Great ideas – all of this can totally be added. Make a feature request on github :P

      Also btw, escaping arguments is also possible by patching f strings inside the bash syntax strings. ASTs are pretty powerful.

      1. 2

        If you don’t use a shell then you don’t need to escape anything, and the escaping also can’t be accidentally bypassed if you do "git " + cmd instead of "git ${cmd}". Besides, escaping stuff in shell is notoriously hard, not least because what the “shell” is exactly may be different on different systems.

    2. 3

      There are libraries/DSLs plumbum, mario, and sh in Python listed here – does zxpy differ significantly from them?

      https://github.com/oilshell/oil/wiki/Internal-DSLs-for-Shell

      And this page is editable – feel free to edit it (with the node.js zx as well).

      1. 3

        This is a timely post. I’ve just gotten fed up with how bad it feels to convert a shell script to Python. This time, I wanted argument parsing.

        So, I’m finally going through and looking at ‘sh’, marcel, xonsh, plumbum, and now zx to find something better in between bash and vanilla Python.

        1. 3

          Definitely write up the results! I think many would find it useful.

          It feels like there’s a bit too much re-inventing the wheel here, even just within Python. But I could be wrong.

          FWIW I have written similar shell-ish Python tools going back 15 years now – one was called “dice” and used JSON over pipes. I even got some positive comments from Guido van Rossum about it. But I think that approach is fundamentally limited – hence the long-winded Oil project :)

          It’s probably useful for some problems, but I would still point to the lack of convergence as a curious thing. To me it feels like each one is a little wrong for some job, so someone writes a new one.

          I would generally like my Python-based tools, but then when I go back to use them, it was often easier to just do it in shell. (Deployment was an issue for sure.) I learned enough shell by writing them that I lacked the motivation to actually use them for “production” problems :)

          Similarly there was a predecessor to Eggex in Python called Annex. But when I went back to use it, it was easier to just suck it up and use Python/Perl regex syntax. Again ironically I learned every nook and cranny of Python and POSIX regex syntax by writing it. I think Eggex makes more sense because it’s embedded in a shell language and it’s not a Python library.


          Also clicking through the wiki, I just noticed this from the author of “pysh”:

          I no longer believe this approach to shell scripting to be a good solution. pysh’s approach is to modify the syntax of python resulting in an uglier, and confusing, language. Maybe someday I’ll stumble upon the “right way” to implement a shell language, but for now bash is just fine.

          (But I don’t agree bash is fine :) )

          1. 2

            Here are my thoughts from my hacking last night:

            Marcel and Xonsh are doing way more than I want. I mostly just want a library that makes shelling out easier. For example, Marcel is going to return Python data types for things when I really just want shell scripting, but easier from Python. I’d still like to give them a closer look, but last night I primarily compared plumbum and sh. I’m (sorry OP) not really interested in zxpy because of its interpolation syntax.

            Re: plumbum and sh, sh is the clear winner. I want behavior like bash’s set -x and sh provides that with info-level logging. I want my program to run “like a shell script” and print all stderr and stdout to the terminal, which was easier to accomplish in sh than in plumbum (though it does add a bit of boilerplate). Finally, the way sh does subcommands is great, in that it really makes the shell you’re running “look like” Python code. Here’s an example of how good sh can look and where it falls short from a comment on a ticket I made last night.

            1. 1

              Thanks for the feedback, I re-organized the page and added a link to your comments!

              https://github.com/oilshell/oil/wiki/Internal-DSLs-for-Shell

              I also appreciate any feedback on https://www.oilshell.org/ itself; it’s basically shell with Python-like data types, and without quoting problems, so I imagine it may be appropriate for what you’re doing.

              1. 1

                Hey, been thinking about your request-for-comment on Oil and didn’t want to ignore you. Here’s where I come out personally:

                1. Zsh is a local optimum for posix/bash-compatible shells. I have mine well-configured, and with a few plugins like autosuggestions and syntax highlighting (recently installed fzf-tab which is awesome too) it seems to me that it offers anything Fish does but is still posix-compatible (there’s a thread today about Fish on HN so I’m thinking about it). Side note: I remain completely baffled why people use plugin managers for shells. My .zshrc is like 100 lines (which is mostly setopts and bindkeys) and I’ve never seen the need for it.

                2. shell is awful for scripting. For anything more than running some commands with some if statements I’d use Python, which is itself a local-optimum for dynamic languages. So, personally, any avenue to “better scripting” will be through improving Python’s ability to be used for shell scripting. Not that it’s “bad” now, it could just be nicer to transition from shell to Python without so much impedance mismatch.

                3. So, given the above, I don’t see where Oil fits for me personally. If I ever switch from Zsh for my interactive shell it’ll be to something more radically different like Nu shell (which seems very promising). And like I said for scripting, there’s no reason to leave Python, where I can whip out some Pandas, Requests, etc.

                Btw, I’m unhappy with Plumbum and ‘sh’ for a few reasons so I started my own autoshell library over the weekend. Currently working on piping using pipe operators. I’m gonna try out the new async-based subprocess for my library to be able to real-time tee output to the console, and so on.

                1. 2

                  Thanks for the feedback! Your points make sense and are not too surprising to me.

                  1. Oil isn’t a better interactive shell than zsh at the moment; however there is a new “headless mode” coming up which I’m excited about. That will enable some more inventive UIs.

                  2. Yes shell is awful on the surface, but it has a great core! And the point of Oil is to fix it while retaining the good parts :) I guess people aren’t convinced it is possible to rehabilitate, or are not convinced that there are good parts.

                  3. I use Python in all my shell scripts! I address this here:

                  http://www.oilshell.org/blog/2021/01/why-a-new-shell.html#shouldnt-scripts-over-100-lines-be-rewritten-in-python-or-ruby

                  However I’ve noticed that the way this is worded isn’t particularly convincing, so I plan to update it: https://github.com/oilshell/oil/issues/944

                  The tl;dr meme is that it’s “better” to write 200 lines of shell that calls 300 lines of Python, than to write 1000 or 2000 lines of Python. But I understand that a lot of people haven’t felt that “compression”. It’s one of those things that you have to experience yourself.

                  It’s hard to explain but some things just naturally go in Python and some things naturally go in shell, and they work together as part of the same system.

                  I would make an analogy to writing many manual loops over dicts and lists in Python, and then discovering SQL or Data Frames. You will just save so much repetitive code. (Not that SQL doesn’t have a ton of downsides too.)

                  Python is a great language, and my primary one for ~18 years, and Oil is written in it, but it isn’t optimal for many tasks. For instance, one of the main reasons I use shell is to parallelize Python (and R) trivially!

                  If you have any other feedback or questions let me know.

                  1. 2

                    To your point, yesterday I was repeatedly waiting for a command to complete that had an xargs in it, and I went “wait a minute”, added -P0 and it completed much quicker :)

                    I’d be interested in more about your thoughts about the right way to combine shell and “real programming languages” in a way that makes best use of both. In general I’m very sympathetic to that point of view because that’s always how I design systems, expose a bunch of command line programs and tie them together. That’s similar to how git is designed as well.

                    Ultimately though, shell is programming too, so it seems like it’s just an API/ergonomics issue in programming languages that needs to be improved if shell is significantly better for certain tasks.

                    Edit: here’s an example of how I always combine command line programs together: I have a little cb (“clipboard”) program and I’ll often do things like cb | sort -u | cb, or cb | xargs ... | cb to go back and forth from data in my editor.

                    1. 2

                      Yup xargs is one of my favorite commands! In fact I once made a presentation about it which I never turned into a blog post :) http://www.oilshell.org/share/05-24-pres.html

                      Yes if you know how to write and design a CLI in Python, then you’re already mostly there! To me the difference between a CLI tool and a Python function is that the CLI tool is mostly stable. That is, you add things and never take them away, because that would break callers.

                      And this discipline makes you more careful about your code and how it interacts with the world.

                      There are a few books that cover it (and unfortunately I think it does take a book + a bunch of experience, I’m still learning):

                      Roughly speaking, I’d say my Python programs use stdin, stdout, and stderr better than they used to, and they have better flags, better errors, and better logging/instrumentation. I find it a pretty useful style for structuring code and especially testing it.

                      1. 2

                        Btw I released that library I said I’d do above: https://github.com/kbd/aush

                        I’ve been using Python for so many years but never put up a library on PyPI before. Poetry made it easy.

                        It’s not done yet but it “works” enough to share. Currently learning asyncio things so I can implement streaming output.

                        1. 1

                          Nice README, it’s very clear. I can see it being useful for some tasks but I still like shell for pipelines, redirects, and a few other things :)

                        2. 1

                          Yes if you know how to write and design a CLI in Python, then you’re already mostly there!

                          Yeah I have argparse basically memorized :) The way I look at it, command line programs are basically “functions” available from any other language, with the caveat that they can only take strings as arguments and return a return code and a string.

              2. 1

                For Python, it feels more like a curious lack of batteries rather than abundance of wheels. As much as people like to promote Python as bash replacement, spawning a subprocess in vanilla Python is much less ergonomic than in bash. And, to make this ergonomic, you need some way to make ls $dir syntax work without injections, and Python doesn’t have nice facilities to do that. This is not specific to Python even, spawning a process in most languages is either a chore (looking at you, Rust), or depends on shell (Ruby, Perl). That’s why I kinda gave up on “normal” languages, and just write my scripts in Julia: they get this detail right, despite this being not really their domain. It was also pleasant to re-learn that JavaScript’s string interpolation works the right way, allowing for library-defined interpolation semantics.

                1. 1

                  I always attribute the weird/limited APIs for spawning processes in most languages to a (perhaps misguided) attempt at portability. (And I was using Python before the subprocess module existed; it was REALLY impoverished back then.)

                  I think you can get close in Python with f strings now – how about something like:

                  os.system(f'mplayer {filename:shell_escape}')
                  

                  I think you just have to write/register the shell_escape “formatter” (but I haven’t tested this).
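
                  A hedged sketch of how that could work: a plain str rejects an unknown format spec, so the value has to be wrapped in a type whose __format__ understands it. The ShellArg class and the shell_escape spec name are hypothetical; shlex.quote is the standard-library helper and only targets POSIX shells.

```python
import shlex


class ShellArg(str):
    """Hypothetical str subclass that understands a 'shell_escape' format spec."""

    def __format__(self, spec):
        if spec == "shell_escape":
            # shlex.quote wraps the value so a POSIX shell sees one word.
            return shlex.quote(str(self))
        return super().__format__(spec)


filename = ShellArg("my song; rm -rf ~.mp3")
command = f"mplayer {filename:shell_escape}"
print(command)
```

                  The resulting string could then be handed to os.system, but forgetting the wrapper or the format spec silently brings the injection back, which is the weakness of any opt-in escaping scheme.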


                  Back to the higher level point: The way processes work between Unix and Windows is completely different (compare with the file system which is more similar). To make a portable interface on top of them limits the functionality greatly. The errors you can get back are different, and pipelines are a whole other can of worms.

                  Windows is really about thread-based concurrency (and async); process-based concurrency is an afterthought. Processes are slow and heavy on Windows. The original concurrency model of Unix was processes; there were no threads.


                  To be fair, it’s also painful to spawn processes in C on Unix – fork, exec, and making sure you don’t have descriptor leaks! (CLOEXEC and all that). And shell gets it wrong because it doesn’t have first class support for arrays, and arrays are literally in the C interface (char **argv).

                  And pipelines: pipe(), dup(), close(), and fcntl(). Unix is very flexible but also makes you do a lot of work yourself. The C standard library barely helps.

                  Also to me it is funny that bash made the env var solution arguably unsafe (from the previous thread about zx). The example I listed happens to be safe though.

                  https://lobste.rs/s/9yu5sl/after_discussion_here_i_created_lib_for#c_paq9ch

                  You have to avoid using the environment variable in array subscripts:

                  a[$DIR]=1  # unsafe, hidden EVAL in ksh/bash
                  echo ${a[$DIR]}  # unsafe, hidden EVAL in ksh/bash
                  

                  which is probably too subtle a rule to recommend to people. So quoting is the more standard and more explainable solution.

                  Although I think this would be a lot cleaner and require fewer mechanisms from each calling language.

                  The fault is really with ksh and not bash, since bash copied the double expansion / hidden “eval” from ksh. If there were no hidden EVALs, like Python/JavaScript/every other language, this would be by far the better solution.

                  Although I think there is a subtlety – I wonder if typical shell quoting actually prevents expansion in the a[$DIR] case. It might not. I will think about that…

                  1. 1

                    Yeah actually quoting does NOT protect you from the hidden eval problem [1], so the environment var solution I gave is as good as quoting.

                    An equivalent solution without env vars (and leakage) is simply to invoke sh -c and pass an argument:

                    >>> untrusted='/bin'
                    >>> 
                    >>> subprocess.call(['sh', '-c', 'find "$1" | wc -l', 'dummy0', untrusted])
                    170
                    

                    Why this is useful:

                    • It does NOT use Python string interpolation.
                    • It doesn’t require shell quoting. Automatic (Julia-like) or otherwise.
                    • It’s as safe as quoting. Quoting is still subject to the hidden eval caveat, so the 2 solutions are on equal footing.

                    Downsides:

                    • Your language has to let you spawn an argv array without the shell. Most languages let you do that now, but maybe languages like awk don’t.
                    • The dummy0 thing is probably confusing to some people.
                    • You might forget the quoting around $1 like I initially did :)

                    I should write a blog post about this, but it’s probably at the back of the queue
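
                    The pattern above can be wrapped in a small helper. This is a sketch that assumes a POSIX sh on the PATH; run_script is a hypothetical name. The script text is fixed, and untrusted values only ever arrive as positional parameters.

```python
import subprocess


def run_script(script, *args):
    """Run a fixed sh script; args arrive as $1, $2, ... and are not parsed as code."""
    # "dummy0" fills $0, so the first real argument lands in $1.
    argv = ["sh", "-c", script, "dummy0", *args]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


untrusted = "$(echo injected); echo oops"
# The quotes around $1 inside the script are still required.
print(run_script('printf "%s\\n" "$1"', untrusted))
```

                    The untrusted value is printed back verbatim instead of being evaluated; the hidden eval caveat for array subscripts described above still applies inside the script itself.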


                    [1] For some reason this is hard for people to understand, but here are the refs:

                    https://github.com/oilshell/blog-code/tree/master/crazy-old-bug

                    http://www.oilshell.org/blog/2019/01/18.html#a-story-about-a-30-year-old-security-problem

                    1. 1

                      Automatic (Julia-like) or otherwise.

                      Not sure if this is just a choice of word, but, just in case, Julia doesn’t do any quoting. It doesn’t need to, because it never concatenates arguments into a single string. run(`ls $dir`) does subprocess.call(["ls", dir]).

                      Your language has to let you spawn an argv array without the shell

                      I’d go as far as saying that new languages & runtimes shouldn’t have the ability to spawn with the shell at all, even via an opt-in mechanism like shell=True in Python. If someone knows what they are doing, they can use ["sh", "-c"].
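
                      To make the argv-list behaviour concrete, here is a small sketch (child_argv is a hypothetical helper) that spawns a child without a shell and reports the arguments the child actually received. Shell metacharacters pass through as ordinary text.

```python
import json
import subprocess
import sys


def child_argv(*args):
    """Spawn a child from an argv list (no shell) and return the args it received."""
    code = "import json, sys; print(json.dumps(sys.argv[1:]))"
    result = subprocess.run(
        [sys.executable, "-c", code, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


dir_name = "my dir; echo pwned"
print(child_argv(dir_name, "$HOME"))
```

                      Each Python argument reaches the child as exactly one argument, spaces and semicolons included, which is why no quoting step is needed.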

                      1. 1

                        Ah OK I thought it was auto-quoting. So then does Julia reimplement pipelines with find $dir | wc -l ? I think I might have seen that with their use of libuv. My first thought is that is a bad idea – there are other constructs in shell you might want besides pipelines, and you don’t want to implement an entire shell inside the language runtime.

                        I wouldn’t object to an explicit sh -c everywhere – it might clear up a lot of confusion.

                        1. 1

                          Yeah, Julia does implement pipelines itself.

                    2. 1

                      I think you can get close in Python with f strings now

                      I rather strongly feel that this is nowhere close. The right solution should be the default and more ergonomic (and preferably the only) option. Otherwise folks will do the wrong thing without knowing it.

                      Like, zx demonstrates how hard it is to make people write safe code. JavaScript’s backtick syntax is specifically designed to make it possible to avoid injection vulnerabilities: tag`ls ${d}` will call tag(["ls ", ""], d). And yet the library happily concatenates that back into a single string, because Node has an unsafe (but ergonomic) child_process.exec API.

                2. 1

                  ‘sh’ is nice, if you require it only on linux. More recent python versions have subprocess.run, which makes it really easy to execute commands, catch stdout/stderr, have timeouts defined, send into background etc. I replaced all my ‘sh’ occurrences with subprocess.run.
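
                  A short sketch of the subprocess.run features mentioned here (captured stdout/stderr and a timeout). The wrapper name run_captured and the five-second default are assumptions for illustration, and the child is a Python one-liner so the example does not depend on any particular external command.

```python
import subprocess
import sys


def run_captured(argv, timeout=5.0):
    """Run argv without a shell; return (returncode, stdout, stderr).

    returncode is None if the command did not finish within `timeout` seconds.
    """
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return None, "", "timed out"
    return result.returncode, result.stdout, result.stderr


code, out, err = run_captured(
    [sys.executable, "-c",
     "import sys; print('out'); print('err', file=sys.stderr); sys.exit(3)"]
)
print(code, out.strip(), err.strip())
```

                  Passing check=True instead would raise CalledProcessError on a non-zero exit status, which is closer to bash’s set -e.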

                3. 3

                  The main motivation for me was to make it dead simple to jump between shell scripts and Python, and have 0 learning curve. And I think no other library achieves that as well.

                  1. 2

                    I’d honestly like more clarity on this. What is it specifically about writing a script in this or zx for example, that is more harmful than writing the same thing in, say, bash?

                    Anything that can go wrong with you writing a script in this, can go wrong with you doing the same in bash, because in the end it’s supposed to be a shell script that you wrote, not much else. To me it just sounds like saying bash has shell injection vulnerabilities.

                    If I’m getting this wrong let me know.

                    1. 4

                      Here’s a minimal example: https://gist.github.com/matklad/b971c2502d99b38fc3b54345902b9e9b

                      So, the main issue is that, while shell substitution for most shells is just unobvious, everything which calls a shell with a string argument is a remote code execution vector.

                      The second issue is not security critical, but is pretty important from a “less mess” point of view. Shells differ. There’s no bash on Windows, for example. Shelling out to the system’s shell introduces an extra dependency on that system’s state. But this dependency is not needed: if you want to spawn a pipe of commands, you can just do that! The shell is an extra middleman here.

                      Those two Julia posts are good stuff, highly recommended!

                      1. 3

                        We need a “Bobby Tables” meme for shell injection :) [1]

                        https://xkcd.com/327/

                        Some Googling shows that it caught on to explain SQL injection in many places, e.g. https://bobby-tables.com/

                        [1] And HTML injection – a few years ago someone was posting “minimal” CGI code in C, complete with obvious HTML injection, aka XSS

                        1. 1

                          Thanks for the reply, it is much appreciated :)

                          Although I’ll say that this project isn’t intended for anything more than making local shell scripts easier to write, preferably with no user input at all, let alone unsanitized stuff. I should clarify this better in the readme.

                          1. 2

                            Yeah, to clarify, just spawning a shell is totally OK, if the docs have a Security section which explains that the API is susceptible to shell injection. Basically, having this: https://docs.python.org/3/library/subprocess.html#security-considerations.

                        2. 4

                          Quoting is one of the nastiest things about shell scripting. It’s pretty easy to get wrong, and the effects of getting it wrong can be dire. (Apple once shipped an iTunes update with an installer script that erased user files if the path to the iTunes app contained spaces, IIRC.)

                          This gets even worse when you have another layer (Python) doing string assembly and quoting, with different syntax and different rules.

                          If I’m going to use something other than a shell to write a script, one absolute requirement for me is that it make quoting better, not worse. If I’m using Python to call an external program, I want one Python argument (or array item) per argument.

                          1. 1

                            Quoting can actually be added very nicely into my project, simply by patching any variables you pass into the shell commands. I’ll read up more about it.

                          2. 2

                            Ouch, sorry for a too grumpy/curt reply — seeing this on lobsters, I assumed that the author saw the previous thread, which isn’t really a right deduction :)

                            In my defense, it is the third time this year I point out this specific security issue (:

                            1. 1

                              honestly kinda wish I had never written this very cursed comment

                              1. 2

                                Yeah, that was a bit uncalled for. I guess, karma works, as now we know that JavaScript’s $` can actually have the right semantics, unlike Ruby’s backticks :D

                        3. 1

                          These libraries feel a little bit like the best possible effort on something that is essentially not the right solution.

                          The languages included in shells have command invocation as a first class citizen and offer second class support for things that are the bread and butter of general-purpose programming languages. This is by design. The goal is to be able just to script your commands the way you enter them in the interactive prompt. Hence the name, a script, rather than a program.

                          What these libraries do is essentially something that Perl included. You had special simplified syntax to shell out commands and deal with their input and output. PHP inherited some of these too. Running commands with backticks comes to mind. It didn’t take long before most people realized that you rarely need to shell out from Perl.

                          All programming languages these days offer something along the lines of os.system(), which does the job fine. If the goal is saving a couple of keypresses, nothing stops you from reassigning that function to a one-letter symbol. In languages such as Ruby, Scala, CoffeeScript, Crystal, etc. you don’t even need parentheses for function calls. So perhaps that would be what many people are after?

                          The author of ZX wrote in the readme “for better scripting”. I don’t quite agree with the usage of the word ‘better’ in here. I think they meant ‘easier’.

                          Perhaps what most people unknowingly want is a shell with less hairy quoting and expansion syntax. Or maybe they have never learned how to use their shell to the intended fluency of the average user.

                          1. 1

                            Afaics this is not async, which is half the fun of JS zx.

                            1. 1

                              That’s cool but, if you’re gonna use AST transformation anyways, why not use something better than ~ for executing commands? It just looks … wrong

                              1. 1

                                Hi - I made this.

                                What other syntax would you suggest? $’…’, backticks, !, etc. are all not valid Python syntax. This was the only unused valid Python syntax that I found that looked sane.

                                If you have any suggestions they’re totally welcome here and on Github.

                              2. 1

                                I’m so glad I can just use fish.