1. 18
  1.  

  2. 4

    Nice article, I definitely think this is needed and timely. I have seen “casual” shell injections popping often. For example, on the xargs thread from 2 weeks ago, where the post specifically advocated practices that were vulnerable to shell injection (from untrusted filenames).

    In contrast, xargs can eliminate this risk if used to correctly (although as always it’s not that easy to figure out from the man page. The key is that it mostly deals with “args”, not flat strings!)

    https://lobste.rs/s/wlqveb/xargs_considered_harmful

    https://lobste.rs/s/wlqveb/xargs_considered_harmful#c_kwsxtc (comment pointing out the shell injection)

    I also remember some “minimal” CGI code in C that someone posted that has extremely obvious HTML injection. (One of my pet peeves is that XSS has an unintuitive name; HTML+JavaScript injection is a better name for it.)


    I’m beginning to think the Shell/SQL/HTML+JS injection issue injection is a hole in education. I would approach it from 2 angles: “memes” and language theory.

    For memes, xkcd made a good “Bobby Tables” one in 2007, which I reposted here:

    https://old.reddit.com/r/oilshell/comments/n9qcrp/exploits_of_a_mom_sql_injection/

    Basically every working programmer should understand why the payload is:

    • Robert’); DROP TABLE Students; --

    For shell injection, I suggested “The Ballad of Rimraf”:

    • The Ballad of ; rm -rf / (with the common misusage being os.system('mplayer %s' % filename)

    For HTML+JS injection, it could be the alert("pwned") restaurant or something. I didn’t think that one through.

    If someone can actually make comics, they should run with this :)


    From the language theory side, I also have a hard time explaining it, but I think there is an idea of creating an injection attack as sort of a proof by counterexample. You want to “prove” (or really informally reason) that your code is correct over arbitrary strings in an alphabet (filenames, URL parameters, etc.). The injection attack is the counterexample.

    There are actually 3 different ways to avoid injection attacks, which I outline here: http://www.oilshell.org/blog/2021/06/oil-language.html#educational-posts-to-support-string-safety

    And I think this language-based reasoning gets at why they all work (?) At least in my mind.


    (BTW I think it would be worthwhile to repeat the actionable advice up front, in addition to putting it at the end. It may be a lot for some readers to get through.)

    As another footnote, Joel Spolsky has a 2005 article about using a coding convention (really “Hungarian” notation) to avoid XSS. But I believe this is mostly obsolete if you’re using a template language, which didn’t really exist at the time (other than PHP which was/is unsafe by default!). It’s really for people who are manually creating HTML pages in VB or Java, which almost nobody does anymore.

    https://www.joelonsoftware.com/2005/05/11/making-wrong-code-look-wrong/

    1. 2

      Most ORMs and DB libraries have automatically protected you from SQL injection for years and years now.

      But people keep writing articles insisting that libraries and ORMs are bad and everyone should instead write their own SQL by hand.

      1. 1

        I don’t have any real stance on that – my issue is that IF you do drop down to SQL, then you need to know enough not to write SQL injections :) And also you shouldn’t write blog posts with obvious SQL / shell / HTML+JS injections.

    2. 1

      Better to use zx (https://github.com/google/zx). Zx properly escapes all arguments:

      await $`curl ${line}`;
      

      And also have some other neat things for nodejs scripts.

      Take a look at zx.

      1. 3

        zx is linked from the post (was one of the cases I complained about, thanks for fixing that :-) )

        Important thing to note is that zx still uses system’s shell. While, with proper quoting, it is secure, I don’t think that’s the best design, and that it’s better to spawn the process directly, bypassing the shell. See the Julia post for details.

        But I also see that “learn from zx tasteful API design, and implement your own version without dependency on system shell, preferably for deno” isn’t necessary the most actionable advice, so, yeah, just use zx.

        1. 1

          I see, thanks for the link 😊 I like the simplicity of zx. There is no need for bypassing shell for most of scripts.

          Julia post is awesome. Added nothrow to zx after it.

          1. 3

            I like the simplicity of zx. There is no need for bypassing shell for most of scripts.

            This makes total sense in the context of existing node APsI. If we ignore node, and look at the systems level (and that would be a different context from zx’s), the situation is exactly the opposite:

            • most scripts do not need to use the shell
            • using the shell is more complex, as you need to escape and concat arguments, instead of just passing the array as is to execve

            That’s why I think the only real problem is the platform layer, which adds the shell indirection. If someone were to port zx to deno, they wouldn’t even need to think about it, because there’s no shell-based api.

            1. 1

              That’s true, thanks for explanation. I think zx became so popular due to similarity to usual shell (which is familiar to majority of developers).

              And now zx escapes all arguments by default, it’s good nodejs companion.

      2. 1

        Why provide an exploitable API, while a safe version is possible and is more direct? I don’t know, but my guess is that it’s mostly just history. C has system, Perl’s backticks correspond directly to that, Ruby got backticks from Perl, Python just has system, node was probably influenced by all these scripting languages.

        You might want to run a command that you wrote yourself and has no user input, or that uses pipes or other actual shell features. Though spawn does have a shell parameter to give the exec-style behavior, so that’s what it should’ve just been like by default.

        1. 4

          I don’t think any explanation in the “because it is useful” class works: a bunch of newer languages don’t provide this API, and people don’t complain.

          Specifically:

          • If you wrote the command yourself and it has no user input, just use spawn.
          • If you need pipes, spawn has to have the ability to connect two processes anyway.
          • If you need other shell features, well, then spawn /bin/sh -c yourself.

          I also consider the shell=true argument in node/Python to be a misfeature.

          1. 3

            You never need the unsafe API to use pipelines – it can always be expressed in the safe API. Instead of:

            os.system('ls | head -n %s' % number')
            

            You can do:

            subprocess.call(['sh', '-c', 'ls | head -n "$1"', 'dummy0', str(number)])
            

            So you are basically letting the shell dynamically do the substitution rather than building up shell code yourself. This is better.

            It’s the same reason that this awk invocation is meh:

            seq 3 | awk "{ \$1 > $threshold }"  # note double quotes
            

            And this is better:

            seq 3 | awk -v threshold=$threshold '{ $1 > threshold }'  # note single quotes
            
            1. 3

              thanks for the awk trick.