1. 17

  2. 34

    In my opinion, the advice in this blog is terrible.

    The author is correct that calling through a shell is a bad idea, especially if you are passing user input along. But even without user input it’s a bad idea, because of shell expansion, quoting rules, and all that.

    The author then introduces exec, which doesn’t have this problem, but dismisses it because you have to figure out all the stdin/stdout business and forking yourself. Except you don’t, because computers let us automate things: we can just use a library that takes care of that for us, like the one I assume the author is referencing when talking about Python. So the problem with exec really isn’t a problem.
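    In Python, for instance, the standard subprocess module is exactly such a library; a minimal sketch:

```python
import subprocess

# The library does the fork/exec and the stdin/stdout plumbing for us;
# the command is a list, so no shell is involved at any point.
result = subprocess.run(["echo", "hello"],
                        capture_output=True,  # collect stdout/stderr
                        text=True,            # decode bytes to str
                        check=True)           # raise on non-zero exit
print(result.stdout, end="")  # hello
```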

    There are many good reasons to use subprocesses:

    1. Processes provide memory isolation, which means a subprocess cannot mess with your program’s memory. That makes the system easier to understand and generally more robust.
    2. One gets automatic parallelization thanks to the operating system. Combined with memory isolation, this is generally a lot safer than using threads.
    3. It can be used to make your system more secure.
    4. The failure modes of processes are much easier to deal with. The process is taking too long? Kill it; its resources will be cleaned up. Maybe it leaves some files around, but it’s generally easier to write a program that is designed to be killed, since being killed is part of a process’s expected lifecycle, than to write a library that tolerates being torn down mid-call.
    5. Unless a program also exposes itself as a library, so that you would be calling the exact same code, reimplementing its functionality yourself is a great way to add bugs to a system. Sometimes it is the right decision, though.
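    Point 4 above can be sketched with the same Python subprocess module (assuming a POSIX sleep binary is on the PATH):

```python
import subprocess

try:
    # If the child outlives the timeout, subprocess.run kills it and
    # raises TimeoutExpired; the OS then reclaims its resources.
    subprocess.run(["sleep", "10"], timeout=0.5)
    killed = False
except subprocess.TimeoutExpired:
    killed = True

print("child was killed:", killed)
```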

    In general, I think people should be utilizing subprocesses more.

    1. 2

      Nice rebuttal. More interesting is that points 1-4 can almost be used as-is to justify security-focused microkernels. The main difference is that the base TCB drops to around 4-12 kloc. I’ve always thought porting qmail to a separation kernel would be an interesting project, given that its architecture is similar at a high level. There might be UNIX-specific details that could cause difficulty, though.

    2. 9

      Relatively misleading title - I thought we were getting an interesting new take on or analysis of context switching costs, process duplication overhead, etc. but the article was mostly “don’t call system or similar with an untrusted argument” which seems obvious.

      “Subprocesses are a code smell” seems to me to be a wholly unsubstantiated claim in the article. Subprocesses which kick off any command/program an attacker wishes? Definitely more than a code smell. Use of subprocesses at all though?

      1. 2

        It boils down to “don’t blacklist, whitelist”. The example git commit -m “<userdata>” is super safe if userdata matches [A-Za-z0-9 ]+
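        A minimal Python sketch of that whitelist, with a hypothetical commit_with_message helper (the class is spelled A-Za-z rather than A-z, since A-z in a regex also admits punctuation like backslash and backtick):

```python
import re
import subprocess

SAFE_MESSAGE = re.compile(r"[A-Za-z0-9 ]+")

def commit_with_message(userdata: str) -> None:
    # Whitelist: reject anything outside the allowed characters,
    # instead of trying to enumerate every dangerous one.
    if not SAFE_MESSAGE.fullmatch(userdata):
        raise ValueError("commit message contains disallowed characters")
    # List form: no shell involved, each element is one argv entry.
    subprocess.run(["git", "commit", "-m", userdata], check=True)
```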

        1. 2

          No, just use the correct API to run a program.

          1. 1

            What would ‘the correct API’ be for the case @ec mentions?

            1. 6

              libgit2. Maybe.

              I mean, it depends: why am I letting someone make commit messages in the first place?

              Sure, there’s lots of situations where libgit2 might be appropriate, but it’s a big dep. What am I doing, really?

              Maybe I just write the git objects directly. It’s not hard.
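              It really is small, at least for blobs; a minimal Python sketch of git’s loose-object framing (trees and commits use the same “<type> <size>\0<body>” scheme, just with more structured bodies):

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    # A loose git object is "<type> <size>\0<content>", addressed by
    # the SHA-1 of that store; this reproduces `git hash-object` for
    # a blob.
    store = b"blob " + str(len(content)).encode() + b"\0" + content
    return hashlib.sha1(store).hexdigest()

# git's well-known empty-blob id:
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```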

              But there are also lots of cases where I would probably just use system. I don’t see what’s so hard about quoting/escaping, since it’s easy to make the shell do it for you:

              if (setenv("message", text, 1) == 0)  /* setenv takes an overwrite flag as its third argument */
                  system("git commit -m \"$message\" -a");
              

              @ec is right about input sanitising, though: do I want someone to make a 64 kB git commit message? Do I want a commit message that contains evil strings? If I try to build a blacklist, at what point is it good enough? This is an important point; it just has nothing to do with subprocesses.

              1. 4

                It depends on the language, but use the exec* family of functions or something that wraps them. In Python, subprocess lets you pass the command as a list of arguments, which sidesteps shell escaping entirely.
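                A minimal sketch of the list form, with a hostile-looking argument passing through untouched:

```python
import subprocess

message = 'hello"; rm -rf / #'  # hostile-looking input

# Each list element becomes exactly one argv entry; no shell ever
# parses the string, so the quotes and semicolon are inert.
result = subprocess.run(["echo", message], capture_output=True, text=True)
print(result.stdout, end="")  # the hostile string, printed verbatim
```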

        2. 4

          @StefanKarpinski had a nice comment on HN about this:

          I had similar thoughts some time ago but came to a very different conclusion: anything the shell can do a programming language could do just as well, given the right interface. The fact that languages don’t make it easy to spawn subprocesses safely doesn’t mean that they couldn’t do so. The right solution is making it both convenient and safe to do shell-like things from within a real programming language. So I made sure that Julia does this right:

          https://julialang.org/blog/2012/03/shelling-out-sucks

          https://julialang.org/blog/2013/04/put-this-in-your-pipe

          1. 2

            I disagree with the author. And very much agree with @apy’s comment.

            Three years ago I wrote what I considered at the time a nice API for fork/exec/pipe in Python. It is old now and might not be the best Python code by my current clean-code standards. (It also doesn’t support Python 3’s asyncio.) I would need to get back to it and fix it, but it never gained a user base.

            Sorry for the self-promotion plug :).

            My point is: shelling out is awful, I agree, but fork()-ing, exec()-ing, and pipe2()-ing is totally fine. And sometimes it can lead to better separation of concerns and security. (That is actually what Postfix does, and it’s considered one of the most secure MTAs out there.)
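            That fork/exec/pipe style can be sketched with Python’s subprocess.Popen, assuming POSIX echo and tr binaries; a shell-free equivalent of echo hello | tr a-z A-Z:

```python
import subprocess

# Each Popen forks and execs one program; the pipe between them is
# wired up explicitly rather than by a shell.
producer = subprocess.Popen(["echo", "hello"], stdout=subprocess.PIPE)
consumer = subprocess.Popen(["tr", "a-z", "A-Z"],
                            stdin=producer.stdout,
                            stdout=subprocess.PIPE, text=True)
producer.stdout.close()  # so the consumer sees EOF when the producer exits
output, _ = consumer.communicate()
print(output, end="")  # HELLO
```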

            1. 2

              I think the claim that escaping is unlikely to get done correctly is too strong. Getting shell escaping right once is no harder than getting HTML generation with escaped user content right, and single-quoting requires escaping only the single quote itself (although in an annoying way).
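              That single-quote rule is small enough to write once; Python even ships a variant of it as shlex.quote. A sketch with a hypothetical sq helper, round-tripped through a real shell:

```python
import shlex
import subprocess

def sq(s: str) -> str:
    # POSIX single-quoting: only the single quote itself needs handling,
    # via close-quote, escaped quote, reopen-quote: ' -> '\''
    return "'" + s.replace("'", "'\\''") + "'"

payload = "it's $HOME `x`"
# Round-trip through a real shell: the quoted payload survives untouched,
# with no expansion or command substitution.
echoed = subprocess.run(["sh", "-c", "printf %s " + sq(payload)],
                        capture_output=True, text=True).stdout
print(echoed == payload)       # True
print(shlex.quote(payload))    # the stdlib equivalent
```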

              And many useful system APIs have process granularity.

              1. 3

                Just use the correct API and you do not need to escape.