1. 10
  1. 4

    These are useful examples, i.e. 2 failed attempts and the right one.

    I have found that quoting and evaluation is sort of a “missing topic” in programming education. I think I got exposed to it through Lisp, but then it took a while for my brain to transfer that knowledge to strings, Python, shell and C. It’s very important for security, e.g. understanding SQL injection, HTML injection (XSS), and shell injection.
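
    For example, shell injection is a quoting/evaluation bug in miniature (a made-up sketch, not from any of the linked posts):

    $ file='foo; echo pwned'
    $ sh -c "ls $file"    # the data is spliced into code and re-parsed: runs "ls foo", then "echo pwned"
    $ ls -- "$file"       # passed as a single argv element, it stays data: just one (odd) filename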

    For example this post has all sorts of quoting/evaluation errors, like manipulating shell code with sed and then piping directly to sh:

    https://codefaster.substack.com/p/xargs-considered-harmful

    I have used that pattern in the past, but I’ve moved away from it in favor of xargs, and I never put it in a shell script.

    I responded to it here: http://www.oilshell.org/blog/2021/08/xargs.html


    Fun fact: bash ALMOST does its quoting correctly with printf %q or ${x@Q} and the “not quite inverse” printf %b to unquote.

    https://github.com/oilshell/oil/wiki/Shell-Almost-Has-a-JSON-Analogue

    But the round trip doesn’t work if you have a newline: %q sometimes emits the $'\n' form, but %b doesn’t understand it. The $'...' strings are the most general kind (e.g. POSIX single quoted strings can’t contain single quotes).

    So the fact that bash doesn’t do this correctly is more evidence that even the authors of languages are confused about quoting and evaluation.
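
    Concretely (the exact %q spelling can vary a bit between bash versions, but this is the shape of the failure):

    $ s=$'foo\nbar'            # a two-line string
    $ printf -v q '%q' "$s"    # quote it: %q falls back to the $'...' form
    $ echo "$q"
    $'foo\nbar'
    $ printf '%b\n' "$q"       # "unquote" with %b: it expands the \n but doesn't understand $'...'
    $'foo
    bar'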


    Oil has QSN instead: https://www.oilshell.org/release/latest/doc/qsn.html

    I think that some people wondered why Oil even has QSN at all! It is so you can quote and unquote correctly 100% of the time. You don’t have to worry about data-dependent bugs, like when your strings contain spaces, newlines, single quotes, double quotes, or backslashes.

    It’s just Rust string literals, which are a cleaned up version of C string literals. Most people understand 'foo\n' but not necessarily

    x='foo
    '
    

    (the way to write a newline in POSIX shell)

    That is, POSIX shell does have a trick for single quotes (you concatenate '\''), but there is no equivalent for a newline that fits on a single line.
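
    Concretely:

    $ x='it'\''s'              # the '\'' trick: close the quotes, backslash a quote, reopen
    $ printf '%s\n' "$x"
    it's
    $ y='line one
    > line two'                # but a literal newline forces the assignment onto two lines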


    I think what would be useful is to have a post on the relationship between “quoting and evaluation” and say JSON serialization and deserialization. They are kind of the same thing, except the former is for code, and the latter is for data. It’s not an accident that JSON was derived from the syntax of JavaScript, etc.
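
    For instance, with jq standing in for the JSON side (just an illustration, assuming jq is installed): serializing is quoting, and parsing is evaluation.

    $ jq -n --arg s $'foo\nbar' '$s'          # serialize = quote the data as JSON
    "foo\nbar"
    $ printf '%s\n' '"foo\nbar"' | jq -r .    # parse = evaluate it back into the raw string
    foo
    bar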

    1. 5

      Actually a really practical example of where this comes up is SSH quoting:

      https://lobste.rs/s/8tki7j/ssh_quoting

      https://www.chiark.greenend.org.uk/~cjwatson/blog/ssh-quoting.html

      So the problem here is to serialize an argv array as a string. And SSH does it naively, by joining the arguments with spaces! This leads to the problem where arguments with special characters get mangled.

      https://news.ycombinator.com/item?id=27483077

      Top comment:

      I think the fact that SSH’s wire protocol can only take single string as a command is a huge, unrecognized design flaw.
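
      A minimal way to see the mangling (assuming some reachable host, here just called “host”):

      $ ssh host touch 'foo bar'   # one argument locally...
      $ ssh host ls                # ...but the remote side got the single string "touch foo bar"
      bar
      foo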

      1. 2

        Indeed, I’ve also noted this design flaw of SSH (and that it concatenates the wrong way) in my “how to do things safely in Bash” guide. So not entirely unrecognised, for what it’s worth.

      2. 2

        After reading your xargs post, I think I’m of the exact opposite opinion: I only use find -exec and haven’t touched xargs in years. I understand xargs enough to know that it needs to be treated carefully depending on what data is being passed around, and it doesn’t work well with my idea of iteratively building up commands. Consider starting from find . -type f, which simply prints out the names; moving on to xargs requires changes to both the find command and the xargs invocation: find . -type f -print0 | xargs -0 rm. Of course, in this trivial example, it would just be better as find . -type f -exec rm {} + (for symmetry with \; I usually write it as \+).

        Instead, I’ve taken to using a strategy where I go straight from find back into the shell. The pattern is kind of obtuse, admittedly, but there’s never a case where the filenames get passed around through a pipe and where delimiters have to be considered. The simple example above would be:

        $ find . -type f -exec bash -c 'rm "$@"' '<bash -c>' {} \+
        

        It’s a bit of a mouthful, but it then lets you use any shell features inside of the bash -c command, which I prefer because I already think in terms of shell expansions and commands. I use this a lot when I need to rename files with a weird convention. For example, converting a folder full of world.2017-01-01.converted.bin files into world/converted/2017-01-01.bin could be written as:

        $ mkdir -p world/converted
        $ find . -type f -name 'world.*.converted.bin' \
        >     -exec bash -c 'for src; do dst=${src#./}; dst=${dst%.converted.bin}.bin; dst=${dst#world.}; dst=world/converted/$dst; mv "$src" "$dst"; done' '<bash -c rename>' {} \+
        

        The '<bash -c>' or '<bash -c rename>' argument is needed because it sets Bash’s argv[0] ($0), which is what shows up in process listings; without it, the first filename would be consumed as $0 and silently lost.
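
        A quick way to see that $0 behaviour outside of find (just bash -c by itself):

        $ bash -c 'printf "[%s] " "$@"; echo' one two three
        [two] [three]
        $ bash -c 'printf "[%s] " "$@"; echo' '<bash -c>' one two three
        [one] [two] [three]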


        At this level of effort, I think it could be better to just use shell completely and make use of shopt -s globstar. I think it would look like this:

        $ shopt -s globstar
        $ for src in **/world.*.converted.bin; do dst=$src; dst=${dst%.converted.bin}.bin; dst=${dst#world.}; dst=world/converted/$dst; mv "$src" "$dst"; done
        

        But then you lose two things: all of the extra features of find, and the ability to more easily build up the command iteratively. Plus, my brain prefers to go straight to find when I need to recurse through directories; globstar is more of an afterthought.


        Side note: I realized that it could be somewhat straightforward to write a “find to bash” (or “find to posix sh”) converter to remove any find dependency altogether, something like the following:

        $ find2bash . -type f -name 'world.*.converted.bin' -mmin 10 -exec echo Removing {} now... \;
        #!/usr/bin/env bash
        
        tempdir=$(mktemp -d)
        printf -v escaped 'rm -r %q' "${tempdir:?}"
        trap "${escaped:?}" EXIT
        
        # find -mmin 10 (approximated here as "modified within the last 10 minutes")
        # I think this has to be GNU's touch
        touch --date="-10 minutes" "${tempdir:?}/-mmin 10"
        
        shopt -s globstar
        for arg in ./**; do  # find .
            # find -type f
            [ -f "${arg:?}" ] || continue
        
            # find -name world.*.converted.bin  (-name matches the basename only)
            case "${arg##*/}" in
            (world.*.converted.bin);;
            (*) continue;;
            esac
        
            # find -mmin 10
            [ "${arg:?}" -nt "${tempdir:?}/-mmin 10" ] || continue
        
            # find -exec echo Removing {} now... ;
            echo Removing "$arg" now...
        done
        

        </ramble></ramble></ramble>

        1. 1

          That’s definitely a valid way of doing it and I will concede that find -exec \+ doesn’t have the gotcha of newlines in filenames, which xargs -d $'\n' does.

          However my responses:

          1. xargs -d $'\n' composes with other tools like grep and shuf (see the sketch after this list): https://www.oilshell.org/blog/2021/08/xargs.html#xargs-composes-with-other-tools
          2. It’s nice to preview and separate the two issues: what to iterate on, and what to do. It’s basically like Ruby / Rust iteration vs. Python.
          3. xargs -P is huge; can’t do this with find
          4. find is its own language which I think is annoying. It has globs, regexes, and printf. I’d rather just use shell globs, regex, and printf.
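
          For instance (a sketch illustrating points 1-3; it assumes GNU xargs, and optipng is just a stand-in for whatever command you actually want to run):

          $ find . -type f -name '*.png' | grep -v thumb | shuf | xargs -d $'\n' -n 1 echo optipng   # compose, then preview
          $ find . -type f -name '*.png' | grep -v thumb | shuf | xargs -d $'\n' -n 1 -P 4 optipng   # same pipeline, 4 jobs in parallel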

          That said, your -exec bash idiom is very similar to the $0 dispatch pattern I mention: I use xargs to “shell back in”, and you are using find to “shell back in”.

          Discussed here btw: https://lobste.rs/s/xestey/opinionated_guide_xargs

          https://news.ycombinator.com/item?id=28258189

      3. 3

        Mixing command and data in the same string gives me shivers. The article doesn’t say, but it needs to be said that this would be cleaner with an array:

        cmd=(cat "foo bar")
        "${cmd[@]}"
        

        This is unfortunately a bashism. But so be it, I say: This is reason enough, and the only reason, not to use POSIX shell as far as I’m concerned.
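
        A slightly fuller sketch of the same idea (the names are made up): the command stays an array the whole time, so nothing ever gets re-parsed as code.

        cmd=(grep -F --)          # the command and its fixed options
        cmd+=("foo bar")          # data: one pattern, spaces and all
        cmd+=(notes.txt log.txt)  # more data: the files to search
        "${cmd[@]}"               # each element expands to exactly one argv entry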

        1. 1

          zsh: ${(Q)${(z)cmd}}

          Whether this is good or not is left to the reader’s discretion.