These are useful examples, i.e. 2 failed attempts and the right one.
I have found that quoting and evaluation is sort of a “missing topic” in programming education. I think I got exposed to it through Lisp but then it took a while for my brain to transfer that knowledge to strings, Python, shell and C. It’s very important for security, i.e. understanding SQL injection, HTML injection (XSS), and shell injection.
For example this post has all sorts of quoting/evaluation errors, like manipulating shell code with sed and then piping directly to sh: https://codefaster.substack.com/p/xargs-considered-harmful
Bash ALMOST does its quoting correctly with printf %q or ${x@Q}, but it doesn’t work if you have a newline. It sometimes emits the $'\n' strings, but doesn’t understand them. These are the most general type of string (e.g. POSIX single quoted strings can’t contain single quotes).
So the fact that bash doesn’t do this correctly is more evidence that even the authors of languages are confused about quoting and evaluation.
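For what it's worth, here is a minimal sketch of that round trip, assuming bash 3.1 or later for printf -v. The variable names are made up for illustration. It suggests that eval understands what printf %q emits, while printf %b does not:

```bash
#!/usr/bin/env bash
# A string with a space, a newline, both kinds of quotes, and a backslash.
original=$'two words\nit\'s "quoted" \\ done'

# Quote: bash emits a $'...' literal here because of the newline.
printf -v quoted '%q' "$original"

# Unquote by evaluating the literal as shell code: this round-trips.
eval "restored=$quoted"

# Unquote with printf %b: it does not understand the $'...' wrapper.
printf -v via_b '%b' "$quoted"

[ "$restored" = "$original" ] && echo "eval round trip: ok"
[ "$via_b" = "$original" ] || echo "printf %b round trip: mismatch"
```

The eval is only reasonable here because the string it evaluates came straight out of printf %q; evaluating arbitrary input this way would be exactly the injection problem discussed above.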
I think that some people wondered why Oil even has QSN at all! It is so you can quote and unquote correctly 100% of the time. You don’t have to worry about data-dependent bugs, like when your strings contain spaces, newlines, single quotes, double quotes, or backslashes.
It’s just Rust string literals, which are a cleaned up version of C string literals. Most people understand 'foo\n' but not necessarily
x='foo
'
(the way to write a newline in POSIX shell)
That is, there is a trick to concatenate \' in POSIX shell, but it doesn’t have the property of fitting on a single line.
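To make the three spellings concrete, here is a small sketch. It assumes bash, since $'...' is not available in every POSIX shell, and the variable names are only illustrative:

```bash
#!/usr/bin/env bash
# POSIX: a newline inside single quotes has to be a literal line break.
x='foo
'
# bash, ksh, zsh: C-style escapes inside $'...' keep it on one line.
y=$'foo\n'
# POSIX trick for a single quote: close the string, add \', reopen it.
z='it'\''s'

[ "$x" = "$y" ] && echo "same string"
printf '%s\n' "$z"
```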
I think what would be useful is to have a post on the relationship between “quoting and evaluation” and say JSON serialization and deserialization. They are kind of the same thing, except the former is for code, and the latter is for data. It’s not an accident that JSON was derived from the syntax of JavaScript, etc.
So the problem here is to serialize an argv array as a string. And SSH does it naively by concatenating and separating with a space! This leads to the problem where arguments with special characters are mangled!
Indeed, I’ve also noted this design flaw of Ssh (and that it’s concatenating the wrong way) in my how to do things safely in Bash guide. So not entirely unrecognised, for what it’s worth.
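The mangling can be reproduced without a network. In this sketch, fake_ssh, quoted_ssh and count_args are hypothetical helpers: fake_ssh stands in for ssh by joining its arguments with spaces and letting a fresh bash parse the result, which is roughly what happens on the remote side.

```bash
#!/usr/bin/env bash
# Report how many arguments actually arrived.
count_args() { echo "$#"; }
export -f count_args   # make the function visible to the child bash

# Naive: join argv with spaces, then re-parse the string as shell code.
fake_ssh() { bash -c "$*"; }

# Safer: quote every argument with %q before joining.
quoted_ssh() {
  local cmd
  printf -v cmd '%q ' "$@"
  bash -c "$cmd"
}

fake_ssh count_args 'a b'     # prints 2: the argument was split
quoted_ssh count_args 'a b'   # prints 1: the argument survived
```

With a real ssh the same idea would presumably look like ssh host "$(printf '%q ' "$@")", where host is a placeholder and the remote login shell is assumed to be bash, since %q can emit $'...' literals.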
After reading your xargs post, I think I’m of the exact opposite opinion: I only use find -exec and haven’t touched xargs in years. I understand xargs enough to know that it needs to be treated carefully depending on what data is being passed around, and it doesn’t work well with my idea of iteratively building up commands. Consider starting from find . -type f, which simply prints out the names: moving on to xargs requires changes to both the find command and to xargs, as in find . -type f -print0 | xargs -0 rm. Of course, in this trivial example, it would just be better as find . -type f -exec rm {} + (for symmetry with \; I usually write it as \+).
Instead, I’ve taken to using a strategy where I go straight from find back into the shell. The pattern is kind of obtuse, admittedly, but there’s never a case where the filenames get passed around through a pipe and where delimiters have to be considered. The simple example above would be:
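A minimal sketch of that pattern, run against a throwaway directory so it is safe to try. The demo variable and the file names are made up; the '<bash -c>' placeholder is the convention explained further down.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
touch "$demo/plain" "$demo/with space" "$demo/"$'new\nline'

# find passes the filenames as separate arguments, so the inline script
# sees them as "$@" and nothing is ever parsed out of a pipe.
find "$demo" -type f -exec bash -c 'rm -- "$@"' '<bash -c>' {} +

find "$demo" -type f | wc -l   # counts the regular files that remain
```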
It’s a bit of a mouthful, but it then lets you use any shell features inside of the bash -c command, which I prefer because I already think in terms of shell expansions and commands. I use this a lot when I need to rename files with a weird convention. For example, converting a folder full of world.2017-01-01.converted.bin files to world/converted/2017-01-01.bin could be written as:
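One way such a rename might look, as a sketch rather than the original command. It works in a temporary directory; demo, dir and name are illustrative names, and the target layout is taken from the file names mentioned above.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
touch "$demo/world.2017-01-01.converted.bin" "$demo/world.2017-01-02.converted.bin"

find "$demo" -type f -name 'world.*.converted.bin' -exec bash -c '
  for src; do                       # iterates over "$@"
    dir=${src%/*}                   # directory containing the file
    name=${src##*/}                 # world.2017-01-01.converted.bin
    name=${name#world.}             # 2017-01-01.converted.bin
    name=${name%.converted.bin}     # 2017-01-01
    mkdir -p "$dir/world/converted"
    mv -- "$src" "$dir/world/converted/$name.bin"
  done
' '<bash -c rename>' {} +

ls "$demo/world/converted"
```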
The '<bash -c>' or '<bash -c rename>' argument is needed because it sets Bash’s argv[0] which is shown in process listings. Without it, the first argument gets lost.
At this level of effort, I think it could be better to just use shell completely and make use of shopt -s globstar. I think it would look like this:
$ shopt -s globstar
$ for src in **/world.*.converted.bin; do dst=$src; dst=${dst%.converted.bin}.bin; dst=${dst#world.}; dst=world/converted/$dst; mkdir -p "${dst%/*}"; mv "$src" "$dst"; done
But then you lose both: all of the extra features within find and being able to more easily build up the command iteratively. Plus, my brain prefers to go straight to find when I need to recursively go through directories, and globstar is more of an afterthought.
Side note: I realized that it could be somewhat straightforward to write a “find to bash” (or “find to posix sh”) converter to remove any find dependency altogether, something like the following:
$ find2bash . -type f -name 'world.*.converted.bin' -mmin 10 -exec echo Removing {} now... \;
#!/usr/bin/env bash
tempdir=$(mktemp -d)
# -r because tempdir is a directory
printf -v escaped 'rm -r %q' "${tempdir:?}"
trap "${escaped:?}" EXIT
# find -mmin 10 (approximated here as: modified within the last 10 minutes)
# I think this has to be GNU's touch
touch --date="-10 minutes" "${tempdir:?}/-mmin 10"
shopt -s globstar
for arg in ./**; do # find .
    # find -type f
    [ -f "${arg:?}" ] || continue
    # find -name world.*.converted.bin (-name matches the basename only)
    case "${arg##*/}" in
        (world.*.converted.bin) ;;
        (*) continue ;;
    esac
    # find -mmin 10
    [ "${arg:?}" -nt "${tempdir:?}/-mmin 10" ] || continue
    # find -exec echo Removing {} now... \;
    echo Removing "$arg" now...
done
That’s definitely a valid way of doing it and I will concede that find -exec \+ doesn’t have the gotcha of newlines in filenames, which xargs -d $'\n' does.
It’s nice to preview and separate the two issues: what to iterate on, and what to do. It’s basically like Ruby / Rust iteration vs. Python.
xargs -P is huge; can’t do this with find
find is its own language which I think is annoying. It has globs, regexes, and printf. I’d rather just use shell globs, regex, and printf.
Although your -exec bash idiom is very similar to the $0 dispatch pattern I mention. I use xargs to “shell back in”, and you are using find to “shell back in”.
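A rough sketch of how the xargs flavor of “shelling back in” can look. task.sh, show and all are hypothetical names, and -0 and -P are GNU/BSD xargs extensions rather than POSIX. The last line of the generated script runs whichever function is named by its first argument, so xargs can call back into the same file.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
mkdir "$demo/files"
touch "$demo/files/a b" "$demo/files/c"

# task.sh is a hypothetical script: its last line dispatches to the
# function named by the first argument.
cat > "$demo/task.sh" <<'EOF'
show() { printf 'got: %s\n' "${1##*/}"; }

all() {
  # -print0 / -0 keep names with spaces or newlines intact;
  # -n 1 -P 2 runs one file per process, two processes at a time.
  find "$1" -type f -print0 | xargs -0 -n 1 -P 2 bash "$0" show
}

"$@"
EOF

bash "$demo/task.sh" all "$demo/files" | sort
```

The find -exec bash -c idiom gets the same effect without the pipe; what it cannot do is the -P parallelism.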
Mixing command and data in the same string gives me shivers. The article doesn’t say, but it needs to be said that this would be cleaner with an array:
cmd=(cat "foo bar")
"${cmd[@]}"
This is unfortunately a bashism. But so be it, I say: This is reason enough, and the only reason, not to use POSIX shell as far as I’m concerned.
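To show the difference, here is a small sketch contrasting the two approaches. count_args is a hypothetical helper that just reports how many arguments it received.

```bash
#!/usr/bin/env bash
count_args() { echo "$#"; }

# Command and data mixed in one string: the inner quotes are just
# characters, so word splitting produces two arguments.
cmd_string='count_args "foo bar"'
$cmd_string            # prints 2

# Command as an array: each element stays one argument.
cmd_array=(count_args "foo bar")
"${cmd_array[@]}"      # prints 1
```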
These are useful examples, i.e. 2 failed attempts and the right one.
I have found that quoting and evaluation is sort of a “missing topic” in programming education. I think I got exposed to it through Lisp but then it took a while for my brain to transfer that knowledge to strings, Python, shell and C. It’s very important for security, i.e. understanding SQL injection, HTML injection (XSS), and shell injection.
For example this post has all sorts of quoting/evaluation errors, like manipulating shell code with sed and then piping directly to sh: https://codefaster.substack.com/p/xargs-considered-harmful
I have used that pattern in the past, but I’ve moved away from it in favor of xargs, and I never put it in a shell script.
I responded to it here: http://www.oilshell.org/blog/2021/08/xargs.html
Fun fact: bash ALMOST does its quoting correctly with printf %q or ${x@Q}, and the “not quite inverse” printf %b to unquote. https://github.com/oilshell/oil/wiki/Shell-Almost-Has-a-JSON-Analogue
But it doesn’t work if you have a newline. It sometimes emits the $'\n' strings, but doesn’t understand them. These are the most general type of string (e.g. POSIX single quoted strings can’t contain single quotes).
So the fact that bash doesn’t do this correctly is more evidence that even the authors of languages are confused about quoting and evaluation.
Oil has QSN instead: https://www.oilshell.org/release/latest/doc/qsn.html
I think that some people wondered why Oil even has QSN at all! It is so you can quote and unquote correctly 100% of the time. You don’t have to worry about data-dependent bugs, like when your strings contain spaces, newlines, single quotes, double quotes, or backslashes.
It’s just Rust string literals, which are a cleaned up version of C string literals. Most people understand 'foo\n' but not necessarily
x='foo
'
(the way to write a newline in POSIX shell)
That is, there is a trick to concatenate \' in POSIX shell, but it doesn’t have the property of fitting on a single line.
I think what would be useful is to have a post on the relationship between “quoting and evaluation” and say JSON serialization and deserialization. They are kind of the same thing, except the former is for code, and the latter is for data. It’s not an accident that JSON was derived from the syntax of JavaScript, etc.
Actually a really practical example of where this comes up is SSH quoting:
https://lobste.rs/s/8tki7j/ssh_quoting
https://www.chiark.greenend.org.uk/~cjwatson/blog/ssh-quoting.html
So the problem here is to serialize an argv array as a string. And SSH does it naively by concatenating and separating with a space! This leads to the problem where arguments with special characters are mangled!
https://news.ycombinator.com/item?id=27483077
Top comment:
Indeed, I’ve also noted this design flaw of Ssh (and that it’s concatenating the wrong way) in my how to do things safely in Bash guide. So not entirely unrecognised, for what it’s worth.
After reading your xargs post, I think I’m of the exact opposite opinion: I only use find -exec and haven’t touched xargs in years. I understand xargs enough to know that it needs to be treated carefully depending on what data is being passed around, and it doesn’t work well with my idea of iteratively building up commands. Consider starting from find . -type f, which simply prints out the names: moving on to xargs requires changes to both the find command and to xargs, as in find . -type f -print0 | xargs -0 rm. Of course, in this trivial example, it would just be better as find . -type f -exec rm {} + (for symmetry with \; I usually write it as \+).
Instead, I’ve taken to using a strategy where I go straight from find back into the shell. The pattern is kind of obtuse, admittedly, but there’s never a case where the filenames get passed around through a pipe and where delimiters have to be considered. The simple example above would be:
It’s a bit of a mouthful, but it then lets you use any shell features inside of the bash -c command, which I prefer because I already think in terms of shell expansions and commands. I use this a lot when I need to rename files with a weird convention. For example, converting a folder full of world.2017-01-01.converted.bin files to world/converted/2017-01-01.bin could be written as:
The '<bash -c>' or '<bash -c rename>' argument is needed because it sets Bash’s argv[0] which is shown in process listings. Without it, the first argument gets lost.
At this level of effort, I think it could be better to just use shell completely and make use of shopt -s globstar. I think it would look like this:
But then you lose both: all of the extra features within find and being able to more easily build up the command iteratively. Plus, my brain prefers to go straight to find when I need to recursively go through directories, and globstar is more of an afterthought.
Side note: I realized that it could be somewhat straightforward to write a “find to bash” (or “find to posix sh”) converter to remove any find dependency altogether, something like the following:
</ramble></ramble></ramble>
That’s definitely a valid way of doing it and I will concede that find -exec \+ doesn’t have the gotcha of newlines in filenames, which xargs -d $'\n' does.
However my responses:
xargs -d $'\n' composes with other tools like grep and shuf: https://www.oilshell.org/blog/2021/08/xargs.html#xargs-composes-with-other-tools
xargs -P is huge; can’t do this with find
Although your -exec bash idiom is very similar to the $0 dispatch pattern I mention. I use xargs to “shell back in”, and you are using find to “shell back in”.
Discussed here btw: https://lobste.rs/s/xestey/opinionated_guide_xargs
https://news.ycombinator.com/item?id=28258189
Mixing command and data in the same string gives me shivers. The article doesn’t say, but it needs to be said that this would be cleaner with an array:
This is unfortunately a bashism. But so be it, I say: This is reason enough, and the only reason, not to use POSIX shell as far as I’m concerned.
zsh:
${(Q)${(z)cmd}}
Whether this is good or not is left to the reader’s discretion.