I’ve never seen the bash pipefail option before, and what I’ve been able to read and try does not line up with what is in the blog post. Can someone clarify this for me?
As I understand it, pipefail is about setting the exit status of the overall pipeline:
That command runs for two seconds. In particular, the sleep does not seem to have been interrupted or have any indication that false failed. So if the problem was the command
dos-make-addr-conf | dosctl set template_vars -
Then yes, pipefail is going to make that shell script exit 1 now instead of exiting 0. But I don’t see what stops dosctl set template_vars - from taking the empty output from dos-make-addr-conf and stuffing it into template_vars. Is the whole shell script running in some kind of transaction, such that the exit value from the shell script prevents the writes from hitting production?
Thanks for any clarifications. (I agree with the general rule here about never using shell to do these things in the first place, pipefail or not!)
You’re absolutely right, pipefail is only about the return value of the entire pipeline and nothing else.
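For anyone who wants to reproduce the observation, here is a minimal sketch. It assumes a pipeline along the lines of false | sleep 2; the exact commands from the question are not shown above, so treat these as stand-ins.

```bash
# Without pipefail, the pipeline's status is that of the last command only.
without=0
false | true || without=$?

# With pipefail, the failure of `false` shows up in the status,
# but `sleep 2` still runs to completion.
set -o pipefail
with=0
start=$SECONDS
false | sleep 2 || with=$?
elapsed=$((SECONDS - start))
set +o pipefail

echo "without pipefail: status=$without"
echo "with pipefail:    status=$with after ${elapsed}s"
```

The status changes from 0 to 1, but the elapsed time shows the right-hand side was never interrupted.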
From the article:
Enabling this option [pipefail] changes the shell’s behavior so that, when any command in a pipeline series fails, the entire pipeline stops processing.
If unformatted.json does not exist, then, without -e and -o pipefail, you will clobber formatted.json.
Even with errexit and pipefail, you will still clobber formatted.json:
$ bash -c '
> set -o pipefail
> set -o errexit
> printf "{}\n" >formatted.json
> cat unformatted.json | jq . >formatted.json
> '
cat: unformatted.json: No such file or directory
$ cat formatted.json
This is because bash starts each part of a pipeline in a subshell, and then waits for each part to finish.
Each command in a pipeline is executed as a separate process (i.e., in
a subshell).
And before running the commands in the subshells bash handles the redirections, so formatted.json is truncated immediately, before the commands are run, which is why you get behavior like:
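A minimal sketch of that truncation, using a scratch directory and `false` as a stand-in for a command that fails without writing anything (both are assumptions for illustration, not the commands from the article). The second half shows one common workaround: write to a scratch file and rename it only on success.

```bash
tmp=$(mktemp -d)
printf '{}\n' >"$tmp/formatted.json"

# The shell opens and truncates the target before the command runs,
# so the old contents are gone even though `false` writes nothing and fails.
false >"$tmp/formatted.json" || echo "command failed" >&2

if [ -s "$tmp/formatted.json" ]; then
  clobbered=no
else
  clobbered=yes
fi
echo "clobbered: $clobbered"

# Workaround: write to a scratch file and rename it only if the command succeeded.
printf '{}\n' >"$tmp/safe.json"
if false >"$tmp/safe.json.new"; then
  mv "$tmp/safe.json.new" "$tmp/safe.json"
else
  rm -f "$tmp/safe.json.new"
fi
kept=$(cat "$tmp/safe.json")
echo "safe.json still contains: $kept"
```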
This does have the desired effect of not running dosctl if dos-make-addr-conf fails, but it is a bit hard to read. Why are you using tee? Do you want the config to go to stdout as well? One way to make the control flow easier to read is to use if/else/fi:
if dos-make-addr-conf >config.toml; then
    dosctl set template_vars config.toml
else
    printf 'Unable to create config.toml\n' >&2
    exit 1
fi
This way your intentions are clearer and you don’t even need to rely on pipefail being set.
Just to get it out of the way first: pipefail in itself won’t stop the script from proceeding, so it only makes sense together with errexit, an ERR trap (with errtrace), or an explicit check (à la if false | true; then …). As you say, it’s about the status of the pipeline.
man 1 bash:
If pipefail is set, the pipeline’s return status is the value of the rightmost command to exit with a non-zero status
But you seem to be right: pipefail doesn’t propagate the error across the pipeline. Which isn’t surprising given the description above. Firstly, there is of course no waiting for each other’s exit statuses, because the processes in the pipeline are run concurrently. Secondly, it doesn’t kill the other processes either. They don’t get so much as a SIGHUP; your sleep command would evidently die if it got one.
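To make the "rightmost non-zero status" rule concrete, here is a small sketch. The subshells with fixed exit codes are made up for illustration; PIPESTATUS is bash's array of per-command statuses for the most recent pipeline.

```bash
set -o pipefail
status=0
codes=()
# Capture $? and PIPESTATUS in a single command, before either is overwritten.
(exit 3) | (exit 5) | true || { status=$? codes=("${PIPESTATUS[@]}"); }
set +o pipefail

echo "pipeline status: $status"
echo "per-command statuses: ${codes[*]}"
```

All three commands ran; the pipeline merely reports 5 afterwards, because `(exit 5)` is the rightmost command that failed.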
@rsc, I am pretty sure you are correct that setting pipefail, errexit, or nounset has no effect on whether dosctl set template_vars - is run as part of the pipeline. Bash starts all the parts of the pipeline asynchronously, so whether dos-make-addr-conf produces an error or not, even with pipefail, has no effect on whether dosctl is run. I believe the correct solution is to break the pipeline apart into separate steps and check error codes appropriately.
And it is why I avoid putting any non-trivial shell in YAML or in cron job definitions. I always call out to a real shell script with “unofficial strict mode” at the top (set -o errexit -o nounset -o pipefail, which is good but still not enough).
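As a rough sketch of what that header does when it works, the snippet below writes a throwaway script (the temp file and its contents are made up for illustration) and runs it in a separate bash process so the options don't leak into the calling shell.

```bash
script=$(mktemp)
cat >"$script" <<'EOF'
# Each option needs its own -o; `set -euo pipefail` is the short form.
set -o errexit -o nounset -o pipefail
false | true          # fails under pipefail, so errexit stops the script here
echo "not reached"
EOF

rc=0
out=$(bash "$script") || rc=$?
rm -f "$script"
echo "exit status: $rc, output: '${out}'"
```

Note that the script only stops after the whole pipeline has finished; every command in the pipeline has already run by then.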
I have been beating this drum for years … looks like I need to write some blog posts on proper shell error handling and how Oil fixes it. I have mentioned all these issues but it feels like it’s not sinking in.
Oil:
diff <(sort left) <(sort nonexistent)
^~
[ -c flag ]:1: fatal: Exiting with status 2 (command in PID 29359)
In contrast, bash will just keep going and ignore the failure, regardless of any shell options you set. There is no way to retrieve the exit code with $?, PIPESTATUS, or anything similar, or to fail on it.
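A minimal bash sketch of the behavior described here, with `false` standing in for the failing command inside the process substitution (an assumption for illustration; the Oil example used sort on a missing file):

```bash
rc=0
out=$(bash -c '
  set -o errexit -o nounset -o pipefail
  cat <(false)          # the substituted command fails, but cat itself succeeds
  echo "still running, last status was $?"
') || rc=$?
echo "$out"
echo "script exit status: $rc"
```

Even with all three strict-mode options set, the script carries on and exits 0.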
So that shows where “strict mode” does NOT solve the problem. In contrast, this post is similarly complex, but strict mode WOULD have solved the problem.
There is another issue with SIGPIPE that Oil addresses, which I have mentioned in the blog, but also deserves a clearer explanation. Something like yes | head can cause a “false failure”, which shopt -s sigpipe_status_ok in Oil avoids.
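A quick sketch of that false failure in bash. The exact status depends on how SIGPIPE is handled in your environment; 141 (128 + SIGPIPE) is what you would typically see.

```bash
rc=0
# head exits after one line; yes is then killed by SIGPIPE on its next write.
out=$(bash -c 'set -o pipefail; yes | head -n 1') || rc=$?
echo "output: $out"
echo "pipeline status: $rc"
```

The pipeline did exactly what was asked and printed one line, yet pipefail reports it as failed, so combining it with errexit would abort the script.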
it’s widely recommended as best practice that all scripts should start by enabling this
Sounds like it’s time for another periodic reminder that while pipefail can indeed be useful, blindly slapping it on every shell script in sight without consideration of the case-by-case particulars is not necessarily a good idea.
Nope, wrong, nothing stops earlier.
Author here – good catch! I tried to golf the example down to a one-liner for clarity, but it looks like I need to update the blog.
Indeed, as @enpo mentioned in a sibling post, -e is also critical, and a more accurate reproduction would be something like…
Sigh. I’ve updated the post with a new (hopefully correct) contrived example:
Hm, no lobsters acknowledgments in the post? Kinda shame.
The article focused on set -o pipefail, but the fix presented also had set -e. According to the documentation, this makes all the difference.
The article should probably have been more clear in that regard.
I took the theory for a test drive, and @lollipopman is entirely correct. Today I learned something new about shell scripting :)
Wow, this is quite the chain of causation …
http://www.oilshell.org/blog/2020/10/osh-features.html#reliable-error-handling
https://lobste.rs/s/iofste/please_stop_writing_shell_scripts#c_mvpkcj – Oil has option groups that make the defaults correct. You don’t have to remember them all
There is also the issue that it’s impossible to handle the errors of process substitution in bash.
Oil has an option shopt -s process_sub_fail to fix it, and again it’s on shopt -s oil:basic.
https://www.oilshell.org/blog/2022/01/notes-themes.html#other-shells-have-this-problem
https://news.ycombinator.com/item?id=29848018
For reference, here is another one I remember from Jane St.
When bash scripts bite (2017): https://news.ycombinator.com/item?id=14321213
Suggest devops and debugging tags, since this is an outage thing. scaling is more for “how do we scale something”. :)
| xargs -P ;)
I still think rc hits the nail on the head for shell programming in a way no one else managed: $status is an array, and all the conditionals in the shell interpret an array of all 0’s as success.
Very cool indeed, do you use rc as your daily driver?
Very much so, yes. After working with byron, I specifically recommend his version of rc, it handles pretty much every ergonomic issue correctly: https://github.com/rakitzis/rc
I also tend to run something to check that the config file parses properly before reloading it.
It seems like it did parse properly: an empty string is a valid TOML document.
Perhaps “sanity check” is a better term.
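A sketch of what such a sanity check could look like in bash. generate_config and apply_config are hypothetical stand-ins for dos-make-addr-conf and dosctl, and "not empty" is only one possible check.

```bash
# Hypothetical stand-ins for the real tools; not their actual interfaces.
generate_config() { return 1; }           # simulates a generator that fails and prints nothing
apply_config() { echo "applied $1"; }

deploy() {
  local tmp
  tmp=$(mktemp) || return 1
  # Run the generator on its own so its exit status can be checked directly.
  if ! generate_config >"$tmp"; then
    echo "generator failed; not applying" >&2
    rm -f "$tmp"
    return 1
  fi
  # Sanity check: an empty file may parse as valid TOML but is almost certainly wrong.
  if [ ! -s "$tmp" ]; then
    echo "generated config is empty; not applying" >&2
    rm -f "$tmp"
    return 1
  fi
  apply_config "$tmp"
  rm -f "$tmp"
}

rc=0
out=$(deploy 2>/dev/null) || rc=$?
echo "deploy status: $rc"
```

Because the generator and the apply step are separate commands, nothing here depends on pipefail.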