1. 64
  1. 14

    I’ve never seen the bash pipefail option before, and what I’ve been able to read and try does not line up with what is in the blog post. Can someone clarify this for me?

    As I understand it, pipefail is about setting the exit status of the overall pipeline:

    $ false | true
    $ echo $?
    0
    $ set -o pipefail
    $ false | true
    $ echo $?
    1
    $ 
    

    But now if I do

    $ set -o pipefail
    $ false | sleep 2
    $ 
    

    That command runs for two seconds. In particular, the sleep does not seem to have been interrupted or have any indication that false failed. So if the problem was the command

    dos-make-addr-conf | dosctl set template_vars -
    

    Then yes, pipefail is going to make that shell script exit 1 now instead of exiting 0. But I don’t see what stops dosctl set template_vars - from taking the empty output from dos-make-addr-conf and stuffing it into template_vars. Is the whole shell script running in some kind of transaction, such that the exit value from the shell script prevents the writes from hitting production?

    Thanks for any clarifications. (I agree with the general rule here about never using shell to do these things in the first place, pipefail or not!)
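
    A minimal sketch of the behavior described above, with `false` and `cat` standing in for dos-make-addr-conf and dosctl (the temporary file and the variable names are made up for illustration). With pipefail set the pipeline reports failure, yet the downstream command still runs and writes its output:

    ```shell
    set -o pipefail
    out=$(mktemp)
    status=0
    # Stand-in producer fails with no output; the stand-in consumer still runs.
    false | { cat; echo "consumer ran"; } > "$out" || status=$?
    echo "pipeline status: $status"
    echo "consumer wrote: $(cat "$out")"
    ```

    The non-zero status only becomes visible after the consumer has already done its work.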

    1. 14

      You’re absolutely right, pipefail is only about the return value of the entire pipeline and nothing else.

      From the article:

      Enabling this option [pipefail] changes the shell’s behavior so that, when any command in a pipeline series fails, the entire pipeline stops processing.

      Nope, wrong, nothing stops earlier.

      1. 5

        Author here – good catch! I tried to golf the example down to a one-liner for clarity, but it looks like I need to update the blog.

        Indeed, as @enpo mentioned in a sibling post, -e is also critical, and a more accurate reproduction would be something like…

        cat unformatted.json | jq . > formatted.json
        

        If unformatted.json does not exist, then, without -e and -o pipefail, you will clobber formatted.json.

        1. 8

          If unformatted.json does not exist, then, without -e and -o pipefail, you will clobber formatted.json.

          Even with errexit and pipefail, you will still clobber formatted.json:

          $ bash -c '
           > set -o pipefail
           > set -o errexit
           > printf "{}\n" >formatted.json
           > cat unformatted.json | jq . >formatted.json
           > '
          cat: unformatted.json: No such file or directory
          $ cat formatted.json
          

          This is because bash starts each part of a pipeline in a subshell, and then waits for each part to finish.

          Each command in a pipeline is executed as a separate process (i.e., in a subshell).

          And before running the commands in the subshells bash handles the redirections, so formatted.json is truncated immediately, before the commands are run, which is why you get behavior like:

          $ cp /etc/motd .
          $ wc -l motd
          7 motd
          $ cat motd | wc -l > motd
          $ cat motd
          0
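
          One common way to avoid the truncation described above is to write to a temporary file and rename it over the target only on success. A hedged sketch: `tr` stands in for jq so the example has no dependencies, and the directory and file names are invented for illustration.

          ```shell
          set -o pipefail
          dir=$(mktemp -d)
          printf '{}\n' > "$dir/formatted.json"
          tmp=$(mktemp "$dir/formatted.json.XXXXXX")
          # The redirection truncates only the temporary file, never the real target.
          if cat "$dir/unformatted.json" 2>/dev/null | tr -s ' ' > "$tmp"; then
              mv "$tmp" "$dir/formatted.json"
          else
              rm -f "$tmp"
          fi
          cat "$dir/formatted.json"
          ```

          Because unformatted.json does not exist, the pipeline fails and the old contents of formatted.json are left alone.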
          
          
          1. 7

            Sigh. I’ve updated the post with a new (hopefully correct) contrived example:

            (dos-make-addr-conf | tee config.toml) && dosctl set template_vars config.toml
            
            1. 7

              Hm, no Lobsters acknowledgments in the post? Kind of a shame.

              1. 4

                This does have the desired effect of not running dosctl if dos-make-addr-conf fails, but it is a bit hard to read. Why are you using tee, do you want the config to go to stdout as well? One way to make the control flow easier to read is to use if/else/fi:

                if dos-make-addr-conf >config.toml; then
                    dosctl set template_vars config.toml
                else
                    printf 'Unable to create config.toml\n' >&2
                    exit 1
                fi
                

                This way your intentions are clearer and you don’t even need to rely on pipefail being set.

          2. 4

            The article focused on set -o pipefail, but the fix presented also had set -e. According to the documentation, this makes all the difference.

            The article should probably have been more clear in that regard.

            1. 2

              I took the theory for a test drive, and @lollipopman is entirely correct. Today I learned something new about shell scripting :)

              $ cat failtest.bash
              #!/bin/bash
              set -euo pipefail
              
              false | sleep 2
              
              $ time ./failtest.bash
              real    0m2.010s
              user    0m0.000s
              sys     0m0.008s
              
            2. 2

              Just to get it out of the way first, pipefail in itself won’t stop the script from proceeding, so it only makes sense together with errexit or errtrace or an explicit check (à la if false | true; then …). As you say, it’s about the status of the pipeline.

              man 1 bash:

              If pipefail is set, the pipeline’s return status is the value of the rightmost command to exit with a non-zero status

              But you seem to be right: pipefail doesn’t propagate the error across the pipeline, which isn’t surprising given the description above. Firstly, there is no waiting on each other’s exit statuses, because the processes in the pipeline run concurrently. Secondly, it doesn’t kill the other processes either, not even with so much as a SIGHUP; your sleep command would evidently die if it got one.
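
              To make the “status of the pipeline” point concrete, here is a small sketch using bash’s PIPESTATUS array, which holds the exit status of every stage of the most recent pipeline. The variable names are illustrative only.

              ```shell
              set +o pipefail
              false | (exit 3) | true
              # PIPESTATUS is overwritten by the next command, so copy it immediately.
              stages=("${PIPESTATUS[@]}")
              echo "per-stage statuses: ${stages[*]}"

              set -o pipefail
              with_pipefail=0
              false | (exit 3) | true || with_pipefail=$?
              # With pipefail, the rightmost non-zero status wins.
              echo "pipeline status with pipefail: $with_pipefail"
              ```

              All three stages run to completion in both cases; only the reported status differs.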

              1. 1

                @rsc, I am pretty sure you are correct that setting pipefail, errexit, or nounset has no effect on whether dosctl set template_vars - is run as part of the pipeline. Bash starts all the parts of the pipeline asynchronously, so whether dos-make-addr-conf produces an error or not, even with pipefail, has no effect on whether dosctl is run. I believe the correct solution is to break the pipeline apart into separate steps and check error codes appropriately.

              2. 19

                Wow this is quite the chain of causation …

                And it is why I avoid putting any non-trivial shell in YAML or in cron job definitions. I always call out to a real shell script with “unofficial strict mode” at the top (set -o errexit -o nounset -o pipefail, which is good but still not enough).

                I have been beating this drum for years … looks like I need to write some blog posts on proper shell error handling and how Oil fixes it. I have mentioned all these issues but it feels like it’s not sinking in.

                http://www.oilshell.org/blog/2020/10/osh-features.html#reliable-error-handling

                The simple invariant is that OSH never loses an exit code, which is not true of POSIX shell or bash.

                https://lobste.rs/s/iofste/please_stop_writing_shell_scripts#c_mvpkcj – Oil has option groups that make the defaults correct. You don’t have to remember them all.


                There is also the issue that it’s impossible to handle the errors of process substitution in bash.

                Oil has an option shopt -s process_sub_fail to fix it, and again it’s on shopt -s oil:basic.

                https://www.oilshell.org/blog/2022/01/notes-themes.html#other-shells-have-this-problem

                https://news.ycombinator.com/item?id=29848018

                Oil:

                diff <(sort left) <(sort nonexistent)
                                  ^~
                    [ -c flag ]:1: fatal: Exiting with status 2 (command in PID 29359)
                
                

                In contrast, bash will just keep going and ignore the failure, regardless of any shell options you set. There is no way to retrieve the exit code with $?, PIPESTATUS, or anything similar, or to fail on it.
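
                One workaround in plain bash, sketched under the assumption that temporary files are acceptable, is to run each sort as its own checked step instead of inside a process substitution. The file names mirror the example above; the work directory and the result variable are made up.

                ```shell
                workdir=$(mktemp -d)
                printf 'b\na\n' > "$workdir/left"
                result=ok
                # Each sort is an ordinary command, so its exit status is checked by &&.
                if sort "$workdir/left" > "$workdir/left.sorted" &&
                   sort "$workdir/nonexistent" > "$workdir/right.sorted" 2>/dev/null; then
                    diff "$workdir/left.sorted" "$workdir/right.sorted"
                else
                    result=sort-failed
                fi
                echo "$result"
                ```

                This trades the convenience of process substitution for exit codes that are never dropped.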

                1. 6

                  For reference, here is another one I remember from Jane St.

                  When bash scripts bite (2017): https://news.ycombinator.com/item?id=14321213

                  So that shows where “strict mode” does NOT solve the problem. In contrast, this post is similarly complex, but strict mode WOULD have solved the problem.

                  There is another issue with SIGPIPE that Oil addresses, which I have mentioned in the blog but which also deserves a clearer explanation. Something like yes | head can cause a “false failure”, which shopt -s sigpipe_status_ok in Oil avoids.
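
                  A small sketch of that “false failure” in bash, with an illustrative variable name. Once head has read its line and exited, yes is terminated when it next writes to the closed pipe, and pipefail reports that as a pipeline failure even though nothing went wrong. The status is typically 141 (128 + SIGPIPE), but the exact value can depend on how signals are set up in the environment.

                  ```shell
                  set -o pipefail
                  status=0
                  # head exits after one line; yes then fails writing to the closed pipe.
                  yes | head -n 1 > /dev/null || status=$?
                  echo "pipeline status: $status"
                  ```

                  Under errexit, the same line without the `|| status=$?` guard would abort the script.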

                2. 8

                  it’s widely recommended as best practice that all scripts should start by enabling this

                  Sounds like it’s time for another periodic reminder that while pipefail can indeed be useful, blindly slapping it on every shell script in sight without consideration of the case-by-case particulars is not necessarily a good idea.

                  1. 5

                    Suggest devops and debugging tags, since this is an outage thing. scaling is more for “how do we scale something”. :)

                    1. 2

                      | xargs -P ;)

                    2. 4

                      I still think rc hits the nail on the head for shell programming in a way no one else managed:

                      ; false | true
                      ; echo $status
                      1 0
                      ; true | false
                      ; echo $status
                      0 1
                      ; man rc
                      ; if (true | false) { echo succeed } else { echo fail }
                      fail
                      ; if (false | true) { echo succeed } else { echo fail }
                      fail
                      ; if (true | true) { echo succeed } else { echo fail }
                      succeed
                      ;
                      

                      $status is an array, and all the conditionals in the shell interpret an array of all 0’s as success.

                      1. 2

                        Very cool indeed, do you use rc as your daily driver?

                        1. 1

                          Very much so, yes. After working with byron, I specifically recommend his version of rc, it handles pretty much every ergonomic issue correctly: https://github.com/rakitzis/rc

                      2. 1

                        I also tend to run something to check that the config file parses properly before reloading it.

                        1. 1

                          It seems like it did parse properly: an empty string is a valid TOML document.

                          1. 1

                            Perhaps “sanity check” is a better term.
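
                            A minimal sketch of such a sanity check, assuming that “non-empty” is an acceptable proxy for “plausible”: `[ -s file ]` succeeds only if the file exists and is larger than zero bytes. apply_config is a hypothetical stand-in for dosctl set template_vars, and the empty file simulates a generator that failed silently.

                            ```shell
                            config=$(mktemp)
                            : > "$config"   # simulate a generator that produced no output
                            applied=no
                            # Hypothetical stand-in for: dosctl set template_vars "$1"
                            apply_config() { applied=yes; }
                            if [ -s "$config" ]; then
                                apply_config "$config"
                            else
                                printf 'Refusing to apply empty config: %s\n' "$config" >&2
                            fi
                            echo "applied: $applied"
                            ```

                            A real check would likely look for required keys as well, since a non-empty file can still be wrong.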