Threads for lhoursquentin

  1. 14

    I’ve never seen the bash pipefail option before, and what I’ve been able to read and try does not line up with what is in the blog post. Can someone clarify this for me?

    As I understand it, pipefail is about setting the exit status of the overall pipeline:

    $ false | true
    $ echo $?
    0
    $ set -o pipefail
    $ false | true
    $ echo $?
    1
    $ 
    

    But now if I do

    $ set -o pipefail
    $ false | sleep 2
    $ 
    

    That command runs for two seconds. In particular, the sleep does not seem to have been interrupted or have any indication that false failed. So if the problem was the command

    dos-make-addr-conf | dosctl set template_vars -
    

    Then yes, pipefail is going to make that shell script exit 1 now instead of exiting 0. But I don’t see what stops dosctl set template_vars - from taking the empty output from dos-make-addr-conf and stuffing it into template_vars. Is the whole shell script running in some kind of transaction, such that the exit value from the shell script prevents the writes from hitting production?

    Thanks for any clarifications. (I agree with the general rule here about never using shell to do these things in the first place, pipefail or not!)

    1. 14

      You’re absolutely right: pipefail is only about the return value of the entire pipeline and nothing else.

      From the article:

      Enabling this option [pipefail] changes the shell’s behavior so that, when any command in a pipeline series fails, the entire pipeline stops processing.

      Nope, wrong, nothing stops earlier.

      1. 5

        Author here – good catch! I tried to golf the example down to a one-liner for clarity, but it looks like I need to update the blog.

        Indeed, as @enpo mentioned in a sibling post, -e is also critical, and a more accurate reproduction would be something like…

        cat unformatted.json | jq . > formatted.json
        

        If unformatted.json does not exist, then, without -e and -o pipefail, you will clobber formatted.json.

        1. 8

          If unformatted.json does not exist, then, without -e and -o pipefail, you will clobber formatted.json.

          Even with errexit and pipefail, you will still clobber formatted.json:

          $ bash -c '
           > set -o pipefail
           > set -o errexit
           > printf "{}\n" >formatted.json
           > cat unformatted.json | jq . >formatted.json
           > '
          cat: unformatted.json: No such file or directory
          $ cat formatted.json
          

          This is because bash starts each part of a pipeline in a subshell, and then waits for each part to finish.

          Each command in a pipeline is executed as a separate process (i.e., in a subshell).

          And before running the commands in the subshells bash handles the redirections, so formatted.json is truncated immediately, before the commands are run, which is why you get behavior like:

          $ cp /etc/motd .
          $ wc -l motd
          7 motd
          $ cat motd | wc -l > motd
          $ cat motd
          0
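
          For what it’s worth, a clobber-safe variant writes the filtered output to a temporary file and renames it only on success. A minimal sketch (tr stands in for jq so it’s self-contained; filenames are the hypothetical ones from the thread):

```shell
#!/usr/bin/env bash
# Clobber-safe pattern: filter into a temp file, rename only on success.
# `tr` stands in for jq here so the sketch has no external dependencies.
set -o errexit -o pipefail
cd "$(mktemp -d)"

printf 'kept\n' > formatted.json      # pre-existing output we must not lose

# unformatted.json does not exist, so the redirection fails and the
# `if` takes the else branch; errexit does not fire inside a condition.
if tr -d ' ' < unformatted.json > formatted.json.tmp; then
    mv formatted.json.tmp formatted.json
else
    rm -f formatted.json.tmp          # discard any partial output
fi

cat formatted.json                    # the old contents survive: kept
```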
          
          
          1. 7

            Sigh. I’ve updated the post with a new (hopefully correct) contrived example:

            (dos-make-addr-conf | tee config.toml) && dosctl set template_vars config.toml
            
            1. 7

              Hm, no lobsters acknowledgments in the post? Kind of a shame.

              1. 4

                This does have the desired effect of not running dosctl if dos-make-addr-conf fails, but it is a bit hard to read. Why are you using tee? Do you want the config to go to stdout as well? One way to make the control flow easier to read is to use if/else/fi:

                if dos-make-addr-conf >config.toml; then
                    dosctl set template_vars config.toml
                else
                    printf 'Unable to create config.toml\n' >&2
                    exit 1
                fi
                

                This way your intentions are clearer and you don’t even need to rely on pipefail being set.

          2. 4

            The article focused on set -o pipefail, but the fix presented also had set -e. According to the documentation, this makes all the difference.

            The article should probably have been more clear in that regard.

            1. 2

              I took the theory for a test drive, and @lollipopman is entirely correct. Today I learned something new about shell scripting :)

              $ cat failtest.bash
              #!/bin/bash
              set -euo pipefail
              
              false | sleep 2
              
              $ time ./failtest.bash
              real    0m2.010s
              user    0m0.000s
              sys     0m0.008s
              
            2. 2

              Just to get it out of the way first, pipefail in itself won’t stop the script from proceeding, so it only makes sense together with errexit or errtrace or an explicit check (à la if false | true; then …). As you say, it’s about the status of the pipeline.

              man 1 bash:

              If pipefail is set, the pipeline’s return status is the value of the rightmost command to exit with a non-zero status

              But you seem to be right: pipefail doesn’t propagate the error across the pipeline, which isn’t surprising given the description above. Firstly, there is of course no waiting for each other’s exit statuses, because the processes in the pipeline run concurrently. Secondly, it doesn’t kill the other processes either, not even with a SIGHUP – your sleep command would evidently have died if it had received one.

              1. 1

                @rsc, I am pretty sure you are correct that setting pipefail, errexit, or nounset has no effect on whether dosctl set template_vars - is run as part of the pipeline. Bash starts all the parts of the pipeline asynchronously, so whether dos-make-addr-conf produces an error or not, even with pipefail, has no effect on whether dosctl is run. I believe the correct solution is to break the pipeline apart into separate steps and check error codes appropriately.
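
                A sketch of that approach, with stub shell functions standing in for the real dos-make-addr-conf and dosctl (those commands are hypothetical, from the thread): generate the config into a file first, and only call the consumer when generation succeeded:

```shell
#!/usr/bin/env bash
set -o nounset

# Stubs standing in for the real commands; the generator fails the
# way a broken one would.
dos_make_addr_conf() { echo 'generator failed' >&2; return 1; }
dosctl() { echo "dosctl ran with: $*"; }

deploy() {
    local conf
    conf=$(mktemp)
    # Step 1: generate into a file; stop here if it fails.
    if ! dos_make_addr_conf > "$conf"; then
        echo 'config generation failed; not touching template_vars' >&2
        rm -f "$conf"
        return 1
    fi
    # Step 2: only reached when step 1 succeeded.
    dosctl set template_vars "$conf"
    rm -f "$conf"
}

deploy && echo 'deployed' || echo 'skipped dosctl'   # prints: skipped dosctl
```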

              1. 5

                FWIW I stopped using BRE in favor of ERE because I want to remember fewer things, and I can always use:

                1. awk – ERE by default
                2. egrep – opt in to ERE with one char :) (I also sometimes use fgrep, so it’s funny that plain old grep is the thing I do NOT reach for)
                3. sed --regexp-extended on systems with GNU sed.

                I think #1 and #2 are POSIX, so all systems should have them. The third isn’t, but at this point I would just install/compile GNU sed on those systems if I encountered them.
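
                For instance, one ERE pattern works unchanged across all three (sed -E is not in older POSIX, but GNU and BSD sed both support it):

```shell
#!/usr/bin/env bash
# The same ERE, with no dialect changes needed across the three tools.
pattern='(foo|bar)+'

printf 'foobar\nbaz\n' | grep -E "$pattern"       # foobar
printf 'foobar\nbaz\n' | awk "/$pattern/"         # foobar
printf 'foobar\nbaz\n' | sed -nE "/$pattern/p"    # foobar
```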

                FWIW this is why Eggex [1] only compiles to ERE at the moment. But BRE can be added if anyone thinks it’s worth it and is motivated :)

                [1] https://www.oilshell.org/release/latest/doc/eggex.html

                1. 3

                  I think #1 and #2 are POSIX, so all systems should have them

                  Took a quick look, and it’s not entirely clear, but it seems egrep and fgrep as separate binaries/scripts/symlinks are deprecated; the -E and -F options, however, are indeed specified by POSIX.

                  Looking at my current system (ubuntu) those are just shell scripts calling exec grep -E/F "$@" right now.

                  POSIX spec about fgrep/egrep [1]:

                  This grep has been enhanced in an upwards-compatible way to provide the exact functionality of the historical egrep and fgrep commands as well. It was the clear intention of the standard developers to consolidate the three greps into a single command.

                  The old egrep and fgrep commands are likely to be supported for many years to come as implementation extensions, allowing historical applications to operate unmodified.

                  And GNU grep manual [2]:

                  Direct invocation as either egrep or fgrep is deprecated, but is provided to allow historical applications that rely on them to run unmodified.

                  [1] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html#tag_20_55_18

                  [2] https://www.gnu.org/software/grep/manual/html_node/grep-Programs.html

                  1. 1

                    Hm thanks for the info! It is good to know what’s POSIX, and I didn’t know they were deprecated.

                    But long term I hope Oil can gain some lightweight containers / app bundle support, and people may worry less about what exact binaries are installed! I guess you can even do something like

                    if ! command -v egrep; then
                      egrep () { grep -E "$@"; }
                    fi
                    

                    in any script. Also I just checked busybox and it supports egrep and fgrep. I think that those shortcuts are trivial and useful and so they’re probably here to stay.

                  2. 2

                    I’m strongly in the “never use BREs” camp and view the existence of POSIX BREs as a bug that is baked into too many places to fix now.

                    1. 1

                      I use BRE by default and use ERE only when needed. Easier for me since I use only GNU versions (no feature difference, only syntax changes for certain metacharacters).

                      Regarding sed, I think there are other implementations which support -E option as well (this used to be -r but now -E is more portable).

                      1. 2

                        If you’re not concerned about sed, is there any reason to use BRE at all?

                        ERE is also closer to Perl-style regexes, so I think it’s easier to remember. I definitely have to remember 2 regex dialects: ERE for awk and Perl-style for Python/C++/etc.

                        I don’t want to remember 3! OK eggex makes it 3, but it compiles to ERE, so maybe one day I can forget it :)

                        1. 2

                          is there any reason to use BRE at all?

                          Just a preference for me, because I often have to match () or {} or +| literally, so default BRE works out better in terms of escapes needed (sometimes none at all compared to ERE).

                          Using ERE everywhere does help you stay sane across multiple languages and tools.
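
                          A quick illustration of that escaping trade-off:

```shell
#!/usr/bin/env bash
# In BRE, ( ) and + are ordinary characters, so matching them literally
# needs no escapes; in ERE they are metacharacters and must be escaped.
printf 'f(x)+1\n' | grep '(x)+'        # BRE: matches literally
printf 'f(x)+1\n' | grep -E '\(x\)\+'  # ERE: same match needs escapes
```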

                    1. 5

                      Nice to see some comparison between those 3, seeing how often they are used together.

                      Small note on the “sed flags for regexp addressing”: “I” is a regex modifier, whereas “e”, “p”, “w”, “d”, etc. are commands, much like “s” is. That means they can be used independently, whereas “I” cannot; they work quite differently.

                      For instance, /foo/Id means run the “d” command if the pattern space matches foo with the case-insensitive modifier; we can see this as /foo/I{ d; }. Whereas /foo/dI is a syntax error, since it would mean /foo/{ dI; }, which is invalid because the “d” command does not accept an “I” parameter (sed: -e expression #1, char 9: extra characters after command)
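
                      A quick demonstration (note that the I modifier is a GNU sed extension):

```shell
#!/usr/bin/env bash
# /foo/Id: case-insensitive address, then the d command -- works.
printf 'Foo\nbar\n' | sed '/foo/Id'          # prints: bar

# /foo/dI: the d command takes no I parameter -- syntax error.
printf 'Foo\nbar\n' | sed '/foo/dI' 2>/dev/null || echo 'sed: syntax error'
```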

                      1. 2

                        Thanks.

                        Regarding your note, the sed manual terms them as flags when explaining their behavior in s command. But, my post is about regexp and these commands do not change the behavior of regexp, so I’ve now removed them.

                      1. 2

                        I found strace incredibly helpful for understanding how shells work under the hood; it can highlight why something like ls | read var; echo "$var" produces different results between bash and zsh, for instance (the last pipeline member is forked in bash).
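
                        bash can be made to behave like zsh here with the lastpipe option (bash 4.2+, effective when job control is off, as it is in scripts); a small sketch:

```shell
#!/usr/bin/env bash
# Default bash: the last pipeline member runs in a subshell,
# so the assignment made by read is lost.
echo hello | read var
echo "without lastpipe: [${var:-}]"    # without lastpipe: []

# With lastpipe, the last member runs in the current shell (like zsh).
shopt -s lastpipe
echo hello | read var2
echo "with lastpipe: [${var2:-}]"      # with lastpipe: [hello]
```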

                        1. 1

                          Here’s my very elegant and short take in Sed:

                          h
                          : next_char
                          s/^/|\
                          /
                          t inc_start
                          : inc_start
                          s/^\([0-9]*\)0|/\11/; t inc_end
                          s/^\([0-9]*\)1|/\12/; t inc_end
                          s/^\([0-9]*\)2|/\13/; t inc_end
                          s/^\([0-9]*\)3|/\14/; t inc_end
                          s/^\([0-9]*\)4|/\15/; t inc_end
                          s/^\([0-9]*\)5|/\16/; t inc_end
                          s/^\([0-9]*\)6|/\17/; t inc_end
                          s/^\([0-9]*\)7|/\18/; t inc_end
                          s/^\([0-9]*\)8|/\19/; t inc_end
                          : inc_loop
                          s/^\([0-9]*\)9|/\1|0/
                          t inc_loop
                          s/^|/1/; t inc_end
                          b inc_start
                          : inc_end
                          s/\n\(.\)\1/|\
                          \1/
                          t inc_start
                          P
                          s/^[^[:space:]]*.//
                          s/\(.\).*/\1/p
                          g
                          s/\(.\)\1*//
                          /^$/q
                          h
                          t next_char
                          

                          Result:

                          sh$ echo aaaabbbcca | sed -f my-take.sed
                          4
                          a
                          3
                          b
                          2
                          c
                          1
                          a
                          
                          1. 3

                            I’ve never understood the benefits of adding git aliases instead of (shorter) aliases for your shell. I’ve got plenty of bash aliases like ‘gc’ (git commit -v) instead.

                            Some non-obvious, git related aliases:

                            Show git log with stat and press n and p in less, to jump to next and previous commits:

                            alias gl="git -c core.pager='less -p^commit.*$' log -p -M -w --stat --pretty=fuller --show-notes"
                            

                            A colored alternative to git log --oneline that works well with merge commits, due to --graph:

                            alias glp='git log --pretty="format:%Cred%h %Cblue%d %Cgreen%s %Creset%an %ar" --graph'
                            
                            # create an easy target to compare/rollback to after difficult rebase/merge
                            # etc and the reflog contains too many similar entries
                            alias gp="git tag -d prebase; git tag prebase; git log -n1 prebase"
                            

                            Completion for my aliases (branch/ref names for gl as if I’d written git log):

                            . /usr/share/bash-completion/completions/git
                            __git_complete gco _git_checkout
                            __git_complete gl _git_log
                            

                            Switch between the current and previously checked out branch: git checkout -

                            I always include -M (try to detect moved files) and -w (skip whitespace changes) when using git diff.

                            Complete list of git aliases and functions: https://github.com/chelmertz/dotfiles/blob/e1442914e278db8d237ff06c9c5cf3c31f6bac56/.bashrc

                            1. 11

                              Using git builtin aliases also gives you a namespace.

                              For example, pressing g<tab> on my system shows 242 possibilities; using git aliases makes sure I don’t have any conflicts with existing or upcoming commands on my system.

                              If you’re worried about keystrokes, I’d rather have a shell alias making git available as g, and then use git aliases on top: g l, g lp, etc.

                              That’s of course just a personal preference :)
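
                              A minimal sketch of that setup (the alias names here are just examples, not from the post):

```ini
# ~/.gitconfig -- aliases live in git's own namespace
[alias]
    st = status
    l  = log --oneline --graph
    lp = log --pretty=format:"%h %d %s" --graph
```

                              Paired with a one-letter shell alias (alias g=git in ~/.bashrc), these become g st, g l, g lp, and still complete via git’s own completion.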

                              1. 7

                                The benefits are for people who think in scopes, for lack of a better description, I guess. I’ve tried to use “pure shell” git aliases, but I found myself not using them and at one point forgetting I had them. When I start typing git, what comes next is a git subcommand. Could be false attribution though, but it helps me memorize stuff.

                                Also lower chance of any conflicts and not spamming the global scope. So many people do “git st” for status, to a point where I wonder why it’s not a default.

                                1. 2

                                  I’ve never understood the benefits of adding git aliases instead of (shorter) aliases for your shell.

                                  Things like __git_complete gl _git_log are not needed, as the git completion handles the aliases transparently.

                                  1. 1

                                    That’s a good point, but setting it up is a one time cost. Can you imagine how many keystrokes I’ve saved? :)

                                    I guess not everyone focuses emacs with F1 and Firefox with F2 etc, either, without modifiers. This might be a thing in the same vein.

                                    1. 2

                                      If you change shells, the modifications don’t carry over. Meanwhile my gitconfig is self-contained and doesn’t care. I’m more in the camp of using the built-in functionality unless there is a clear win with the shell aliasing, and I don’t see it here. I can also depend on the git completion telling me if I forget an alias, instead of grepping through my aliases to find it.

                                      1. 1

                                        I neither change shells nor forget the aliases, nor change from using git to hg (the latter could maybe be abstracted away with aliases/scripts if you’re changing tools often, as your comment implies :)). Thanks for the insights though, now I get why git aliases are being used!

                                1. 7

                                  There is a difference between 100% compliant and 100% restricted to what is stated in POSIX. As far as I know the latter does not exist, and it cannot, because there isn’t enough specified by POSIX; at some point you need to make decisions about what is not specified, and other implementations might take the opposite direction.

                                  For example, how do you decide if “echo foo | read var” should fork two processes or only one? After this line the value of var is different between bash and zsh because zsh decides to not fork the last member of the pipeline.

                                  Even though there’s nothing outside of the scope of POSIX here, we still have a completely different result.

                                  Sure with shellcheck you can avoid most of those pitfalls, but if your goal is to ensure portability, I think it is much easier to only allow a specific shell to run your script, even if it creates a dependency.

                                  1. 2

                                    Recently I was thinking about whether it’d be feasible to compile a custom matching routine and plug it into grep with LD_PRELOAD, so I found this repository interesting and gave it a star :)

                                    1. 1

                                      Now I’m wondering if it would be possible to build a modular tool that would spit out the minimum source code needed for a given command, so that it could be compiled separately and possibly provide a speed benefit.

                                      1. 2

                                        Do you mean something like re2c or libfsm?

                                        1. 1

                                          Haven’t heard of them before. Glanced through them and I feel like they could be used to create what I’m thinking of but not quite yet.

                                          Instead of translating the script, like the sed-to-C compiler being discussed in this thread, the grep or sed implementation itself could provide the source code. For example, something like grep --give-source 'foo.*bar' would emit source code that can be compiled and then run against the input to get lines matching the given regexp. I don’t have a sense of how complicated these tools are, so this may be too difficult to implement.

                                          1. 2

                                            Slightly unrelated but reminds me a bit of a bunch of tools that can output their bash completion code, npm completion for instance.

                                    1. 2

                                      Sounds awesome! Too bad GitHub is having trouble right now; I’d love to take a look at this.

                                      1. 2

                                        Author here, glad to hear that!

                                        Sed is pretty fun to tinker with; I’d love to see people get more familiar with it.