1. 46
  1.  

  2. 31

    This is on one level pretty straightforward and on another level very arcane. There are two important pieces of knowledge beyond normal Unix command line stuff needed to understand it: echo is a builtin in modern shells, and when a process writes to a pipe whose read end has been closed, it is normally terminated with a SIGPIPE.

    Since echo is a builtin, all of (echo red; echo green 1>&2) will be run by one process. If this process does not start running until after echo blue has exited, it will receive SIGPIPE when it does echo red and die, and will not go on to run echo green. So there are three possible orders of execution:

    • the shell on the left side gets through both of its echos before echo blue runs (well, before it writes its output). The output you get is ‘green blue’.
    • the echo red happens before echo blue exits, so it doesn’t get a SIGPIPE, but echo green happens afterwards. The output you get is ‘blue green’. This is probably the usual case, especially on a multi-core system where both sides of the pipeline can run at once.
    • the echo blue process runs to completion before the shell on the left side gets a chance to finish echo red, so the left side shell is terminated before it writes ‘green’. The output you get is just ‘blue’.
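
    Each of the three orders can be forced by hand by adding sleeps, which makes the race easier to see. This is a minimal bash sketch: force_case is a hypothetical helper name, the sleeps make each ordering overwhelmingly likely on a normally loaded machine rather than guaranteed, and case 3 assumes SIGPIPE has its default action (is not ignored) in the shell that runs it.

    ```shell
    # force_case N: a variant of (echo red; echo green 1>&2) | echo blue
    # with sleeps added so that case N from the list above (almost) always wins.
    force_case() {
      case $1 in
        1) # Left side finishes both echos while the right side is still asleep:
           # "red" sits unread in the pipe and "green" reaches stderr first.
           (echo red; echo green 1>&2) | (sleep 1; echo blue) ;;
        2) # "red" is written while the reader is alive, but "green" is written
           # only after echo blue has run and exited (stderr is not the pipe).
           (echo red; sleep 2; echo green 1>&2) | (sleep 1; echo blue) ;;
        3) # The right side exits before "red" is written, so the left-hand
           # subshell is killed by SIGPIPE and never reaches echo green.
           (sleep 1; echo red; echo green 1>&2) | echo blue ;;
      esac
    }
    ```

    Under those assumptions, force_case 1 2>&1 should print green then blue, force_case 2 2>&1 should print blue then green, and force_case 3 2>&1 should print only blue.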

    If echo weren’t a builtin, the SIGPIPE would only ever terminate the ‘echo red’ process, not the entire command sequence on the left side, so you would normally always see ‘green’ in the output (unless you’ve set the shell option that terminates a command sequence when one of its commands fails, set -e). You might still theoretically see it before the ‘blue’, depending on scheduling.

    (I believe that one thing that makes ‘blue green’ a more likely output is that shells usually start the processes in pipelines from left to right, so the echo red; echo green 1>&2 will normally become ready to run ever so slightly before the echo blue.)
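
    The point about a non-builtin echo can be checked by calling /bin/echo, which always runs as a separate process, with a sleep so that echo blue has almost certainly exited before red is written (a timing assumption, not a guarantee). The second function adds set -e, the shell option that makes a command sequence stop when one of its commands fails. A bash sketch with hypothetical function names:

    ```shell
    # /bin/echo is an external command, so SIGPIPE kills only that child process;
    # the subshell carries on and still runs the builtin "echo green".
    external_echo() {
      (sleep 1; /bin/echo red; echo green 1>&2) | echo blue
    }

    # With set -e, the failed /bin/echo (nonzero exit status) makes the subshell
    # exit before it reaches "echo green".
    external_echo_errexit() {
      (set -e; sleep 1; /bin/echo red; echo green 1>&2) | echo blue
    }
    ```

    external_echo 2>&1 should print blue followed by green, while external_echo_errexit 2>&1 should print only blue; with the default SIGPIPE action on Linux, the killed /bin/echo exits with status 141 (128 + 13).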

    1. 4

      Also, for some extra raciness… isn’t stderr usually unbuffered, while stdout is usually line buffered?

      1. 3

        At least when going to a pipe, stdout seems to be buffered in a 64k buffer.

        This prints 16384 lines for me and then hangs:

        (i=1; while true; do echo red; echo $i 1>&2; ((i+=1)); done) | sleep 10

        I would assume that “red\n” has 4 chars. 16384*4=65536.

        1. 2

          I believe that this is the kernel’s pipe buffer limit in action, not the shell’s. In order to limit how much kernel memory you’re using, the kernel only allows so much data to be written to a pipe before it’s read by the other side. As far as I know, POSIX doesn’t mandate a particular capacity (it only guarantees that writes of up to PIPE_BUF bytes, at least 512, are atomic), so systems are free to allow as much as they like; Linux defaults to 64 KiB, which matches 16384*4. A larger kernel buffer size has the advantage that it makes some programs work more reliably.

          (The simple way to write a program that both writes data to a subprocess and then reads back the subprocess’s output is to write all the data first, then read all the output back afterwards. Unfortunately this is prone to deadlocks: the subprocess writes enough output to block until you read it, while you are still writing enough input to block until it reads. The larger the kernel’s pipe buffer size, the more data both sides need to write before this happens.)
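
          A variant of the loop above measures the limit directly by counting how many 4-byte writes complete before the writer blocks. This is a bash sketch: pipe_fill_count is a hypothetical name, and it assumes the roughly sixteen thousand loop iterations finish well within the sleep. On Linux with the default 64 KiB pipe I would expect 16384, matching the observation above; other systems may report a different number.

          ```shell
          pipe_fill_count() {
            local secs=${1:-2}   # how long the never-reading right side stays alive
            # The left side writes "red\n" (4 bytes) per iteration and reports the
            # running count on stderr. Once the pipe is full, echo red blocks; when
            # sleep exits, the blocked write fails and the loop ends (normally the
            # subshell is simply killed by SIGPIPE).
            { (i=0; while echo red; do ((i += 1)); echo "$i" 1>&2; done) | sleep "$secs"; } 2>&1 |
              grep -E '^[0-9]+$' | tail -n 1   # the last count = completed writes
          }
          ```

          Calling n=$(pipe_fill_count 2) and then echo "$((n * 4)) bytes" shows how many bytes the pipe accepted without a reader.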

        2. 2

          I think that buffering doesn’t matter here. Although the two echos on the left side are run by the same shell process, the shell is careful to make them behave as if they were separate processes, and as part of that it flushes any buffered standard output when echo red finishes.

          (There have been bugs here in Bash in peculiar situations, but as far as I know they’re all gone now.)
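
          One way to see that red is flushed when echo red finishes, rather than when the subshell exits, is to keep the subshell alive after the write and have the right side report as soon as it reads the line. A bash sketch; flush_demo is a hypothetical name and the one-second sleep only makes the ordering obvious.

          ```shell
          # The right side reads a single line from the pipe and reports it on stderr.
          # If the subshell's stdout were buffered until exit, "right got: red" could
          # only appear after "left finished"; with per-command flushing it comes first.
          flush_demo() {
            (echo red; sleep 1; echo "left finished" 1>&2) |
              { IFS= read -r line; echo "right got: $line" 1>&2; }
          }
          ```

          Running flush_demo 2>&1 should print right got: red about a second before left finished.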

        3. 2

          On a statistical level, you could treat each run as an independent trial, with a binomial distribution for each possible output. Calculating the expected value (what it should be) would be a lot of effort, so instead you can just run it many times in an actual shell to get enough data to say that a vast majority of the time (~3 standard deviations) the answer is “green\nblue\n” and that this is what the output should be. Most of the time.
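
          A rough way to put numbers on this, assuming every run is an independent trial with the same probability p of a given output (changing system load between runs breaks that assumption): after seeing that output k times in n runs, the usual estimate and its standard error are

          ```latex
          \hat{p} = \frac{k}{n}, \qquad
          \sigma_{\hat{p}} \approx \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}
          ```

          For example, k = 950 in n = 1000 gives \hat{p} = 0.95 and \sigma_{\hat{p}} \approx 0.0069. An output that never shows up in n runs is only bounded to roughly p < 3/n at about 95% confidence (the “rule of three”), so a few hundred runs cannot rule out a rare ordering.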

        4. 13

          This is expected behavior, but you have to know exactly how shells work to understand it!

          Processes started on either side of the pipe are not guaranteed to run in any particular order. The shell just forks two processes with their stdout and stdin hooked together, it doesn’t force the OS to schedule one process before the other. The processes typically do work in the order they’re written because pipelined processes typically depend on the input from the previous process in the pipeline. The echo command does not, i.e. it never reads from stdin.

          1: A process 1 is started for the left side of the pipe. The ‘echo red’ is executed. It is blocked and hangs, waiting for something on the other side to read from stdin.

          No, a subshell process is forked for the left side, and its stdout is redirected to the pipe. That subshell process forks and executes echo red, its output goes into the pipe, and echo red terminates. The subshell then forks and executes echo green 1>&2, its output goes to the shell’s stderr, and echo green terminates. Neither echo process blocks.

          2: A second process 2 is started for the right side of the pipe. The ‘echo blue’ is executed and blue is written to the terminal.

          No, a second subshell is forked for the right side of the pipe. That subshell forks and executes echo blue.

          3: Process 2 ends as “echo blue” finished. The shell lifts the write block for “echo red” and the next command “echo green 1>&2” is executed.

          No, there is no write block. echo blue runs before echo green most of the time because echo blue is executed after 2 forks (subshell, echo), while echo green is executed after 3 forks (subshell, echo, echo), so the OS usually schedules echo green later. At least, that’s what would happen for most commands. More on this later.

          4: After process 2 ends, the shell sends a KILL signal to process 1. Sometimes this signal arrives before “green” is printed.

          Close.

          Sometimes echo blue will run AND terminate before the OS schedules echo red to run. The OS will then send echo red a SIGPIPE, the signal delivered to a process that writes to a pipe whose read end is closed, and it will terminate without writing.

          But then the subshell should fork and execute echo green 1>&2, right? Actually no. To illustrate what would usually happen, I’ve been describing echo as a separate process. But echo is a bash builtin, meaning the shell performs the task itself without creating a separate process. That means the subshell is performing the write, the subshell receives SIGPIPE, and the subshell terminates before starting echo green 1>&2.

          If you force bash to actually fork and execute an echo process, you will never observe the case where only blue is printed. You can force this by calling /bin/echo explicitly, instead of just echo. You can prove this by adding a sleep in the subshell.

          $ (sleep 1; echo red; echo green 1>&2) | echo blue
          blue
          $ (sleep 1; /bin/echo red; echo green 1>&2) | echo blue
          blue
          green
          

          Now it should be clear what I meant by “most commands” before, when describing why echo blue runs before echo green: echo is a builtin. Both echo builtins run after only 1 fork, the fork creating the subshell for their respective pipeline stages. After the fork, the second subshell dups the pipe to stdin, writes “blue” to stdout, and terminates. The first subshell dups the pipe to stdout, writes “red” to stdout (aka the pipe), dups stderr to stdout, then finally writes green to stdout (aka stderr), and terminates. Even though both stages fork the same number of times, the left subshell makes more syscalls, so echo blue still usually runs first.
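
          The exit status of the left-hand side is another way to confirm that the subshell itself is killed. In bash, PIPESTATUS holds the status of each pipeline element, and a process killed by signal N is reported as 128 + N; SIGPIPE is 13 on Linux and macOS, so 141 is what I would expect. A minimal sketch with a hypothetical function name, assuming SIGPIPE is not ignored in the calling environment:

          ```shell
          left_status() {
            (sleep 1; echo red; echo green 1>&2) | echo blue
            # Save the left element's status before the next command resets PIPESTATUS.
            local left=${PIPESTATUS[0]}
            echo "left side exit status: $left"
          }
          ```

          If the subshell had survived and run echo green, the status would be 0 and green would appear on stderr.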

          1. 9

            Alternative, really short explanation :)

            1. A pipeline runs processes in parallel [1]
            2. There’s no synchronization in this code. The only synchronization is that the shell waits for ALL the processes in a pipeline before executing the next statement. [2]
            3. Therefore the output is nondeterministic.

            That’s it. The burden of proof is on you if you claim this should be deterministic. Nobody ever said it would be :)

            The fact that a bunch of shells were tested and it happens a certain way 99% of the time isn’t relevant!

            [1] The last process may or may not be the shell interpreter itself, e.g. with shopt -s lastpipe in bash, or by default in zsh.

            [2] Another source of synchronization can be blocking reads and writes on the fixed size pipe, but that doesn’t apply here.
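
            Point 2 and footnote [1] are both easy to check in bash. The first function shows the shell waiting for every pipeline element; the second shows lastpipe running the last element in the current shell (it was added in bash 4.2 and only takes effect when job control is off, as in a script). A sketch with hypothetical function names; the sleep only exaggerates the effect.

            ```shell
            # The shell waits for every element of the pipeline, so "next statement"
            # is printed only after the slow left side has exited, even though the
            # right side finished about a second earlier.
            wait_demo() {
              (sleep 1; echo "left side done" 1>&2) | echo "right side done"
              echo "next statement"
            }

            # With lastpipe, the variable assigned by read survives the pipeline.
            lastpipe_demo() {
              local x=unset
              echo hello | read -r x   # read runs in a child process: x is unchanged
              echo "without lastpipe: $x"
              shopt -s lastpipe
              echo hello | read -r x   # read runs in this shell: x becomes "hello"
              echo "with lastpipe: $x"
              shopt -u lastpipe
            }
            ```

            From a script, wait_demo 2>&1 should print right side done, left side done, next statement in that order.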

            1. 4

              That kind of reasoning works until you inherit a project that relies on that 99% behaviour. You make an innocent change and the system breaks, so now it’s your fault and you have to understand why it worked in the first place.

              1. 2

                But that kind of reasoning doesn’t stop working when the system breaks. In fact that kind of reasoning is what explains why it broke, as well as most of why it previously seemed to work. (Or actually worked – it could be the innocent-looking change that removed or weakened the only sequencing point in the system, thereby turning the order from deterministic to random. Again the explanation will require that very kind of reasoning.)

                1. 1

                  Sorry if my argument came across as if I’m discrediting the effort to understand the proper specification. The point is, in most software development settings, there will be two parties: You and a manager whose only insight into the issue is that you made a commit and things broke. So, any kind of quirk in the system is your problem and there’s no one else to put the burden of proof onto.

                  1. 1

                    Yeah, that sort of dysfunction is the motivation for the notion of the blameless postmortem. Working on a system that was built defectively, then put on trial by someone without the tools to judge them: a lose-lose-lose situation for a developer. (Quite possibly why the system was built defectively in the first place. Though that depends on a lot of factors.)

            2. 6

              Using strace, it appears to come down to who writes first: the left side of the pipe or the right side. If red is written before blue exits, you’ll see either “blue\ngreen\n” or “green\nblue\n”, and it ultimately comes down to who writes right after red. If blue exits before red is written, the subshell with red and green gets killed by SIGPIPE when it tries to write “red\n”, and you only get “blue\n”.

              When I replace the subshell with a group command, I never see the blue-only output, since the parent runs the red and green echos.
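
              One caveat, as far as I understand bash: its manual says each command in a multi-command pipeline runs in its own subshell, so a { } group on the left is still forked rather than run by the parent shell, and a SIGPIPE should still kill the whole group. Adding a sleep, as done elsewhere in the thread, is a quick way to check this on your own system. A bash sketch with a hypothetical function name, assuming SIGPIPE is not ignored:

              ```shell
              group_case() {
                # The original pipeline with a { } group instead of ( ), plus a sleep so
                # that echo blue has almost certainly exited before "red" is written.
                { sleep 1; echo red; echo green 1>&2; } | echo blue
              }
              ```

              If group_case 2>&1 prints only blue, the group was killed as a unit by SIGPIPE, the same as the ( ) version.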

              1. 4

                My attempt at an explanation:

                • First a subshell is started, because of “(”.
                • Then “echo red” is executed, which streams “red” via stdout to stdin of the command after the pipe, which is “echo blue”.
                • Since echo does not read from stdin, “red” is ignored, and the only output lined up for stdout is “blue”.
                • “echo green” sends “green” to stderr, and it is not passed over to “echo blue” with the pipe.
                • now “blue” is ready to be flushed on stdout and “green” is ready to be flushed on stderr.
                • The first to win the race will be flushed first, which makes the order appear a bit random.

                The output should usually be “blue green” since “echo blue” does not launch a subshell and is not waiting on stdin.

                It should not normally output just “blue”.

                1. 2

                  The “green blue” “blue green” switch also appears on OpenBSD. I haven’t seen just “blue”, though.

                  1. 0

                    the premise of this article is nonsense

                    “echo” doesn’t expect and doesn’t accept standard input

                    I get that the point is to test race conditions or some such, but any kind of conclusion they are trying to make is moot as far as I’m concerned if they can’t come up with even one realistic example.

                    1. 6

                      This comic about the mantis shrimp has absolutely no practical impact on my life. But I still find it incredibly interesting, and I’d love to see one in real life (behind a thick pane of glass).

                      I didn’t think the point was to test race conditions. I thought the point was that it was interesting and unusual behavior that I wouldn’t expect and that made me think. Can’t that be enough?

                      (Edit: just want to point out that since I originally viewed this article the author added some theories. But I still stand by my point, honestly.)

                      1. 5

                        So here’s another nonsense thing.

                        int fd = open("file", O_RDWR);
                        char *ptr = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
                        write(fd, ptr, 4096);
                        

                        But why would you do that? It’s dumb. It’s not realistic. It doesn’t do anything. It’s ridiculous.

                        It also has a tendency to deadlock certain systems.

                        1. 2

                          That’s interesting, too, then. What does it deadlock, just the user mode process? And which systems? Maybe it could have implications for sandboxed environments like a browser’s javascript engine? I wouldn’t immediately dismiss curious trivia like this offhand.

                        2. 4

                          I think the point isn’t to raise a practical issue so much as to ask “what are the semantics of bash/unix pipes?” From that perspective the example doesn’t need to be realistic, and trying to flesh out their minimal example into something more “real” would probably just obscure the actual question. See cks’s answer: there’s some legitimately important nuance as to why this example behaves as it does.

                          1. 1

                            It points out that other than “try it and see” you can’t necessarily know what will happen in the shell.

                            1. 1

                              It doesn’t say echo reads from stdin, it says it expects its stdout to be read from.

                              1. 0

                                the first example given is

                                (echo red; echo green 1>&2) | echo blue
                                

                                this is ridiculous because you would never use syntax like this ever. Firstly, you would use “{}” not “()” to avoid the subshell; second, you don’t pipe to “echo” because that’s pointless. It doesn’t do anything, as “echo” doesn’t accept standard input.

                                1. 5

                                  I disagree, I think it’s interesting despite “|echo” not being a useful construct in itself. It works well as a mechanism for highlighting an interesting interaction with pipes, blocking IO, buffers, signals and forked child processes.

                            2. 1

                              Seems to do the right / expected thing as far as I can reproduce?

                              1. 1

                                I just did this 100 times with bash 5 on macOS and got the same result every time.

                                green
                                blue
                                
                                1. 6

                                  It’s a tight race; you’ll need to run a lot more trials than that. It would likely also help to try with different levels of ambient system utilisation, to see the effect that coscheduling will inevitably have.
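
                                  A small bash loop makes it easy to run far more trials and tally the outcomes. This is a sketch: tally_race is a hypothetical name, it needs bash 4 or later for associative arrays (the /bin/bash that ships with macOS is 3.2), and capturing stderr together with stdout so that green is recorded may itself shift the timing slightly.

                                  ```shell
                                  tally_race() {
                                    local trials=${1:-10000} i out key
                                    local -A counts=()
                                    for ((i = 0; i < trials; i++)); do
                                      # Capture stdout and stderr together; tr joins each result onto one line.
                                      out=$( { (echo red; echo green 1>&2) | echo blue; } 2>&1 | tr '\n' ' ')
                                      counts[$out]=$(( ${counts[$out]:-0} + 1 ))
                                    done
                                    # One line per distinct result: count, then the output (newlines as spaces).
                                    for key in "${!counts[@]}"; do
                                      printf '%6d  %s\n' "${counts[$key]}" "$key"
                                    done
                                  }
                                  ```

                                  Running tally_race 100000 under different levels of background load shows how much the rarer orders depend on scheduling.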