1. 23
  1.  

    1. 26

      It just shows how far gone I am that nothing about this seemed obscure. (Don’t let me write documentation by myself.)

      1. 3

        Honestly, it can’t be more obvious, and it could be worse.

    2. 4

      It’s hard to see any simple change in the Bourne shell that could avoid this error, because each of the individual parts is sensible in isolation.

      It is possible to always use env instead of the special syntax for setting an environment variable in front of a command. It’s easier on the eyes and on the parser alike, because you don’t need to scan ahead to see whether a word is an assignment or a command. env could have been a builtin instead, which would have made this syntax pointless. Not to mention that env can take on endlessly more features, like running the command from a different directory and so on.
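      To make the comparison concrete, here is a minimal bash sketch showing that the prefix syntax and env give an external command the same environment, plus one of env’s extra options (POSIX -i, which starts from an empty environment). The variable name GREETING is made up for illustration.

      ```bash
      export GREETING=hello

      # Prefix syntax: a special rule in the shell grammar.
      GREETING=hi sh -c 'echo "$GREETING"'         # prints: hi

      # The same binding, done by an ordinary command.
      env GREETING=hi sh -c 'echo "$GREETING"'     # prints: hi

      # Because env is a command, it can grow options; -i clears the environment.
      env -i sh -c 'echo "${GREETING-unset}"'      # prints: unset

      echo "$GREETING"                             # this shell still has: hello
      ```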

      I think the lesson is this: Never solve with syntax what you can solve with one more normal thing. In the case of shell syntax, that would be a command.

      1. 5

        It’s not always possible to use env because temporary assignment works for other builtins, which can then interact with the current shell’s environment.

        You need two builtins, one for env and one for assigning to the local shell’s parameters. Some shells are like this, and it’s not obvious to me that the experience is better overall, especially since they still have to be a bit special (if you want to assign to an array, say).
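        A minimal bash sketch of the case env cannot cover: a temporary assignment in front of a builtin. IFS is changed only while read runs, and read then stores its results in the current shell, which a program started by env in a child process could never do. The variable names are illustrative.

        ```bash
        # IFS=, applies only while this one read builtin runs.
        IFS=, read -r first second <<< 'a,b'
        echo "$first $second"              # prints: a b

        # Afterwards IFS is back to its default (space, tab, newline).
        [ "$IFS" = $' \t\n' ] && echo "IFS restored"

        # 'env IFS=, read -r first second' could only execute a program named
        # read (if one exists) in a child process, so first and second could
        # never be set in this shell.
        ```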

    3. 3

      Cool! I’ve never really seen a deep dive into why these idiosyncrasies exist before.

      1. 8

        I think it’s basically because shell didn’t start off as a programming language, and you can especially see that around variables. Shell basically started in the early 1970s like this:

        ls /tmp | wc -l
        

        It was commands and pipelines.

        Now how do you add assignments? How about doing it like Python:

        x = 42
        

        Well, that doesn’t work, because what if it’s

        ls = 42
        

        That’s already a command with 2 arguments: the string = and the string 42. This syntax would break too much – we’ve already parsed arbitrary, unquoted strings as commands for years. There is no “syntax error hole” to upgrade the language.

        So let’s just put a little hack in the parser and say that the first word is an assignment if it contains an equals sign:

        ls=42   # this is an assignment
        

        And now how do you bind an env variable for a command? Well, you can kinda reuse the same cheesy parser:

        ls=42 my-command
        

        So:

        1. I split my command into words
        2. some of them are key=value words
          • if they are all key=value words, then it’s an assignment statement
          • if there are leading key=value words, then interpret the rest as a command, run with those env bindings
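        The cases in the list above can be checked directly in bash; this is a small sketch with made-up variable names.

        ```bash
        x=1                                    # only key=value words: an assignment
        a=2 b=3                                # several of them: still just assignments
        greeting=hi sh -c 'echo "$greeting"'   # leading key=value, then a command: prints hi

        # The binding made for the command is not kept in this shell.
        echo "x=$x a=$a b=$b greeting=${greeting-unset}"
        # prints: x=1 a=2 b=3 greeting=unset
        ```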

        Even more hacks were added later, like:

        local a=$x  
        

        local is a special word: there is no word splitting on the unquoted $x, unlike in the rest of the shell. But env is not special, so there is word splitting here:

        env a=$x
        

        Also, env is external and local is a builtin, but there’s no syntactic indication of that. It’s a mess.
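        A small bash sketch of the difference, using a hypothetical value of x that contains a space. bash treats local as a declaration builtin and parses a=$x as an assignment word, so nothing is split; env is an ordinary command, so $x is split into separate arguments.

        ```bash
        x='1 b=2'   # contains a space, so unquoted $x is subject to word splitting

        f() {
          local a=$x              # assignment word: no splitting, a gets '1 b=2'
          printf '%s\n' "$a"
        }
        f                         # prints: 1 b=2

        # env receives two arguments, a=1 and b=2, and binds both before running sh.
        env a=$x sh -c 'printf "%s|%s\n" "$a" "$b"'    # prints: 1|2
        ```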

        The hacks continue to pile up:

        a[i++]=42  # array assignment
        a[ i++ ]=42  # weirdly, spaces are accepted in bash
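        Both forms can be tried in bash. The subscript is evaluated as arithmetic, so i++ post-increments the counter as a side effect of the assignment; the names a and i are just illustrative.

        ```bash
        i=0
        a[i++]=x        # stores at index 0, then i becomes 1
        a[ i++ ]=y      # bash also accepts the spaces: index 1, then i becomes 2
        echo "${a[*]} i=$i"   # prints: x y i=2
        ```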
        

        I think it helps to keep one thing in mind: parsing in C is extremely difficult, and there is no string type in C.

        So all shell implementers, from Thompson to Bourne to Korn to bash, basically took a lot of shortcuts. They created a syntax that was easy to implement in C.

        There are some other places you can see this egregiously in bash:

        read -t 1.1  # read with timeout 
        read -t 0.0  # select() on file descriptor.  It does NOT read !!!
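        A minimal bash sketch of the -t 0.0 behavior, feeding input through a here-string so that it is already waiting. The first read only checks whether input is available and consumes nothing, so the second read still gets the whole line.

        ```bash
        {
          if read -t 0.0; then   # zero timeout: only polls the file descriptor
            echo "input available"
          fi
          read -r line           # a normal read still sees the untouched line
          echo "got: $line"
        } <<< 'hello'
        # prints:
        #   input available
        #   got: hello
        ```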
        

        There are no function calls in bash, like strftime() or quoteShellString(), so functionality just gets hidden inside printf:

        printf %q "$str"  # extremely useful shell quoting that is hidden
        printf %b "$str"  # extremely useful shell dequoting that is hidden
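        A small bash sketch of what these two specifiers do, with an arbitrary sample string. %q produces text the shell can read back, so eval is what actually undoes it; %b interprets backslash escapes such as \t in the style of echo -e, which makes it related to %q but not an exact inverse.

        ```bash
        s=$'two words, a \'quote\', a $dollar and a\nnewline'

        printf -v quoted '%q' "$s"   # quote s so the shell can safely re-parse it
        eval "copy=$quoted"          # re-parse the quoted form
        [ "$copy" = "$s" ] && echo "round trip ok"

        printf '%b\n' 'col1\tcol2'   # %b turns the \t into a real tab
        ```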
        

        And

        printf '%(%Y-%m-%d)T' 12345   # useful strftime that is hidden
        printf '%(%Y-%m-%d)T' -1      # -1 is a hidden current time
        printf '%(%Y-%m-%d)T' -2      # time the shell was invoked
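        For reference, bash spells this specifier %(fmt)T, where fmt uses strftime directives such as %Y, %m and %d. A minimal sketch, assuming bash 4.3 or later and pinning TZ to UTC so that the timestamp 12345 (03:25:45 on 1 January 1970, UTC) prints a predictable date:

        ```bash
        export TZ=UTC                        # make the output independent of the local zone

        printf '%(%Y-%m-%d)T\n' 12345        # prints: 1970-01-01
        printf -v today '%(%Y-%m-%d)T' -1    # -1: the current time
        printf -v started '%(%H:%M:%S)T' -2  # -2: the time this shell was started
        echo "today=$today started=$started"
        ```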
        

        And

        printf -v myvar '%f' 3.14   # assign to variable
        
        myvar=$(printf '%f' 3.14)  # -v is a way to avoid this subshell
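        A minimal sketch of why -v matters: a command substitution runs in a subshell, so side effects inside it are lost, while printf -v assigns directly in the current shell. The fmt function and calls counter are hypothetical, there only to make the subshell visible; LC_ALL=C keeps the decimal point a period.

        ```bash
        LC_ALL=C                     # %f follows the locale's decimal point

        calls=0
        fmt() {
          calls=$((calls + 1))       # side effect, recorded in whichever shell runs fmt
          printf '%f' "$1"
        }

        a=$(fmt 3.14)                # runs in a subshell, so the increment is lost
        printf -v b '%f' 3.14        # assigns b in the current shell, no subshell
        echo "$a $b calls=$calls"    # prints: 3.140000 3.140000 calls=0
        ```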
        

        In a real language you would write these like:

        shellQuote()       
        shellEval()
        
        strftime('Y-m-d', 12345)
        strftime('Y-m-d', time())
        strftime('Y-m-d', shellStartTime())
        

        And

        myvar = '%f' % 3.14
        

        https://www.gnu.org/software/bash/manual/bash.html

        So basically, shell syntax has “topped out” in C. But they need more features, like formatting strings, and assigning a formatted string without a command substitution.

        So they just add a bunch of functionality without regard to whether it’s a readable or intuitive syntax. They just add magic -1 and -2 and 0.0 values, because you can hack that in quickly.


        https://www.oilshell.org fixes this – we have real parsing, real expressions with functions, and structured data.

        But ls /tmp | wc -l still works exactly the same way!

        1. 1

          Hm… you call %q, %b, and %T “hidden”, yet they’re plainly visible in the code. Maybe they’re obscure, in that their names don’t spell out clearly what they do and one must learn what the single-letter codes mean — but that’s like any other printf format specifier, and, once one looks them up, %q for “quote” and %T for “time” seem just as mnemonic as %s for “string” (%b for “unquote”, not so much).

          I could understand opposing printf-style formatting altogether, in favor of using long, obvious names,¹ but

          1. I don’t think strf-anything is an obvious name (one would need to learn that it stands for “string format”);
          2. I don’t think strftime('Y-m-d', 12345) is any less cryptic than printf '%(%Y-%m-%d)T' 12345: here the Y-m-d makes it fairly clear what this is for, rather than the cryptic %T or the (IMO even more cryptic) strftime; and
          3. you aren’t opposing printf-style formatting altogether; you explicitly support %f — what makes %f any better than %q?

          ¹ Some would say this is over-optimizing for beginners. I don’t know whether I agree or disagree with that.

          1. 2

            %b for “unquote”, not so much

            I think it’s meant to be an inverted (upside-down) %q.

          2. 1

            Sure, you could say obscure, rather than hidden. I’ll also clarify that the syntax is non-orthogonal. Think of these 2 parts:

            1. What transformation do you want to perform – say, shell quoting, uppercasing, or converting an integer timestamp to a string?
            2. Where do you want it to go – assign it to a variable, or print to stdout?

            These two things should be independent, but in bash, you have to mix and match to find the right combo.

            The %q and %b don’t belong in printf, because then you need the -v syntax to assign to a variable. Bash even added a thing later to fix this:

            myvar=${s@Q}  # this syntax is consistent
            
            printf -v myvar '%q' "$s"  # not consistent
            

            This shows you that it’s not orthogonal – the % directives should not transform the value; they should just be type specifiers:

            printf -v myvar '%s' "${s@Q}"
            

            A quoted string is still a string, so it should use %s.

            If you don’t agree, then why not put upper case inside printf as well?

            printf -v myvar '%u' "$string_to_upper"  # hypothetical; real printf already uses %u for unsigned integers
            

            Weirdly, there seems to be an equivalent of %q but not of %b (the closest is ${s@E}, which expands backslash escapes the way $'…' does): https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html

            IIRC the ${x@Q} and %q use slightly different algorithms. Bourne shell already had problems, and Korn shell and bash both made it worse.
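            A quick sketch, assuming bash 4.4 or later, comparing the two spellings on a string that contains a single quote. In current bash they produce different text (single quotes versus backslashes), but eval reads either one back to the original. The variable names are illustrative.

            ```bash
            s="it's"

            q1=${s@Q}                # parameter transformation
            printf -v q2 '%q' "$s"   # printf's quoting

            echo "$q1"               # prints: 'it'\''s'
            echo "$q2"               # prints: it\'s

            # Different text, same meaning: both re-parse to the original string.
            eval "r1=$q1"
            eval "r2=$q2"
            [ "$r1" = "$s" ] && [ "$r2" = "$s" ] && echo "both round-trip"
            ```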

            1. 1

              Taking this further, it seems clear that printf is completely superfluous in shell. It only makes sense when you have different static data types, as in C, and you need to convert them to strings.

              For shell you could do everything like this

              echo "quoted ${x@Q} upper ${y@U} ${z %.3f}"
              

              and printf would be obsolete. printf is basically a dynamically-parsed string formatting language, but shell already has a statically-parsed string formatting language – double quotes.
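              Today’s bash gets part of the way there: ${x@Q} (bash 4.4+) and ${y^^} (bash 4.0+; ${y@U} is the bash 5.1+ spelling) already work inside double quotes, but there is no ${z %.3f}, so a number still has to go through printf -v first. A sketch with made-up values:

              ```bash
              LC_ALL=C                   # keep '.' as the decimal point for %f
              x="it's"
              y=shout
              z=2.71828

              printf -v z3 '%.3f' "$z"   # no formatting expansion exists, so format first
              msg="quoted ${x@Q} upper ${y^^} ${z3}"
              echo "$msg"                # prints: quoted 'it'\''s' upper SHOUT 2.718
              ```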

              1. 1

                But echo has flag portability issues, and it doesn’t deal well with values that start with a dash. Here is a (strawman) example:

                var="-n"
                echo "$var" # Outputs nothing, or "-n\n" depending on the shell
                printf '%s\n' "$var" # Outputs "-n\n"
                
                echo -n "test" # Not portable across shells
                printf 'test' # Completely portable
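                A common workaround, sketched here in bash with a hypothetical function name, is to route output through printf '%s\n' so that a value like -n is always printed literally:

                ```bash
                # say: print all arguments, space-separated, plus a newline,
                # never treating any of them as options (unlike echo).
                say() {
                  printf '%s\n' "$*"
                }

                var=-n
                say "$var"            # prints: -n
                say -e 'a\tb'         # prints -e a\tb literally, no escape processing
                ```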
                
                1. 2

                  Yes that’s true, I’m talking from the perspective of if shell was designed consistently and had not grown organically :)

                  Oils has a write builtin so you can just do

                  write -- "quoted ${x}"
                  

                  and you won’t need printf when we fold that into ${}


                  Also, echo -n test is POSIX, thus portable to any reasonable shell (if you can find one that doesn’t support it, I’ll be surprised).

                  echo $var is what’s problematic – the problem is that you can never echo a string that is just -n or -e

                  The problem is really the lack of -- in echo, not portability. The incomplete behavior of echo is absolutely portable, and is what leads people to use printf instead :)

                  1. 2

                    Minor nitpick: the POSIX documentation says -n is implementation-defined.

                    Otherwise I agree with your assessment, it is somewhat unfortunate that we have non-orthogonal design in the shell.

    4. 2

      This is a little confusing because they’re using … as a placeholder for some generic command. Also, sometimes it’s three dots and sometimes it’s four. I’d replace all those instances with something like myprog.

    5. 2

      Love these deep dives, I enjoy how they educate me, but also how they inadvertently celebrate the minutiae of something that many of us think wonderful, ancient and yet still very relevant today.