1. 7
  1. 10

    From A Scheme Shell, by Olin Shivers:

    Shell programming terrifies me. There is something about writing a simple shell script that is just much, much more unpleasant than writing a simple C program, or a simple CommonLisp program, or a simple Mips assembler program. Is it trying to remember what the rules are for all the different quotes? Is it having to look up the multi-phased interaction between filename expansion, shell variables, quotation, backslashes and alias expansion? Maybe it’s having to subsequently look up which of the twenty or thirty flags I need for my grep, sed, and awk invocations. Maybe it just gets on my nerves that I have to run two complete programs simply to count the number of files in a directory (ls | wc -l), which seems like several orders of magnitude more cycles than was really needed.

    I liked the example in the article, but we can also use it to show the shortcomings of Unix philosophy. Suppose we wanted to roll a 10-million-sided die (i.e., pick an integer between 1 and 10000000). Here’s what that looks like in terms of computing efficiency.

    $ time ( seq 1 10000000 | shuf | head -n1 )
    3574362
    real	0m6.966s
    user	0m6.423s
    sys	0m1.277s
    

    I ran that on my Raspberry Pi because it makes for a good demonstration. It’s faster on x86 systems: 2.5 seconds on a 2012-era AMD CPU, and 1.4 seconds on an Epyc Rome system from 2020.

    If we wanted to roll a 2**64-sided die, forget about it.

    In practice, what you do in Unix philosophy is write a C program called randrange or similar, adding a new verb to your language.
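
    A minimal sketch of that idea, staying in the shell instead of C (the randrange64 name is just a hypothetical placeholder): read 8 bytes from the kernel RNG and print them as one unsigned 64-bit integer. The od(1) flags used here (-An -N8 -tu8) are POSIX. Note this only gives the full 0 to 2**64-1 range; clamping it to an arbitrary range without modulo bias is exactly the part the dedicated verb would have to handle.

    # hypothetical helper: 8 random bytes formatted as one unsigned 64-bit integer
    randrange64() {
        od -An -N8 -tu8 /dev/urandom | tr -d ' '
    }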

    Another point: one thing that makes Unix philosophy attractive is the compositionality. Compositionality is what some of us love about Forth (I’m an ex-Forther). The difficulty with both is “noise” that obscures solutions to problems. In Forth, the noise is stack juggling. In Unix, it is all the text tool invocations that massage streams of data.

    1. 20

      On the contrary, that one is ultra fast:

      % time env LC_ALL=C tr -c -d '0-9' < /dev/random | head -c 7
      5884526
      env LC_ALL=C tr -c -d '0-9' < /dev/random  0.01s user 0.01s system 95% cpu 0.016 total
      head -c 7  0.00s user 0.00s system 32% cpu 0.009 total
      
      1. 8

        In GNU land:

        $ time shuf -i 0-10000000 -n1
        3039431
        
        real	0m0.005s
        user	0m0.002s
        sys	0m0.002s
        

        1 command, and it’s super fast.

        -i specifies the range to select from and -n the number of items to return.

        1. 3

          shuf(1) can also be installed on FreeBSD from packages.

          On my 10-year-old system:

          % time shuf -i 0-10000000 -n1
          1996758
          shuf -i 0-10000000 -n1  1.02s user 0.02s system 99% cpu 1.041 total
          
          % pkg which $( which shuf )
          /usr/local/bin/shuf was installed by package shuf-3.0
          
          1. 2

            Awesome! I didn’t have a BSD machine readily available, and I don’t remember the shuf details, so I didn’t want to claim it would work there. The shuf on the system I used is from GNU coreutils 8.32.

            It seems like the BSD shuf, at least in version 3.0, actually generates the full 0-10000000 range, since it’s taking 1s and 99% CPU.

            The GNU version seems to skip that step, since it takes basically no time. I wonder if newer versions of BSD’s shuf also take that shortcut.

            1. 3

              Seems that it is a little more complicated :)

              The shuf(1) I used in the above example is from the sysutils/shuf package - which is:

              “It is an ISC licensed reimplementation of the shuf(1) utility from GNU coreutils.”

              I also have sysutils/coreutils installed and gshuf(1) from GNU coreutils [1] is a lot faster (like your example):

              % time gshuf -i 0-10000000 -n1
              8474958
              gshuf -i 0-10000000 -n1  0.00s user 0.00s system 63% cpu 0.005 total
              

              [1] The GNU coreutils on FreeBSD have an additional ‘g’ letter in front of them to avoid conflicts - like gshuf(1)/gls(1)/gtr(1)/gsleep(1)/… etc.

              Hope that helps :)

              1. 1

                My question was whether newer versions of

                sysutils/shuf

                also have the ability to skip creating the full range when only 1 output is requested. At least that’s my assumption as to why GNU shuf is about 1s faster than the copy from sysutils/shuf.

                Otherwise I agree with everything you said, obviously.

                1. 1

                  As I see here - https://github.com/ibara/shuf - the sysutils/shuf port is at the current 3.0 version.

                  There is no newer 3.1 or CURRENT version of this ISC licensed shuf(1).

                  1. 1

                    Sorry, I apologize. I assumed there was likely a new version since you mentioned:

                    On my 10-year-old system:

                    way back up there somewhere.

                    Have an awesome day!

                    1. 1

                      The 10-year-old system referred to my old-school ThinkPad W520 hardware :)

                      The system is ‘brand new’ FreeBSD 13.0-RELEASE :)

                      You too, thanks.

        2. 2

          I do find this interesting, but at the same time I think it’s missing the point. I’m sure this comment was not intended to be, and actually is not, one of those clever “yes but what’s performance like” throwaway comments at meetings, but I wanted to pick up on it anyway.

          One thing that the spectrum of languages has taught me is that there are different jobs and different tools for those jobs. The point that I saw from the example was composability and STDIO pipelining, with an example simple enough not to get in the way of that for newcomers.

          You say “in practice”, directly after having just wondered about a 10-million-sided die. Such an object, at least in my experience, is not something you come across in practice. As an ex-D&D gamer, anything more than 20-sided is extreme for me, and I suspect for most people.

          1. 2

            One thing that the spectrum of languages has taught me is that there are different jobs and different tools for those jobs.

            It’s true.

            The point that I saw from the example was composability and STDIO pipelining, with an example simple enough not to get in the way of that for newcomers.

            Oh no, I didn’t miss the point at all. I wasn’t criticizing the example; I think it is a good one that demonstrates Unix philosophy quite well. I was making a counter-point, that with Unix philosophy, sometimes the specific solution does not generalize.

            Another point worth making is that a solution involving pipes and text isn’t necessarily the correct one. For instance, consider the classic pipeline to count files in a directory: ls |wc -l. I use that all the time. The only reason it almost always gives correct answers is that by custom, nobody puts newlines in filenames, even though it is totally legal.

            mkdir /tmp/emptydir
            cd /tmp/emptydir
            fname="$(printf "foo\nbar")"
            touch "${fname}"
            ls |wc -l
            

            That gives the answer 2. So much for counting files with wc.
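
            A more robust sketch, for what it’s worth (assuming a find(1) that supports -mindepth/-maxdepth and -print0, as GNU findutils and the modern BSDs do): count NUL-terminated names instead of lines, so an embedded newline can’t inflate the count.

            # one NUL byte per directory entry, then count the NULs
            find . -mindepth 1 -maxdepth 1 -print0 | tr -cd '\0' | wc -c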

            You say “in practice”, directly after having just wondered about a 10-million sided die. Such an object, at least in my experience, is not something you come across in practice.

            It was a whimsical use of metaphor, though maybe God plays D&D with 2**64-sided dice? The problem of picking a random integer in the range 1 to X comes up frequently enough that Python has a function in its standard library for it: random.randrange.
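
            For comparison, the single-verb version of the 10-million-sided die, reached from the shell (a sketch that assumes python3 is on the PATH):

            python3 -c 'import random; print(random.randrange(1, 10000001))'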

          2. 2

            If we wanted to roll a 2**64-sided die, forget about it.

            $ time env LC_ALL=C tr -cd a-f0-9 < /dev/urandom | head -c 16
            a7bf57051bd94786
            env LC_ALL=C tr -cd a-f0-9 < /dev/urandom  0.00s user 0.00s system 68% cpu 0.012 total
            head -c 16  0.00s user 0.00s system 34% cpu 0.009 total
            
          3. 10

            Somewhat tangential: one of the things that many great historical institutions and/or schools of philosophy have in common is that their decline is linked significantly to a point at which new developments stop happening, and every new piece of work is judged not so much by its value, but by how much it conforms to the norms of said philosophy. At that point, a community stops being “productive” in terms of development – it stops advancing things, it just rummages through the old ones until, finally, there comes a generation that simply doesn’t embrace the rummaging anymore, and all is forgotten.

            There’s been a pretty “deep” undercurrent that seeks to bring the “old” Unix philosophy to light again for many years now (suckless is a famous, but somewhat recent phenomenon). Frankly, I don’t know what to read into it – is this a truly “productive” re-exploration of the old philosophy, as (what we now call) neoplatonism was to platonism, or is it more akin to the dry treatises of late scholastic philosophy, produced by aging monks who didn’t get the memo on the Renaissance?

            Many years ago (ca. 2011?) when I went down the suckless rabbit hole, I thought it was more like the latter. But in a pretty interesting twist of fate, the decline of the other major platforms in terms of adequacy for professional use is starting to make it look more like the former. Back in 2008 or whatever I really didn’t understand why anyone would ever want to use something like dwm, but after a few years of Gnome 3 and Windows 8, ratpoison looked really good, and I used it for many years.

            By the way – another common trait of these “inflexion” points is that, at some point, founding figures take on quasi-mythical proportions and it’s no longer clear who said what :). For example, the article makes this claim:

            Douglas McIlroy summarized the Unix philosophy as follows:

            • Write programs that do one thing and do it well.
            • Write programs to work together.
            • Write programs to handle text streams, because that is a universal interface.

            but this famous form of the summary is not McIlroy’s (whose original formulation, from the famous technical note at Bell Labs, had 4 rules, not 3) but much more recent: it’s the version that Peter Salus gives in his A Quarter Century of Unix (at which point, one might argue, pretty much every commercially-successful Unix had departed significantly from this philosophy).

            (Edit: why this tangent: another common trait of these decline phases is that a lot of talk seems to devolve into doctrinaire discussion after the “point of no return”. While McIlroy, Ritchie and Thompson obviously shared a lot of common ideas about how one should write programs, it’s unlikely that they ever sat at the table and seriously discussed whether something breaks the Unix philosophy or not, the way people discussed systemd 40 years later. To the people who invented Unix, and wrote much of the first generations of Unix programs, what we now call the Unix philosophy was just how you wrote “proper” programs – the formal rules (and the ability to judge by how programs adhere to them, rather than how efficient they are at what they’re supposed to do) came much later. The same goes here – as someone else has pointed out, “the Unix philosophy” might as well suggest cat-ing from /dev/random, but if you approach this problem from the frame of mind of PDP-11-era computing – which overlaps with the Unix philosophy to the point where it’s often hard to say where the Unix philosophy ends and where 1970s computing limits begin – you get the author’s inefficient example).

            1. 12

              Here’s what Doug McIlroy actually wrote [1]:

              (i) Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.

              (ii) Expect the output of every program to become the input to another, as yet unknown, program. Don’t clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don’t insist on interactive input.

              (iii) Design and build software, even operating systems, to be tried early, ideally within weeks. Don’t hesitate to throw away the clumsy parts and rebuild them.

              (iv) Use tools in preference to unskilled help to lighten a programming task, even if you have to detour and build the tools and expect to throw some of them out after you’ve finished using them.

              It’s not quite the summary presented.

              [1] Ah, the joys of finding a physical copy of The Bell System Technical Journal Vol. 57 / pp. 1897-2312 / July-Aug 1978 at a second hand book store for $5.00 about twenty years ago.

              1. 3

                The goal of any tool is to “go away”, i.e., disappear from thought as the artist/engineer uses the tool to accomplish a greater goal. To the degree the tool doesn’t go away, progress is limited, ultimately halted.

                This provides a simple metric to address your question: if you’re spending time debating whether or not something uses the Unix Philosophy, you’re not reaching greater goals. Simply take a look at the allocation of time required to reach a goal.

                My money says that as we stack abstraction on top of abstraction, edge cases are requiring us to discuss tools even more than a decade or two ago. This means that wherever we’re headed, it’s not more productivity and creativity.

                Looked at through this metric, it’s obvious that the Unix Philosophy isn’t the answer to anything. It’s simply a nice collection of thoughts that help keep us from piling the BS too deep. Once we make it a goal in itself, we end up back to wasting too much time again.

                Note: this also implies that the metric here isn’t in the tool itself; it might very well change from coder to coder or from team to team.

                1. 1

                  but this famous form of the summary is not McIlroy’s (whose original formulation, from the famous technical note at Bell Labs, had 4 rules, not 3) but much more recent: it’s the version that Peter Salus gives in his A Quarter Century of Unix (at which point, one might argue, pretty much every commercially-successful Unix had departed significantly from this philosophy).

                  Hi @x64x, many thanks for your interesting comment and for pointing out the wrong author attribution!

                  1. 2

                    I am definitely the one who ought to be thanking you, for a very interesting article! These three rules (in this form) are very commonly attributed to Doug McIlroy; it’s just a small piece of trivia.

                    1. 1

                      Very glad you liked the article, many thanks again!

                2. 4

                  You can get rid of head -n1 at the end of the pipe by using seq 1 6 | shuf -n1.

                  1. 4

                    I know that’s not the point of the article, but my “Unix” doesn’t have seq or shuf. So I propose jot -r 1 1 6

                    1. 6

                      I’ve found a lot of “Unix philosophy” arguments online rely heavily on GNU tools, which is sort of ironic, given what the acronym “GNU” actually stands for.

                      1. 6

                        The “Unix” in GNU isn’t the ideal of an operating system like Unix (everything’s a file, text-based message passing, built on C etc. etc.); it’s the “Unix” of proprietary, locked-in commercial Unix versions. You know, the situation that forced the creation of the lowest-common-denominator POSIX standard. The ones without a working free compiler. The ones which only shipped with ed.

                        1. 5

                          BSD shipped with vi and full source code before the GNU project existed, and by the 1980s there were already several flavors of Unix. But AT&T retained ownership over the name Unix, which is something that should never have happened - it was always used as a genericized trademark, and led to travesties like “*nix”.

                          RMS is a Lisp (and recursive acronyms) guy who never seemed to care much about Unix beyond viewing it as a useful vehicle and a portable-enough OS to be viable into the future (whereas the Lisp Machine ecosystem died). Going with Unix also allowed GNU to slowly replace tools of existing Unix systems one by one, to prove that their system worked. GCC was in many cases technically superior to other compilers available at the time, so it replaced the original C compiler in BSD.

                      2. 2

                        I found jot to be more intuitive than seq and I miss it. Not enough to move everything over to *BSD though.

                        1. 1

                          I’m pretty sure it’s available (installed by default) on Linux systems (depending on distribution).

                          1. 1

                            On my VPS (Ubuntu 20.04 LTS)

                            $ jot
                            Command 'jot' not found, but can be installed with:
                            sudo apt install athena-jot
                            

                            On my RPi 4 (Raspbian GNU/Linux 10 (buster))

                            $ jot
                            -bash: jot: command not found
                            

                            I first learned about it from the book Unix Power Tools, at which time I was running a couple of BSDs, so I kind of got used to it then…

                      3. 2

                        I don’t think that helps with the demonstration of programs that each do “one thing well”.

                        1. 3

                          However, it still helps with faster execution, as that is one less program in the pipeline. I don’t see a problem with shuf having the ability to output a certain number of lines, as that still feels like it pertains to the subject matter of the program, and it is quite useful. From what I’ve seen of shuf in use, it is probably its most used option, too.

                          1. 1

                            Sure, in practice I wouldn’t pipe cat into grep or whatever. Whatever the purists say, flags are useful. But in a demonstration of how the pipeline works, I think it makes more sense to use one tool to shuffle and another tool to snip the output than to have the shuffling tool do the snipping; that’s all I meant.

                            In practice, I probably wouldn’t be simulating a dice roll in the shell, but if I was, my aim would be to get what I want as fast as possible. To that end, I’d probably use tail instead of head, as that’s what I use most often if I want to see part of a file. I’d probably use sort -R instead of shuf, because I use sort more often. That hasn’t dropped any of the parts of the pipeline, but it also doesn’t represent the “one thing well” spiel because randomizing is kind of the opposite of sorting.
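
                            Something like this sketch, say (GNU sort -R orders lines by a random hash of their keys, which is fine here since seq’s lines are all distinct):

                            seq 1 6 | sort -R | tail -n1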

                            I guess that’s what I was getting at :)

                      4. 3

                        Alternative:

                        % env LC_ALL=C tr -c -d '123456' < /dev/random | head -c 1