Sometimes I read something like this and think ‘well yeah, obviously nobody is actually saying to take any advice they give to the most extreme possible point, use your judgement’. But then I remember all the code I’ve read (and this seems most common in Ruby for some reason) where people have literally factored out every single function until they’re almost all exactly 1 line long. And the code where they have written functions with four boolean arguments, used in half a dozen places with two combinations of boolean parameters. And the code that’s been hacked and hacked and hacked and hacked together to form a 5000-line shell script when they could have achieved the same result with a few hours and 200 lines of Python or something.
The traditional UNIX command line is a showcase of small components that do exactly one function, and it can be a challenge to discover which one you need and in which way to hold it to get the job done. Piping things into awk ‘{print $2}’ is almost a rite of passage.
I find this an interesting example if only because I think the Unix command line is a good example of how to do it right, because even if you don’t remember the command to use you can always just emulate most of the other commands with awk. And the general style leads to some really lovely software like gvpr, which I discovered yesterday.
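For example, a few rough awk stand-ins for the classics (off-the-cuff approximations I'm adding for illustration, not exact replacements for the real tools):

awk 'NR <= 10'            # roughly head
awk '/error/'             # roughly grep error
awk 'END { print NR }'    # roughly wc -l
awk '{ print $2 }'        # roughly cut, for whitespace-separated fields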
Sometimes I read something like this and think ‘well yeah, obviously nobody is actually saying to take any advice they give to the most extreme possible point, use your judgement’.
In other words, this means you’re not the audience—this is really aimed at those still building the intuitions.
As you explain, the problem is that we don’t often show good judgement; it’s usually only after we’ve seen the consequences that we act. Beginners often ask me how, when, and where to apply things. The problem is that it’s contextual, and I was hoping to try and give that context.
Rather than examining things through re-use, I wanted them to think about coupling: to see modules as a way of keeping things apart rather than as a way of collecting similar features, and to take in the whole ‘rewrites mean migrations’ idea too.
I find this an interesting example if only because I think the Unix command line is a good example of how to do it right
Yes, and no. I mean, I thought the UNIX philosophy was a good idea until I realised how much git demonstrates it. It uses flat files, small commands bolted together, fast C parts tied together with bash. It even has the UNIX thing where each file format or pipe output ends up being a unique mini language inside the program, too. It’s still awful to use.
It’s a good way to build an environment but, well, every command takes slightly different arguments, and things like autocomplete don’t come from inspection or understanding the protocol, and we’re still emulating vt100 terminals. There are good ideas but UNIX demonstrates their discovery more than their application.
On the other hand, plan9 demonstrates them quite well, and some of the problems too. It’s still not exactly pleasant to use, although wonderfully extensible. Plan9 leverages a consistent interface in more ways than UNIX did, exposing every service as a filesystem.
The notion of a uniform interface is also seen in HTTP, and for what it’s worth, how clients on plan9 move from one file to another is very reminiscent of following hypertext in a browser. There are good ideas in UNIX, but there are better examples of them.
Awk isn’t one of them. I mean, awk’s great, but it’s one of the things, like tcl, bash, and perl, that marked the end of ‘do one thing and do it well’: they were glue languages that grew features. Even bash 4 has associative arrays now.
UNIX has grep and egrep and ripgrep and at least three distinct types of regular expressions in common use. UNIX has a thousand different command line formats and application directory layouts. UNIX gave us autoconf.
I mean UNIX is great and all but we kept hacking shit on
In other words, this means you’re not the audience—this is really aimed at those still building the intuitions.
What I meant is that my first reaction was ‘pointless article’, but that reaction is wrong! I think the article is good and necessary; more like it are needed.
Yes, and no. I mean, I thought the UNIX philosophy was a good idea until I realised how much git demonstrates it. It uses flat files, small commands bolted together, fast C parts tied together with bash. It even has the UNIX thing where each file format or pipe output ends up being a unique mini language inside the program, too. It’s still awful to use.
What? Git is not awful to use, it’s fantastic for all those reasons you just gave. You can dig into the internals of it without having to read any C. You pipe together those files into different formats yourself using a combination of standard utilities and git-x-y-z plumbing commands. What’s awful about that?
I have a much harder time getting anything to work in Mercurial, to be honest. Every time I try to use Mercurial it’s just the same as git, except some of the commands have slightly more sensible names, everything is incredibly sluggish, and lots of features just don’t exist or only exist if you turn on a million extensions.
And then once you have those extensions enabled, it’s just as confusing and inconsistent as git. Go look at the… is it called queues? Something like that, I’ve forgotten. It’s necessary to get a lot of what comes in git by default, and it’s way overcomplicated.
It’s a good way to build an environment but, well, every command takes slightly different arguments, and things like autocomplete don’t come from inspection or understanding the protocol, and we’re still emulating vt100 terminals. There are good ideas but UNIX demonstrates their discovery more than their application.
Of course different commands take different arguments; they do different things and have different purposes. Why would they all be the same? There’s nothing stopping you from going and writing a patch for scp that lets it take -R to mean -r, something I always mistype at first because I’m used to other commands. I doubt they’d reject the patch.
Everything accepts --help and man pages exist.
The state of terminals is a rather different question. It’s just one of those things that’s a bit of a local maximum. Trying to move to something that isn’t VT100 terminal emulation would require an enormous amount of effort for a relatively small benefit. Emulating VT100 terminals doesn’t really hurt except for a few little things, like ctrl-i and tab being the same thing, but in some scenarios that’s what you want: some people want to be able to tab-complete with ctrl-i. But it really has nothing to do with the Unix philosophy anyway.
Autocomplete, well, you could define a format for --usage that is machine-parseable and describes the grammar of the command. Whenever you do x -o [tab] the shell calls MACHINE_READABLE_USAGE_OUTPUT=1 x --usage and then parses that result to see that -o is followed by a file, and so on: something like the sketch below, or any other protocol you like. Maybe man pages could have an additional USAGE section with a machine-readable grammar for their usage. Getting shells to all agree on one particular way of doing things is the issue, not the ability to do something like that within the Unix command line model.
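Here is a rough sketch of how that could look as a bash completion function. To be clear, the --usage protocol, the MACHINE_READABLE_USAGE_OUTPUT variable, and the one-option-per-line output format are all hypothetical; only the bash completion machinery (complete, COMP_WORDS, compgen) is real.

# Hypothetical: assumes 'x --usage' prints one option per line, e.g. '-o FILE',
# '-C DIR', '--verbose', when MACHINE_READABLE_USAGE_OUTPUT=1 is set.
_usage_complete() {
    local cmd=${COMP_WORDS[0]}
    local cur=${COMP_WORDS[COMP_CWORD]}
    local prev=${COMP_WORDS[COMP_CWORD-1]}
    local spec
    spec=$(MACHINE_READABLE_USAGE_OUTPUT=1 "$cmd" --usage 2>/dev/null) || return
    case $(printf '%s\n' "$spec" | awk -v o="$prev" '$1 == o { print $2 }') in
        FILE) COMPREPLY=( $(compgen -f -- "$cur") ) ;;   # previous option wants a filename
        DIR)  COMPREPLY=( $(compgen -d -- "$cur") ) ;;   # previous option wants a directory
        *)    COMPREPLY=( $(compgen -W "$(printf '%s\n' "$spec" | awk '{ print $1 }')" -- "$cur") ) ;;
    esac
}
complete -F _usage_complete x   # 'x' stands in for any command that speaks the protocol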
The whole idea of commands that can be piped together and that read and write text is arguably what it means to have ‘the Unix command line’ in the first place.
On the other hand, plan9 demonstrates them quite well, and some of the problems too. It’s still not exactly pleasant to use, although wonderfully extensible. Plan9 leverages a consistent interface in more ways than UNIX did, exposing every service as a filesystem.
I really don’t think that ‘everything is a file and every service is a filesystem’ is the right way to view the Unix philosophy. Plan9 doesn’t feel like the ultimate culmination of Unix to me. It feels like… I don’t want to be rude about it, I don’t mean this in a rude way, but it feels like a caricature of the Unix philosophy.
The Unix philosophy is implementing things in a standardised and accessible way so that you can use a general suite of tools to handle different things. It doesn’t have to be text; it’s just that it should be text if it can reasonably be text. ffmpeg still feels like a Unix command to me.
The thing that feels least Unixy to me is audio on my system. Audio should definitely be done differently from how it is; I feel like I have almost no control over it. I want to be able to say ‘take the audio from here and put it into there, then merge those audio streams, copy this one to this output, then with the new copied output mix the channels down to mono’, and so on, and not through some arcane GUI.
Awk isn’t one of them. I mean, awk’s great, but it’s one of the things, like tcl, bash, and perl, that marked the end of ‘do one thing and do it well’: they were glue languages that grew features. Even bash 4 has associative arrays now.
There’s a rule I have that in any system there will always be something complicated. It’s kind of broad, but look at any categorisation, any set of rules, any set of tools: there will always be a ‘misc’. It might be quite hidden or it might be simply labelled ‘miscellaneous’. In any set of tools there’s always a tool that you use when all the other tools won’t work, in all those random little situations that the others don’t fit. In any categorisation of anything, there’ll always be a few objects being categorised that just don’t fit into your neat hierarchy and need to be put into ‘other’.
The Unix command line is no different. You have all the little useful tools and then you have awk, because sometimes you just have to do something complicated. I mean, that’s the reality, right? Sometimes you have to do something complicated.
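When I hit that point, it’s usually awk’s associative arrays I reach for. A made-up illustration (data.txt is a placeholder): summing a value per key from whitespace-separated input:

awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' data.txt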
UNIX has grep and egrep and ripgrep and at least three distinct types of regular expressions in common use. UNIX has a thousand different command line formats and application directory layouts.
‘There should be one — and preferably only one — obvious way to do it’ is the Python motto, not the Unix philosophy.
Unix has grep and egrep and ripgrep, sure. grep is the traditional Unix tool; egrep is an alias for grep -E, using extended regular expressions. I assume these are even less actually-regular than grep’s regular regular expressions and thus slower. ripgrep is a modern reimplementation of grep in Rust that (as far as I know) only supports true regular expressions and is very fast as a result.
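The practical difference is mostly syntax anyway; for example, the same match written three ways (file is just a placeholder name):

grep 'ab\{2,\}c' file     # BRE: the interval braces must be backslash-escaped
grep -E 'ab{2,}c' file    # ERE, i.e. what egrep gives you: unescaped braces
rg 'ab{2,}c' file         # ripgrep: Rust regex syntax, close to ERE here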
A better comparison would be between Perl-style and POSIX-style regular expressions, but those really are quite different things. You might even get away with arguing that one is imperative and the other declarative. They both have good reasons to exist, there are definitely reasons to prefer either, and they coexist; I think that’s a good thing.
There are many different command line formats? Not sure what that really means. Virtually everything today uses - before short options, allows short options to be combined like -xcvf instead of -x -c -v -f, and supports --long-arguments. Yeah, there are a few older commands like ps that support lots of formats in one command, but that’s just backwards compatibility. The only systems that don’t have a few ugly corners for backwards compatibility are new ones that nobody has used enough yet. The only way to avoid them is to just throw out everything more than a year or two old. Please don’t turn Unix into front end web development.
Application directory layouts? No idea what that means, sorry.
UNIX gave us autoconf.
autoconf is to many other build systems as the GPL is to BSD licenses. Is it a pain for developers? Yeah, absolutely. But it’s not designed to be easy for developers. It’s designed so that you can give a tarball to a user and they just type ./configure [possibly some arguments]; make; make install. Just as the GPL is designed to be friendly to end users while BSD is designed to be friendly to developers, autoconf is designed to be essentially invisible to end users. I don’t have to install cmake and deal with CMakeLists.txt and other annoying crap when I just want to run ./configure && make && sudo make install.
And remember, autoconf was not designed for people to compile a simple bit of C software on one or two Linux distributions, as it’s often used today, but to work around the inconsistencies and incompatibilities of dozens of different Unix operating systems. Today it’s unnecessary more than it’s bad. You really only need about a 15-line Makefile to build all but the most complex C programs, and you write it once and never touch it again. In those 15 lines you can quite easily and readably scan the included headers of each file to generate the dependencies between compilation units.
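A rough sketch of what I mean, assuming GNU make and a compiler that understands -MMD/-MP (gcc and clang do); ‘prog’ and the flags are placeholders, and recipe lines need a leading tab as ever:

CC     := cc
CFLAGS := -O2 -Wall -MMD -MP    # -MMD/-MP write .d dependency files as a side effect of compiling
SRCS   := $(wildcard *.c)
OBJS   := $(SRCS:.c=.o)

prog: $(OBJS)
	$(CC) $(CFLAGS) -o $@ $(OBJS)

-include $(OBJS:.o=.d)          # pull in the generated header dependencies, if present

clean:
	rm -f prog $(OBJS) $(OBJS:.o=.d)

.PHONY: clean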
Most of the problems people have with autoconf come from copy-pasting existing configurations and blindly hacking at them with no real understanding of what’s actually going on. There are configuration switches in some programs that haven’t been relevant since before the person who wrote them was born.
One particular pattern I’ve noticed time and again in my career is that code duplication can often ease maintenance and simplify remediation of technical debt.
You take a pathway that the data travels and duplicate every step/function along that route. Then you can start compacting down and instrumenting things without harming the day-to-day business until you’re ready to cut over, somewhat like the detours you might see in street construction.
Duplication also makes it a lot easier to do one-off weird business things that have to get done without risking fiddling with business logic and abstractions the rest of the system depends on. This is part of why schlub just embraces the jank and requires you to copy-and-paste what you need.
I would add that repeating yourself is, in some cases, also essential in performance-critical software, not just a matter of avoiding the wrong abstraction or an ugly architectural design. I’m writing a painting application (like Photoshop, Krita, or GIMP), and I could apply the DRY philosophy to the function that plots the brushes, but I would take a serious performance hit if I did, because the if/else abstraction would sit in the middle of a rasterization loop:
void plot(...)
{
    for (int row = top; row < bottom; row++) {
        for (int col = left; col < right; col++) {
            if (brush.hardness == 100) {
                /* Use simple plot */
            } else {
                /* Use plot with smoothness */
            }
        }
    }
}
Now imagine this with multiple parameters (hardness, roughness, density, blending, …). Instead, I copy-paste the function and rewrite it with the specific algorithm inside the nested loop.
You could also assign the drawing function to a function pointer (or lambda or whatever) outside of the loop and just call that within the loop. No branching in the loop, and no duplicated code.
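A minimal sketch of that approach in C; the names here (Brush, pixel_fn, plot_hard, plot_soft) are made up for illustration rather than taken from the real code:

typedef struct { int hardness; /* roughness, density, ... */ } Brush;

typedef void (*pixel_fn)(int row, int col, const Brush *b);

static void plot_hard(int row, int col, const Brush *b) { /* simple plot */ }
static void plot_soft(int row, int col, const Brush *b) { /* plot with smoothness */ }

void plot(const Brush *b, int top, int bottom, int left, int right)
{
    /* Choose the per-pixel routine once, outside the hot loops. */
    pixel_fn fn = (b->hardness == 100) ? plot_hard : plot_soft;

    for (int row = top; row < bottom; row++)
        for (int col = left; col < right; col++)
            fn(row, col, b);
}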
But an indirect call (through a function pointer, a vtable, etc.) is still a “branch”: it’s just one with an arbitrary set of possible targets instead of two known ones, which only adds more variables to the equation.
In order for this to be equivalent to inlining the duplicate-for-each-algorithm code, one would have to convince oneself that the indirect branch predictor on the processor is going to reliably guess the CALL and RET targets, that the calling convention doesn’t spill more registers than the inlined execution would (ideally it’s a leaf function so the compiler can elide call prologue/epilogues), and that the processor’s speculative execution system doesn’t have its memory dependency information invalidated by the presence of the call.
Caveat: the above might be less true if you’re programming in a managed runtime. If that function call can be inlined by the JIT compiler at runtime (many high-performance runtimes are very aggressive about function inlining, so it’s not an unrealistic thing to expect), then hopefully the above issues would be lessened.
If you get a chance, there’s a chapter in Beautiful Code about runtime code generation for image processing which, IIRC, uses stenciling and plotting as a running example; you might find it relevant to your interests.
There is so, so much to unpack here, but frankly it seems like a pointless conversation.
The command line tool that has Markov-chained manual pages as satire. The command line tool where the primary interface is Stack Overflow.
I’m really not sure we’ve used the same tool.
Tell that to the UNIX authors, who wrote it.
Other way around, buddy. GNU grep built stuff in. Unix, the system with a command line program called ‘[’.
Also, I think you’re confusing the GNU userland with UNIX; the GNU userland went against most of the UNIX design ideas at the time.
[Comment from banned user removed]
I feel enjoying git’s interface is already sufficient punishment for your opinions.
Thanks, I will make sure to check it out!