Unix was never as simple as we’d like to remember – or pretend – that it was. Plenty of gotchas have always been lurking around the corners.
For instance, newlines are totally legit in filenames. So in the absence of insane names, ls |foo will write filenames with one name per line to foo's stdin. Usually it's fine to treat ls output as a series of newline-separated filenames, because by convention, nobody creates filenames with newlines in them. But for robustness and security we have things like the -0 argument to xargs and cpio, and the -print0 argument to find.
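To make that point concrete, here is a minimal sketch (assuming a bash-like shell and GNU-style find/xargs; the filenames are invented):
# A perfectly legal filename containing a newline, next to a normal one:
touch $'report\n2024.txt' notes.txt
# Naive newline splitting of ls output sees three names where there are two files:
ls | while read -r f; do echo "got: $f"; done
# NUL-delimited output keeps each name intact:
find . -maxdepth 1 -type f -print0 | xargs -0 -n1 echo "got:"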
For a system that is based on passing around data as text, the textual input and output formats of programs are often ill-suited to machine parsing. Examples of unspecified or underspecified text formats are not difficult to find. I'm really glad to see some venerable tools sprouting --json flags in recent years, and I hope the trend continues.
Anything but JSON. If plain text is being used because of its readability, JSON is largely antithetical to that purpose.
JSON fits a nice sweet spot where both humans and machines can both read and edit it with only moderate amounts of anguish. As far as I can tell there is not a good general-purpose replacement for JSON.
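A tiny illustration of that sweet spot, with an invented record (jq assumed to be installed): the same blob is easy for a human to scan and trivial for a machine to pick apart.
echo '{"user": "alice", "uid": 1001, "groups": ["wheel", "staff"]}' | jq -r '.groups[]'
# prints: wheel, then staff, one per line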
A long article promoting JSON, with less than a full sentence for S-expressions.
What? It’s marked to-do. Here, I’ll just do it. Check the page again.
What about Dhall?
You might consider including EDN, I think it makes some interesting choices.
Another point: the statement that JSON doesn’t support integers falls into a weird gray area. Technically it’s not specified what it supports (https://tools.ietf.org/html/rfc8259#section-6). If you’re assuming the data gets mangled by a JS system, you’re limited to integers representable by doubles, but that’s a danger point for any data format.
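To see that ceiling in action (assuming a JS runtime such as node is on hand; the number is 2^53 + 1):
node -e 'console.log(JSON.parse("9007199254740993"))'
# prints 9007199254740992; the odd integer was silently rounded to the nearest double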
I actually like this quite a bit, thanks!
How about the proposed BARE message encoding?
Looks fine, but it’s binary and schema-defined, so it makes very different design tradeoffs than JSON does. It’s not an alternative to JSON, it’s an alternative to protobuf, cap’n proto or flatbuffers, or maybe CBOR or msgpack. There’s a plethora of basically-okay binary transfer formats these days, probably because they prevent people from arguing as much about syntax.
I won’t go into details about where, but at work we have used stateless tokens for the longest time. For us, it’s been a terrible design decision and we’re finally moving off it. Why? Decryption is CPU bound, so it doesn’t scale nearly as well as memory lookups, which is what stateful tokens represent. Moreover a lot of our decryption libraries do not seem to be particularly consistent (high variance if we assume that the distribution is somewhat normal) in their timing. This poses a problem for optimizing the tail end of our latency. At small to medium scales stateless tokens are fine, but as we took on higher scale it just didn’t work. Memory lookups are fast, consistent, and scale well.
You should post this as an article! A few comments:
Careful what you wish for…
FreeBSD has had libXo for a while: https://wiki.freebsd.org/LibXo
You can also legitimately give a file a name that starts with a dash, making it challenging to access or delete unless you know the trick.
I remember reading a book on UNIX back in the day (1994? around then) which talked about this issue. The given solution in this professional tome was to cd up and then delete the whole directory.
(Asking how to handle this problem was also a common question in interviews back in the day, maybe still today I don’t know.)
That's… Wrong, at best. rm ./-rf always worked, even when the tool is buggy and doesn't support -- argument parsing termination. The man page for (GNU coreutils) rm now mentions both methods prominently. I believe you'll get a prompt if you try it interactively in bash too.
Yeah but kids these days don't read man, they google, or at best, serverfault.
</oldmanyellsatcloud>
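For anyone who hasn't seen the trick, a quick sketch of both escape hatches (the filename -rf is just the example from upthread):
touch ./-rf          # a file whose name starts with a dash
rm -rf               # wrong: parsed as options, nothing gets removed
rm ./-rf             # works: the leading ./ stops it from looking like an option
rm -- -rf            # works too, where the tool supports -- end-of-options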
No wonder they google. Have you tried reading a man page without knowing Linux inside and out? They all pretty much suck. Take the tar man-page for example. It says it's a “short description” of tar, while being over 1000 lines long, but it fails to include ANY examples of how to actually use the tool. There are examples of how to use the different option styles (traditional and short options), a loooong list of flags and what they do in excruciating detail, a list of “usages” that don't explain what they do, and the return values tar can give.
I mean, imagine you need to unpack a tar.gz file, but you have never used tar before and you are somewhat new to Linux in general, but you have learned about the man command and heard you need to use tar to unzip a file (not a given really) so you dutifully write man tar in your terminal and start reading. The first line you are met with looks like this:
tar {A|c|d|r|t|u|x}[GnSkUWOmpsMBiajJzZhPlRvwo] [ARG…]
Great. This command has more flags than the UN headquarters. You look at it for a couple seconds and realise you have no idea what any of the switches mean, so you scroll a bit down:
tar -c [-f ARCHIVE] [OPTIONS] [FILE…]
Cool. This does something with an archive and a file (Wouldn't it be helpful if it had a short description of what it does right there?). What it does is a mystery as it doesn't say. You still have to scroll down to figure out what -c means. After scrolling for 100 lines you get to the part that lists out all the options and find -c. It means that it creates an archive. Cool. Not what we want, but now that we are here maybe we can find an option that tells us how to unpack an archive?
-x, --extract, --get
Sweet! We just found the most common usage at line 171! Now we scroll up to the top and find this usage example:
tar -x [-f ARCHIVE] [OPTIONS] [MEMBER…]
The fuck is a MEMBER? It's in brackets, so maybe that means it's optional? Let's try it and see what happens. You write tar -x -f sample.tar.gz in your terminal, and hey presto! It works! Didn't take us more than 10 minutes reading the man page and trying to understand what it means.
Or, if you understand how to use modern tools like Google to figure out how to do things, you write the query “unzip tar.gz file linux” into Google and the information box at the top says this:
For tar.gz. To unpack a tar.gz file, you can use the tar command from the shell. Here's an example: tar -xzf rebol.tar.gz.
You try it out, and what do you know? It works! Took us about 10 seconds.
It’s no wonder that people search for solutions instead. The man files were obviously not written for user consumption (maybe for experienced sysadmins or Linux developers). In addition, this entire example assumes you know that tar can be used to extract files to begin with. If you don’t know that, then you are shit out of luck even before you open the man file. Google is your only option, and considering the experience of reading man files, no surprise people keep using Google instead of trying to read the “short description” that is the size of the fucking Silmarillion!
/rant
I don't disagree with the general sentiment here, but I think you've found a man page that is unusually bad. Here are some excerpts from some random ubuntu box.
it fails to include ANY examples of how to actually use the tool.
EXAMPLES
Create archive.tar from files foo and bar.
tar -cf archive.tar foo bar
List all files in archive.tar verbosely.
tar -tvf archive.tar
Extract all files from archive.tar.
tar -xf archive.tar
Cool. This does something with an archive and a file (Wouldn't it be helpful if it had a short description of what it does right there?).
Mine has, comfortably within the first screenful:
-c, --create
create a new archive
Not what we want, but now that we are here maybe we can find an option that tells us how to unpack an archive?
Something like 20 lines below that:
-x, --extract, --get
extract files from an archive
Anyway, I don’t think man pages are intended to be good tutorials in the general case; they’re reference materials for people who already have an idea of what they’re doing. Presumably beginners were expected to learn the broad strokes through tutorials, lectures, introductory texts etc.
I think that split is about right for people who are or aspire to be professional sysadmins, and likely anyone else who types shell commands on a daily basis—learning one’s tools in depth pays dividends, in my experience—but if it’s the wrong approach for other groups of people, well, different learning resources can coexist. There’s no need to bash one for not being the other.
This is a GNU-ism; you're supposed to read the Info book: https://www.gnu.org/software/tar/manual/tar.html
But that also lacks a section detailing the most common invocations.
OpenBSD does it better: https://man.openbsd.org/tar
Of course, on the 2 Debian-based systems I have access to, info pages aren't even installed… you just get the man page when you invoke info tar.
I was just going to bring up info. I believe in many cases manpages for GNU tools are actually written by downstream distributors. For example Debian Policy says every binary should have a manpage, so packagers have to write them to comply with policy. Still more GNU manpages have notes somewhere in them that say “this manpage might be out of date cause we barely maintain it; check the info documentation.” Really irritating. Honestly I never learned how to use info because man is Good Enough™. I mean, come on. Why must GNU reinvent everything?
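For anyone following along, the two entry points being compared (assuming GNU tar, and that the info reader and its docs are installed):
man tar     # the terse reference page
info tar    # the full GNU manual; falls back to showing the man page if the info docs are missing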
I don't think the author has to deny this; the difficulty of teaching doesn't have to be the same as the difficulty of using. The difficulty in using complicates the system, which then makes it harder to teach – for example because of --json flags.
Teaching UNIX is still simple (BSD/Solaris/HP-UX/AIX/…).
Teaching Linux is growing harder and harder.
Have you ever even used a commercial unix? I wouldn’t wish smit on my worst enemy.
Yes, several years with AIX, a little less with HP-UX, a lot less (unfortunately) with Solaris, but I also used OpenSolaris on a laptop in the past. Some AIX admins literally love smitty, some hate it. I am somewhere in the middle; it has its uses, but personally I miss the FreeBSD approach with all configuration kept in plain and simple text configuration files. I always used F6 at smitty to check which command it would execute, so I could put it into some simple script instead of making all these choices at the smitty level.
Small but important detail, I would say.
We have to know a little more about utilities and libraries than before. For instance, it used to be just JS, but now we have to know JS plus one of the libraries/frameworks: React, Vue, or Angular.
Sometimes when I am writing, I choose to do so within the MS-DOS Edit application on a 30-year-old luggable. There is something innately wonderful in its simplicity compared to modern-day solutions. I also love the sound of a floppy disk drive - sneaker-net is best network.
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.
IP over avian carriers can also give you crazy bandwidth, given the capacity of modern SD cards and how much a carrier pigeon can actually carry.
Yeah, when I was working in High Energy Physics the graduate students formed a very high latency high bandwidth network, shuttling tapes to and from neutrino detectors out in the desert.
A project at my job just recently brought back an entire NAS in a crate from a two week long project demo several states away.