1. 53
  1.  

  2. 17

    Though it’d be useful in some ways, I’m not convinced the juice from “JSON all the things” would be worth the squeeze. If you’re going to touch every tool on a system for a reason like this, it might make more sense to go all the way, and sling objects around powershell-style. The resulting system would be more powerful and have less overhead, and you could easily pipe those objects through a “jsonify” utility to get json out of any tool where that’d be beneficial.

    1. 3

      I enthusiastically agree!

      I’ve been exploring Powershell lately, and I think its object pipelines model is so incredibly powerful, I’m gonna gush about it for a minute.

      When you express everything as objects with a list of named parameter bearing methods, and include mechanisms for making documentation trivial to add, you wind up with this amazingly rich system that’s 100% explorable by users interactively from the command line.

      I can’t express as a 30+ year UNIX fan how liberating this is. Rather than having to rely on man pages which may or may not be present, I can query the command itself for what its parameters are. And I can combine objects in all sorts of interesting ways that would be very difficult if not impossible through strict adherence to the “everything is a stream of bytes” mantra.

      Other systems like AppleScript and ARexx have done this before to greater or lesser extents, and IMO Jeffrey Snover (an astonishingly smart dude who was involved in the POSIX shell spec in some way) has learned from all of them.

      We should totally steal back his great ideas and help move UNIX forward!

      1. 1

        It won’t really be UNIX any more, so it’s a move on from UNIX.

        1. 3

          So UNIX is forever fixed to a set of rules around how its userland shell and applications will interact and innovation is verboten?

          That feels like a mistake to me. UNIX is what we say it is, and it must either evolve or ultimately, over the VERY long haul, die.

          I’m not suggesting that it’s Powershell style objects or bust, that’s just one model I personally find very attractive, but to my mind the question the author is asking the UNIX community is a valid one: Is there a richer model we can use to allow applications to interact with each other and promote a richer palette of possibilities for users and application developers?

          I see that as a question worth asking, and I think changing the way UNIX shells and apps interact should be on the table for this ongoing dialog.

          1. 2

            Is there anything wrong with UNIX dying? Isn’t it already, with the madness going on in Linux land? The changes you want or propose are radical—like making a sports car out of a Humvee. They have far-reaching consequences. And trying to make a bad superset seems deeply unappealing.

            1. 2

              No, there isn’t, but if there’s one thing I’ve learned from hard experience over 30 years in this business it’s that being UTTERLY closed and inflexible to change is rarely the correct strategy.

              You don’t need to want any particular change, or be open to every change, but becoming hidebound about ANYTHING in this industry is bound to cause problems for you if not for the technology in question.

      2. 3

        What should go in an object that’s missing from json? Methods? But then you’re talking bi-directional not pipes

        1. 4

          This is a pretty good intro, I think. Even if you want to avoid methods, think that json could serialize everything you care about reasonably, and don’t consider the extra serialization needlessly wasteful, passing object handles from one tool to another still gives you a certain liveness missing from a serialized snapshot. For example, the ifconfig gadget in the OP could pass a set of handles which tools later in the pipeline could query for properties. So if the system’s dhcp client updates the IPv4 address for an interface between the time the list is created and the time the property is inspected, the second tool would see the up-to-date address.
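To make the snapshot-versus-handle distinction concrete, here is a minimal in-process sketch. The `InterfaceHandle` class, the `system` dictionary standing in for live kernel state, and the addresses are all assumptions made for illustration; a real pipeline would need some IPC mechanism to pass handles between processes.

```python
class InterfaceHandle:
    """A handle that looks properties up at access time instead of copying them."""

    def __init__(self, table, name):
        self._table = table
        self._name = name

    @property
    def ipv4(self):
        # Read the current value on every access.
        return self._table[self._name]["ipv4"]


# `system` is a stand-in for live system state; the addresses are made up.
system = {"eth0": {"ipv4": "192.0.2.10"}}

snapshot = dict(system["eth0"])            # what a serialized copy would carry
handle = InterfaceHandle(system, "eth0")   # what a live object would carry

system["eth0"]["ipv4"] = "192.0.2.20"      # e.g. the DHCP client updates the address

print(snapshot["ipv4"])  # still the old address
print(handle.ipv4)       # the updated address
```

The snapshot keeps whatever was true when it was taken, while the handle reflects the change made afterwards.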

          1. 3

            Right, so you want live objects, not piped data. Dbus is pretty good at this.

      3. 15

        “The solution is to start an effort to go back to all of these legacy GNU and non-GNU command line utilities that output text data and add a JSON output option to them.”

        Look into FreeBSD’s libxo.

        1. 5

          Might be compatible with Relational pipes:

          $ cat libxo.json 
          items: [
          { "blocks": 36, "path" : "./src" },
          { "blocks": 40, "path" : "./bin" },
          { "blocks": 90, "path" : "./" }
          ]
          
          $ cat libxo.json | relpipe-in-json | relpipe-out-tabular 
          items:
          ╭─────────────────┬───────────────╮
          │ blocks (string) │ path (string) │
          ├─────────────────┼───────────────┤
          │ 36              │ ./src         │
          │ 40              │ ./bin         │
          │ 90              │ ./            │
          ╰─────────────────┴───────────────╯
          Record count: 3
          
          $ cat libxo.json | relpipe-in-json | relpipe-tr-infertypes | relpipe-out-tabular 
          items:
          ╭──────────────────┬───────────────╮
          │ blocks (integer) │ path (string) │
          ├──────────────────┼───────────────┤
          │               36 │ ./src         │
          │               40 │ ./bin         │
          │               90 │ ./            │
          ╰──────────────────┴───────────────╯
          Record count: 3
          

          In Relpipes, you can do much more – various queries and transformations using well-known languages (regular expressions, SQL, AWK, Scheme, XPath… see examples) and send the output to various formats (XML, CSV, YAML, ASN.1, INI, ODS, GUI, Recfile…) or systems (X11, JACK/MIDI…). And everything in the modular way, so you only deal with the complexity you really need/use (the optional complexity principle).

        2. 8

          It’s interesting to see people pushing for the concepts that Unix explicitly threw out early on. To quote ken:

          Many familiar computing ‘concepts’ are missing from UNIX. Files have no records. There are no access methods. User programs contain no system buffers. There are no file types. These concepts fill a much-needed gap. I sincerely hope that when future systems are designed by manufacturers the value of some of these ingrained notions is reexamined. Like the politician and his ‘common man’, manufacturers have their ‘average user’.

          1. 2

            Ooh yes this! This is why I find it so off-putting when folks say “You want to change $X. That makes it no longer UNIX”

            Balderdash! :) UNIX is a living organism, or should be. It can evolve. Because we all care about it we should shepherd its evolution with loving care but, over time, change it must or it will be overtaken by whatever the Next Big Thing is :)

            (And yes I know it’s thrived for half a century already but even great dynasties must eventually evolve or fall.)

          2. 8

            How would you compare this to Nushell’s approach to making it easy to pipe together commands with structured data? https://www.nushell.sh/ Is one more likely to work than the other?

            1. 2

              Ah I really need to dig into this more. I think what I’m seeing from the quickstart docs is exactly what I’d want.

              As much as I’d like a full on objects/methods model the thing that would accelerate my day to day work most is a standard mechanism for commands to exchange highly structured data with named random access fields, and it seems like Nu accomplishes that somehow.

              I’m going to be super curious to see what mechanism it uses under the hood.

            2. 5

              “The solution is to start an effort to go back to all of these legacy GNU and non-GNU command line utilities that output text data and add a JSON output option to them.”

              I had this same idea in ~2015 after reading cat -v or something and hearing about PowerShell objects. I still think it’s a good idea, but I never got very far with implementing it, since it’s a big project. This seems like a good approach to boiling the ocean one bucket at a time. I was trying to reimplement things from scratch, which is a bit more difficult than adding a new layer.

              1. 5

                If you want this, why not simply space separated columns with consistent escaping of spaces? This would preserve the easy human readability and direct manipulation benefits of the shell, without losing anything over json. Anything that can be represented with json could be represented as a keypath/value column.

                And, unlike json, it’s streamable – you can start processing records before the last one is emitted. Json does not allow this: It requires you to buffer until the end to detect a json syntax error.
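As a rough illustration of the keypath/value idea, the sketch below flattens a JSON document into one `keypath value` line per leaf. The dotted path syntax and the backslash escaping of spaces are assumptions made for this example, not an established format.

```python
import json


def escape(text):
    """Backslash-escape backslashes, spaces and newlines (an assumed convention)."""
    return (text.replace("\\", "\\\\")
                .replace(" ", "\\ ")
                .replace("\n", "\\n"))


def flatten(value, path=""):
    """Yield (keypath, leaf) pairs for every scalar in a decoded JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}" if path else str(key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}.{index}" if path else str(index))
    else:
        yield path, value


def to_rows(document):
    """Render a JSON text as space-separated 'keypath value' lines."""
    rows = []
    for path, leaf in flatten(json.loads(document)):
        # Strings are written bare; other scalars use their JSON spelling.
        text = leaf if isinstance(leaf, str) else json.dumps(leaf)
        rows.append(f"{escape(path)} {escape(text)}")
    return rows


rows = to_rows('[{"ifname": "lxdbr0", "mtu": 1500, "note": "a b"}]')
print("\n".join(rows))
```

This simple version drops some information that JSON carries: the string `"1500"` and the number `1500` render identically, empty arrays and objects vanish, and keys containing `.` become ambiguous. A real format would need rules for each of those.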

                1. 4

                  You can stream JSON. You just emit one object per line instead of using an array: http://ndjson.org/
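A minimal sketch of consuming that one-object-per-line layout: each line is decoded on its own, so a record can be processed before the next one arrives. The `read_ndjson` name is made up for this example, and `io.StringIO` stands in for a pipe such as `sys.stdin`.

```python
import io
import json


def read_ndjson(stream):
    """Yield one decoded record per non-empty line, as soon as that line is read."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# io.StringIO stands in for a pipe such as sys.stdin.
sample = io.StringIO(
    '{"blocks": 36, "path": "./src"}\n'
    '{"blocks": 40, "path": "./bin"}\n'
)
records = list(read_ndjson(sample))
print(records)
```

A syntax error on one line only surfaces when that line is reached, so earlier records have already been handed to the consumer.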

                  1. 5

                    That’s not valid json. It’s a non-standard extension – and I think it’s inferior to space-separated columns with quoting to handle spaces.

                    1. 1

                      Inferior in what context? Streaming JSON readers are a dime a dozen in every programming language. The thing you’re proposing is probably fine too, but we don’t need two different ways to do it, and the existing way is fine.

                      1. 2

                        Inferior in the context of a shell:

                        • Streaming TSV libraries are a dime a dozen in every programming language
                        • It allows tools to output in one format for both humans and the machine, instead of doing it two different ways and forcing me to either grub through json output by hand, or guess at how the output translates.
                        • It’s simpler to parse with existing tools like awk, sed, etc, and if it becomes ubiquitous, it would be a smaller lift to convince maintainers that a simple columnar format should live directly in these tools
                        • It’s simpler to parse without existing tools when writing new code
                        • It’s simpler to parse without tools at all: just use your eyeballs
                        • It’s simpler to generate in a streaming format from existing tools
                        • It’s simpler to generate at all, especially if you allow yourself to use glibc’s register_printf_function to add a ‘%q’ format specifier that quotes.

                        And finally, the only real argument for JSON isn’t really valid:

                        • The ‘existing way’ doesn’t actually exist: there are approximately no well established shell tools that produce json output natively. It’s a similar (or larger) lift to convert the shell ecosystem to use json, the transition period will be harder and longer, and the end result will not be as pleasant to use.
                        1. 2

                          “Streaming TSV libraries are a dime a dozen in every programming language”

                          You can’t count on two libraries handling newlines the same way, so it’s not practically composable. That’s the whole point of using JSON. It doesn’t have to be good. It just has to be a standard.

                          “It allows tools to output in one format for both humans and the machine, instead of doing it two different ways and forcing me to either grub through json output by hand, or guess at how the output translates.”

                          JSON is reasonably readable, and there are plenty of tools like gron to clean it up more. I don’t see how TSV is especially more readable, particularly once you take out newlines.

                          “It’s simpler to parse with existing tools like awk, sed, etc, and if it becomes ubiquitous, it would be a smaller lift to convince maintainers that a simple columnar format should live directly in these tools”

                          Awk and sed have had forty years to catch on, but they’ve been stalled for the last twenty or so. Good luck, I guess, but I give this project a 1% chance of really catching on, and an awk renaissance a 0.001% chance of happening.

                    2. 3

                      Doesn’t even have to be one per line. Just push into the parser until you get a whole object, then start over with the next byte. This is how jq works

                      1. 2

                        Interesting – so jq will return valid results from malformed json? That’s rather unexpected for a strict format.

                        1. 2

                          Not malformed JSON, but it is a stream processor so it can process a stream of valid json items yes

                          1. 3

                            I’m not sure what you mean by ‘not malformed json’: will it produce output for something like:

                            {"array of objects": [
                            	{
                            		"index": 0,
                            		"index start at 5": 5
                            	},
                            	{
                            		"index": 1,
                            		"index start at 5": 6
                            	},
                            	{
                            		"index": 2,
                                          [[[[[[[[[[[[
                                         AND IT GOES OFF THE RAILS!
                                        l1lk12j304580q1298wdafl;kjasc vlkawd f[0asfads
                            

                            If it streams, that means it will produce output from the start of the botch. If it catches errors and only prints within valid json objects, it needs unbounded buffering.

                            For loosely defined formats where a truncation of the format is still valid, I’d expect the former, but not for a self contained, validatable, delimited format like json.

                            1. 1

                              There is no valid object in your example, so it errors trying to pull the first item out of a stream. Here’s an example I think you were trying to go for:

                              {
                              	"a": 1,
                              	"b": [1,2]
                              }{barf,"
                              

                              This will produce output for the first object and a syntax error for the second
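The same behaviour can be reproduced with Python's standard `json` module, as a sketch of the idea rather than of how jq is implemented. `iter_json_values` is a made-up helper, and unlike a true stream processor it works on a string that is already in memory.

```python
import json


def iter_json_values(text):
    """Yield each complete JSON value found back to back in `text`."""
    decoder = json.JSONDecoder()
    position = 0
    while position < len(text):
        # Skip whitespace between values; raw_decode does not do this itself.
        if text[position].isspace():
            position += 1
            continue
        value, position = decoder.raw_decode(text, position)
        yield value


stream = '{\n\t"a": 1,\n\t"b": [1,2]\n}{barf,"\n'
seen = []
error = None
try:
    for value in iter_json_values(stream):
        seen.append(value)  # the first object is delivered before any error
except json.JSONDecodeError as exc:
    error = exc

print(seen)
print("syntax error:", error)
```

The first top-level value is emitted in full, and the error is raised only when the parser reaches the malformed second value.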

                              1. 1

                                There are multiple valid sub-objects. If no output is produced, then the elements can’t be processed incrementally.

                                You can choose to restrict yourself if you want, but this seems unnecessary.

                    3. 3

                      Using tabular data makes it more annoying to work with anything that uses multiple properties of an object. For example, here’s a pipeline to get all of the interfaces with a valid global IPv6 address:

                      > ip -j addr | jq --raw-output '.[] | select(.addr_info | any(.family == "inet6" and .scope == "global")) | .ifname'
                      lxdbr0
                      

                      How would you do that if your input data was like

                      0.ifname lxdbr0
                      0.addr_info.0 inet global
                      0.addr_info.1 inet6 global
                      ...
                      

                      ?

                      e: Also, afaik none of the standard Unix string-manipulation tools deal well with escaped spaces (unless you replaced them with something else entirely, which would lose you a lot of the readability).
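For comparison, here is the same selection written against decoded JSON in Python. The sample data is hypothetical and only mimics the keys used by the jq filter above (`ifname`, `addr_info`, `family`, `scope`); it is not real `ip -j addr` output.

```python
import json

# Hypothetical sample shaped like the `ip -j addr` output used above.
SAMPLE = json.loads("""
[
  {"ifname": "lo",     "addr_info": [{"family": "inet",  "scope": "host"},
                                     {"family": "inet6", "scope": "host"}]},
  {"ifname": "lxdbr0", "addr_info": [{"family": "inet",  "scope": "global"},
                                     {"family": "inet6", "scope": "global"}]},
  {"ifname": "eth0",   "addr_info": [{"family": "inet",  "scope": "global"},
                                     {"family": "inet6", "scope": "link"}]}
]
""")


def global_ipv6_interfaces(interfaces):
    """Names of interfaces with at least one global-scope IPv6 address."""
    return [
        iface["ifname"]
        for iface in interfaces
        # Both conditions must hold on the SAME address entry.
        if any(addr.get("family") == "inet6" and addr.get("scope") == "global"
               for addr in iface.get("addr_info", []))
    ]


names = global_ipv6_interfaces(SAMPLE)
print(names)
```

The point of the example is that both properties are tested on one nested object, which is what a flat row format has to reconstruct.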

                      1. 2

                        if the format wasn’t a naive translation from json, but used the tabular format better:

                        0 ifname lxdbr0
                        0 addr_info inet global
                        0 addr_info inet6 global
                        

                        Then here’s a translation – presumably, in a world where this format took over, awk would be augmented to know about the column escaping:

                            % awk '
                            	$2=="ifname"{name[$1]=$3}
                            	$2=="addr_info"&& $3=="inet6" { print name[$1] }
                            '
                        

                        With this same format, where the ‘ifname’ applies to all subsequent attributes, and indentation is taken as an empty initial column (" foo bar" == "'' foo bar"), you could clean that output up further:

                        ifname lxdbr0
                            addr_info inet global
                            addr_info inet6 global
                        

                        and then to parse it, you could do this:

                            % awk '
                            	$1=="ifname"{name=$2}
                            	$2=="addr_info"&& $3=="inet6" { print name }
                            '
                        

                        which is just as rigorous as the json, but infinitely more readable: I don’t need tools if I want to interact with it directly, which means that tools no longer need output modes, and I no longer need to mentally translate between output modes when interacting with the tools.

                        The original can be done similarly, you’d just need to split json-influenced path on ‘.’ to get the useful info:

                            % awk '
                            	# $1 is a path like "0.ifname"; split() fills path[] starting at index 1
                            	{ split($1, path, ".") }
                            	path[2]=="ifname" { name[path[1]]=$2 }
                            	path[2]=="addr_info" && $2=="inet6" { print name[path[1]] }
                            '
                        

                        As far as standard unix utilities: I thought the discussion was between converting to json and converting to a simpler, friendlier format. That said, it does work better with existing tools, which would make the transition easier.

                        1. 1

                          But now your code has a bug: if it has multiple ipv6 addresses, it’ll print out the interface name more than once. (I don’t know if this is sensible in this specific case, but if you adjust it to “an ipv4 or ipv6 global address” it clearly could happen).

                          And I think in general this sort of approach is going to be more sensitive to the order fields are printed out in than a JSON object, where the order of the fields is (generally) irrelevant.
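One way to sidestep both concerns, sketched here under assumptions: group the rows by their record index first and decide per record afterwards. The row layout (`index key value...`) follows the tabular example above; the function name and sample rows are made up, and plain `split()` means escaped spaces are not handled.

```python
from collections import defaultdict

# Rows deliberately out of order and with a repeated inet6 entry.
ROWS = """\
0 addr_info inet6 global
0 ifname lxdbr0
0 addr_info inet6 global
1 ifname eth0
1 addr_info inet global
"""


def interfaces_with_inet6(text):
    """Group rows by record index first, so row order and repeats do not matter."""
    names = {}
    has_inet6 = defaultdict(bool)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed rows
        record, key = fields[0], fields[1]
        if key == "ifname":
            names[record] = fields[2]
        elif key == "addr_info" and fields[2] == "inet6":
            has_inet6[record] = True
    # Each interface is reported at most once, whatever the row order was.
    return [names[record] for record in sorted(names) if has_inet6[record]]


result = interfaces_with_inet6(ROWS)
print(result)
```

The trade-off is that nothing can be printed until all rows have been read, so this version gives up the streaming property discussed earlier in the thread.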

                          1. 1

                            “And I think in general this sort of approach is going to be more sensitive to the order fields are printed out in than a JSON object, where the order of the fields is (generally) irrelevant.”

                            Yes, taking advantage of the order is definitely a useful feature of this format.

                            1. 2

                              I don’t consider relying on the order, which seems more like an implementation detail to me, to be a feature.

                              1. 2

                                It’s an implementation choice, not an implementation detail. And it allows a great deal more expressiveness if you choose to use it effectively.

                    4. 5

                      Yeah. I like this a lot, an extensible crutch until first class JSON support lands in GNU utils.

                      Frankly passing objects via pipes is one of those things that Powershell gets right, and makes writing powershell feel somewhat elegant compared to bash pipes.

                      The jq syntax is a bit difficult for me at times still though; but the good thing about json is that I don’t have to use jq.

                      I think this is super. :)

                      1. 2

                        “don’t have to use jq.”

                        Here is my (I guess controversial) opinion on jq: https://ilya-sher.org/2018/09/10/jq-is-a-symptom/

                        TL;DR - handling JSON should be in the shell.

                        1. 2

                          I’d say json is a symptom that you haven’t thought through your data model enough to use sqlite.

                          If you want to know what SQL is a symptom of…. read CJ Date. He has decades worth of books worth of rants on the subject.

                          1. 2

                            “I’d say json is a symptom that you haven’t thought through your data model enough to use sqlite.”

                            I’m not entirely sure what this means. How do you propose to pass data between two processes not on the same machine using sqlite?

                            1. 1

                              An SQLite DB is just a tight text file. If there’s any real structure in your data, it’ll be smaller than a JSON file.

                              JSON is the winner on the web because you have to use other people’s data models, and you can’t expect them to think it through. We don’t want to learn two ways of dealing with data so it’s used when we have full control too, anyways.

                              If you’re gonna integrate structured data in the shell (but somehow don’t want to go full powershell objects), SQLite is probably the better choice.

                              1. 2

                                Can the sqlite API start an in-memory db from a byte stream? Or do you mean to stream the DDL between processes?

                                1. 1

                                  “Can the sqlite API start an in-memory db from a byte stream?”

                                  SQLite itself AFAIK not, but you can do this: Reading SQL scripts:

                                  cat script.sql | relpipe-in-sql | relpipe-out-tabular
                                  
                                2. 2

                                  It’s less about sql or sqlite itself or json, it’s more about the notion of a relational algebra that Codd envisioned, and that sql is a stonking Bad implementation of (but the best we have at the moment).

                                  TL;DR; we got so badly betamaxed by SQL that the better standard never even emerged.

                                  1. 1

                                    Sorry, I don’t get it. If I want to send an RPC to a server and read a response. are you suggesting I should construct an entire SQLite database and send that? And then parse the database on response, even if it’s just an object with a key to indicate success/failure and one for the error message?

                                    Also, in general, I’ve found deeply nested data to be much more pleasant in JSON than in any SQL. No need to mess around with joins when I can just access a property on an object directly.

                                    1. 1

                                      No, not really. A DB-first world would have had answers for that but we live in a different world. JSON is better in this world for plenty of reasons.

                          2. 4

                            Not sure if this idea is genius or madness. Fascinating either way. Would be neat to try to just build a Linux system around it.

                            1. 1

                              I claim genius, and I doubly triply applaud the author for getting the conversation started by building a usable tool that accomplishes their aims.

                              I’ve been feeling for years that “everything is a stream of bytes” is still INCREDIBLY powerful but in 2021 not sufficient, so I love steps in the right direction like this.

                            2. 4

                              $ netstat -tln | jc --netstat

                              Here is a request, perhaps this can be better implemented as

                              $ jc netstat -tln

                              This way, you can easily inspect the command line switches passed too, and can support complex switches at least partially. Once the command starts supporting -j, you can transparently exec it directly. Another advantage is that you can add another wrapper to switch to, say, a binary format:

                              $ jsonb jc netstat -tln

                              add compression

                              $ zip jsonb jc netstat -tln

                              This way of doing it has some precedent in the unix land.
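A small sketch of the dispatch such a wrapper could perform once it sees the full command line. It only plans the argv and runs nothing. The `NATIVE_JSON_FLAG` table and `plan` function are hypothetical; the only entry, `ip -j`, is taken from elsewhere in this thread, and the fallback assumes `jc` accepts the wrapped command as its arguments.

```python
# Tools assumed to emit JSON natively, mapped to the flag that enables it.
# Only `ip -j` comes from the thread; the table is otherwise hypothetical.
NATIVE_JSON_FLAG = {"ip": "-j"}


def plan(argv):
    """Return the argv a wrapper would exec to get JSON output for `argv`."""
    if not argv:
        raise ValueError("no command given")
    tool, args = argv[0], argv[1:]
    flag = NATIVE_JSON_FLAG.get(tool)
    if flag is None:
        # No native support: fall back to parsing the text output with jc.
        return ["jc", tool, *args]
    if flag in args:
        return [tool, *args]  # the caller already asked for JSON
    return [tool, flag, *args]


print(plan(["netstat", "-tln"]))
print(plan(["ip", "addr"]))
```

Because the wrapper receives the switches as a list, moving a tool from the fallback path to native output is a one-line change to the table.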

                              1. 3

                                One of the example invocations is jc ifconfig ens33 | jq -r '.[].ipv4_addr' so I assume it already does that.

                                1. 1

                                  Thanks!, I missed that.

                              2. 3

                                Cool! And while you’re at it, fix the bloody ARGV is stringly typed random shit for every different command.

                                1. 2

                                  Cool I can see this being very useful! FWIW I think this is the right approach to structured data – use JSON, which is also text. As opposed to .NET objects and so forth.

                                  It was already on my page: https://github.com/oilshell/oil/wiki/Structured-Data-in-Oil

                                  and I just added a few more notes. It is similar to uxy which seems to be dormant: https://github.com/sustrik/uxy

                                  1. 1

                                    This article has me torn between purism and pragmatism. A good read, looking forward to trying jc out.

                                    1. 7

                                      This is probably a matter of the historiography of computer science at this point, but I strongly suspect we got the “unstructured text” interpretation of point ii) in the original AT&T report wrong, and that we should’ve thought of it as arbitrarily structured text instead. Or, well… we probably did think of it that way, most of the time, but we just didn’t articulate it properly?

                                      A lot of the “interconnected tools” examples in early Unix literature are centred around things like statistics, experimental data, word definitions and so on. These aren’t arbitrary dumps of text, they’re all structured, just not in the same way, and usually not semantically (i.e. you use sed in terms of “get the second field”, not “get the IP field”). But the nice thing about writing tools that are tolerant to that is that you get tools that can show data in a format that’s both readable for humans (more or less) and easily processed further by automated tools.

                                      jc is really cool in that it solves the associated problem: the proliferation of data representation formats. If they all show whatever format they want to show, you get as many interfaces as programs, and that tends to get fragile in time. It probably wasn’t that much of a problem in 1978, when the install base of Unix was a few dozen PDP-11s running a handful of special-purpose tools but it’s a whole other story nowadays. jc gives you a way to get predictably-structured data out of many arbitrarily-structured tools. It’s basically a universal adapter.

                                      I vaguely recall other efforts of doing that, but I don’t have any links at hand (and take it with a grain of salt, I may be misremembering it) – using other interfacing formats, like CSV. The fact that this one uses JSON, the favourite format of the web heretics, probably takes some of the purity away, but IMHO this is pretty Unixy :-).

                                      1. 2

                                        Great thoughts which add value to the article, thanks. On reading your first paragraph above, I went back to see the maxims and only noticed that it didn’t actually say anything about structured or unstructured (it did tell us to avoid binary, thank goodness). So I can see the argument for jc being in line with the philosophy, not at a tangent to it.

                                        “JSON, the favourite format of the web heretics” - I think that may catch on :-)

                                    2. 1

                                      I wonder if Tcl literals might not be a bit cleaner, and shell-friendly, than JSON.

                                      We have to be really careful not to get stuck in a local maximum. JSON is indeed better than XML and ASN.1, but that is a really low bar.

                                      1. 1

                                        JSON is not very much like a replacement for ASN.1. It could be used as an encoding substrate for ASN.1 if anyone cared to. The ITU has even specified how to do that in the JER, X.697. This document, from one of the major compiler vendors explains it in a more accessible way and is easier to obtain than the ITU standard.

                                        Using JSON encoding rules for ASN.1 is the moral equivalent of using ASN.1 as a language to specify a schema for JSON, and lets you use ASN.1 tools for validating compliance. So, e.g., you can specify that a particular field in a structure must contain an enumerated type in a standard, and validate that a particular json blob conforms to that expectation before feeding it to your application logic.

                                        The main benefit of using JER that way is that at least the two largest ASN.1 compilers and runtimes support it, and that gets you quite a bit of battle tested tooling. That said, if you have that tooling at hand, you’re probably already well-equipped to handle the more compact binary encodings. And people who like JSON but don’t have those tools in their box probably oppose having a schema enforced in any way like this anyway.

                                        1. 2

                                          [JSON] could be used as an encoding substrate for ASN.1 if anyone cared to.

                                          Yes, I know. IIRC there are rules for encoding ASN.1 in XML too. But I think you know what I meant: JSON the human-readable, schemaless notation is indeed better than XML the incredibly verbose markup language and ASN.1 the extremely intricate IDL and its related encoding rules. Sure, they are all subtly different things, but ‘JSON,’ ‘XML’ and ‘ASN.1’ serve as symbols to refer to three different, competing ways of transferring data.

                                          None of them really works at a shell level, I think. I think Tcl notation would be a real improvement. Here is my rendering of the article’s example:

                                          $ ip -j addr show dev ens33
                                          {
                                              {addr-info {{} {}}}
                                              {
                                                  ifindex 2
                                                  ifname ens33
                                                  flags {BROADCAST MULTICAST UP LOWER-UP}
                                                  mtu 1500
                                                  qdisc fq-codel
                                                  operstate UP
                                                  group default
                                                  txqlen 1000
                                                  link-type ether
                                                  address 00:0c:29:99:45:17
                                                  broadcast ff:ff:ff:ff:ff:ff
                                                  addr-info {
                                                      {
                                                          family inet
                                                          local 192.168.71.131
                                                          prefixlen 24
                                                          broadcast 192.168.71.255
                                                          scope global
                                                          dynamic true
                                                          label ens33
                                                          valid-life-time 1732
                                                          preferred-life-time 1732
                                                      }
                                                      {
                                                          family inet6
                                                          local fe80::20c:29ff:fe99:4517
                                                          prefixlen 64
                                                          scope link
                                                          valid-life-time 4294967295
                                                          preferred-life-time 4294967295
                                                      }
                                                  }
                                              }
                                          }
                                          

                                          I think that compares very favourably. And it is pretty easy to parse, too, cf. https://wiki.tcl-lang.org/page/Dodekalogue
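
To give a feel for how little parsing that takes, here is a rough Python sketch of a reader for the brace-and-whitespace subset used in the example above. It is an assumption-laden simplification, not a full implementation of the Tcl rules: it ignores quotes, backslash escapes, and substitution, and the `parse_list` and `parse_dict` names are made up for illustration.

```python
def parse_list(text):
    """Split a Tcl-style list into its top-level elements (returned as strings)."""
    items, i, n = [], 0, len(text)
    while i < n:
        if text[i].isspace():
            i += 1
        elif text[i] == "{":
            # Braced word: scan to the matching close brace, tracking nesting.
            depth, start = 1, i + 1
            i += 1
            while i < n and depth:
                if text[i] == "{":
                    depth += 1
                elif text[i] == "}":
                    depth -= 1
                i += 1
            if depth:
                raise ValueError("unbalanced braces")
            items.append(text[start:i - 1])
        else:
            # Bare word: runs until the next whitespace character.
            start = i
            while i < n and not text[i].isspace():
                i += 1
            items.append(text[start:i])
    return items


def parse_dict(text):
    """Interpret a Tcl-style list as alternating keys and values."""
    words = parse_list(text)
    if len(words) % 2:
        raise ValueError("odd number of elements in key/value list")
    return dict(zip(words[0::2], words[1::2]))
```

As in Tcl itself, every value stays a string until the consumer decides how to read it, so getting at the addresses means calling `parse_list` on the `addr-info` value and then `parse_dict` on each element.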

                                          1. 1

                                            Yes, I know. IIRC there are rules for encoding ASN.1 in XML too.

                                            Those XML encoding rules are called XER, for any who aren’t familiar and want to reference. They well pre-date JER and are similar in spirit.

                                            I did not accurately suss out whether you were arguing schemaless > schema or just lobbying for a particular wire format. ’Cause JSON schemas are a thing almost as much as XML schemas, and IMO if you want to go there the ASN.1 schema-related tooling is dramatically better (though the good stuff isn’t free) anyway.

                                            If you want to roll without a schema, the Tcl notation seems a little more human readable than JSON. But I’d argue that if you were going to touch all of these tools to do a radical new output format, you might as well go the whole nine yards and introduce automatable format validation, too. And if you’re doing that, IMO the ASN.1 notion of a schema is easier to both read and write than any of JSON Schema, XML Schema Definition, or XML Document Type Definition, in addition to having better enforcement tools.

                                      2. 1

                                        Oh, this is a really nice idea.

                                        1. 1

                                          that’s pretty brilliant!

                                          1. 1

                                            Below is how you “integrate” jc with Next Generation Shell.

                                            data = ``jc PROGRAM ARGS ...``
                                            

                                            The double-backtick syntax runs the external program, jc in our case, and parses the output. It means that the “integration” is not jc specific.

                                            data is now structured data that comes from the parsed JSON.

                                            Example (run from your shell):

                                            ngs -pl '``jc ifconfig``.filter({"name": /docker/}).ipv4_addr'
                                            

                                            This prints the IPs of all Docker interfaces, one IP per line.
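
For comparison, roughly the same thing can be sketched in Python with nothing but the standard library. This assumes `jc` is installed and on the PATH, and that its `ifconfig` output is a list of objects with `name` and `ipv4_addr` keys, as the one-liner above implies; the `run_jc` and `ipv4_by_name` helpers are names invented for this sketch.

```python
import json
import re
import subprocess


def run_jc(*args):
    """Run `jc PROGRAM ARGS ...` and return its output parsed from JSON."""
    result = subprocess.run(
        ["jc", *args], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


def ipv4_by_name(interfaces, pattern):
    """Return the ipv4_addr of each interface whose name matches `pattern`.

    Interfaces with no ipv4_addr are skipped in this sketch.
    """
    return [
        iface["ipv4_addr"]
        for iface in interfaces
        if re.search(pattern, iface.get("name") or "") and iface.get("ipv4_addr")
    ]
```

Printing `"\n".join(ipv4_by_name(run_jc("ifconfig"), r"docker"))` would then give one address per line. As with the double-backtick syntax, nothing here is specific to jc beyond the command name: any program that emits JSON could be wrapped the same way.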