I got quite excited about “supports a range of formats” because I regularly use both jq and yq (badly), and was hoping I could replace them with a single tool. Alas, YAML is not one of the supported formats, so I’d still be stuck with two tools, and fairly different ones at that.
I like articles like this. Who knows how much traction this will gain but that’s not really the point, is it? This team had a need, the existing solutions didn’t fit that need, so they built their own, borrowing the good ideas from the tools they tried.
Having said that, while I don’t like jq’s syntax, I’m not a big fan of SQL’s either, hah. Words like “over” don’t feel natural to me, and I worry that the overlap with SQL will lead to a different kind of confusion than jq’s APL-like syntax causes. Maybe I’m wrong, though, and should try this out first!
I don’t need to mess with large JSON data often, but I’d be tempted to use something like datasette’s sqlite-utils, which converts JSON into a SQLite DB. Then you can do whatever you want and have a nice, reliable format that won’t go anywhere, and the tooling really opens up: using datasette directly, running SQL queries directly (since SQLite supports JSON directly now), etc.
i.e. the overall goal is converting large JSON blobs into something easy to consume, and then having fun with your normal tools. I’m sure one could use PG or MySQL or whatever database you already use a lot. SQLite is just my personal favorite.
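To make the workflow concrete, here’s a minimal sketch (using Python’s stdlib rather than sqlite-utils itself, and a made-up two-record dataset) of loading JSON documents into SQLite and querying them with SQLite’s built-in JSON functions:

```python
import json
import sqlite3

# Hypothetical sample data standing in for a large JSON blob.
records = [
    {"name": "alpha", "tags": ["x", "y"]},
    {"name": "beta", "tags": ["y", "z"]},
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (body TEXT)")
db.executemany(
    "INSERT INTO items (body) VALUES (?)",
    [(json.dumps(r),) for r in records],
)

# SQLite's JSON functions query the raw documents directly.
rows = db.execute(
    "SELECT json_extract(body, '$.name') FROM items"
).fetchall()
names = [name for (name,) in rows]
print(names)  # -> ['alpha', 'beta']
```

sqlite-utils automates the tedious parts (schema inference, real typed columns instead of one TEXT blob), but even this bare version already gives you a file you can hand to any SQL tooling.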
One of the better examples of this is probably gron: https://github.com/tomnomnom/gron
You just made my life better! gron is awesome! I had to read the help (which is also amazing); -s is very useful for dealing with JSON log files. I think I now like this better than my method!
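For anyone who hasn’t tried it: gron’s trick is flattening JSON into discrete, greppable assignment lines. Here’s a rough Python sketch of that idea (not gron’s actual implementation, just the shape of its output):

```python
import json

def flatten(value, path="json"):
    """Yield gron-style assignment lines for a JSON value."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for k, v in value.items():
            yield from flatten(v, f"{path}.{k}")
    elif isinstance(value, list):
        yield f"{path} = [];"
        for i, v in enumerate(value):
            yield from flatten(v, f"{path}[{i}]")
    else:
        yield f"{path} = {json.dumps(value)};"

doc = json.loads('{"user": {"name": "ada", "langs": ["en", "fr"]}}')
lines = list(flatten(doc))
for line in lines:
    print(line)
# json = {};
# json.user = {};
# json.user.name = "ada";
# json.user.langs = [];
# json.user.langs[0] = "en";
# json.user.langs[1] = "fr";
```

Because every leaf gets its own full path, plain grep becomes a JSON query tool, and gron can reverse the transformation back into JSON.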
Maybe you would like this: Reading and querying JSON, YAML, CBOR, HTML, MIME, INI, ASN.1 and XML in a uniform way
For filtering/transformations you can use various languages including SQL, AWK, Scheme…
Interesting! Thanks for sharing.
Like NoahTheDuke, I also like articles like this, that show and lay out the thinking behind the conclusion, and along the way, impart knowledge about the topic at hand. So I enjoyed reading this very much. Before I continue, my disclaimer is that I’m on the learning journey with jq and am by no means an expert.
While reading the article a couple of things struck me.
While I hadn’t really heard the phrase “stateless dataflow” before (and googling it only turned up a handful of results), I sort of grokked what was meant. However, one thing worth thinking about for a second is that jq is designed to operate on discrete JSON values. From the jq manual: “The input to jq is parsed as a sequence of whitespace-separated JSON values which are passed through the provided filter one at a time.” So the [1,2,3] [4,5,6] example in the article is not valid JSON; it’s two valid JSON values, one after the other. At least for me, that partially contextualises the computational model and helps it make more sense.
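A small Python sketch (not jq’s actual parser) of that input model — consuming whitespace-separated JSON values one at a time — shows why [1,2,3] [4,5,6] is two values rather than one document:

```python
import json

def iter_json_values(text):
    """Parse a stream of whitespace-separated JSON values,
    the way jq treats its input."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # Skip whitespace between values.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        value, idx = decoder.raw_decode(text, idx)
        yield value

values = list(iter_json_values("[1,2,3] [4,5,6]"))
print(values)  # -> [[1, 2, 3], [4, 5, 6]]
```

Each value then flows through the filter independently, which is exactly why you need -s (slurp) to gather them into one array before you can sum across them.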
Moreover, the solution to adding up the numbers in the example above, given as jq -s '[.[] | add] | add', is perhaps a little contrived. To help the mental model, the syntactic sugar of map helps a lot, not only reducing noise but also relating the computation to an arguably well-known function (map is defined as map(f) == [.[] | f]). So the solution turns into the much simpler jq -s 'map(add) | add'.
echo '[1,2,3] [4,5,6]' | jq -s '[.[] | add] | add'
echo '[1,2,3] [4,5,6]' | jq -s 'map(add) | add'
Anyway, this should not detract from the article nor from their conclusions with zq - more power to them!