You may also be interested in the ludicrous jq. There is also jqplay, which lets you try out the jq language online.
Below is the equivalent jq demo for “get all the front page links on Reddit”. The join("\n") and --raw-output format the JSON list into raw lines suitable for xargs.
curl -s 'https://www.reddit.com/.json' | jq --raw-output '.data.children | map(.data.url) | join("\n")'
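For readers who don't know jq, here is a rough Python sketch of what that filter does once the JSON has been fetched. The function name `front_page_urls` and the `SAMPLE` document are made up for illustration. The sample assumes only the fields the jq filter touches (`data.children[].data.url`), and a real listing has many more.

```python
import json


def front_page_urls(listing: dict) -> str:
    # Mirrors the jq filter: .data.children | map(.data.url) | join("\n")
    children = listing["data"]["children"]
    return "\n".join(child["data"]["url"] for child in children)


# Hypothetical sample shaped like the parts of the listing the filter reads.
SAMPLE = json.loads("""
{"data": {"children": [
  {"data": {"url": "https://example.com/a"}},
  {"data": {"url": "https://example.com/b"}}
]}}
""")

# One URL per line, which is the shape xargs expects.
print(front_page_urls(SAMPLE))
```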
This was directly inspired by how silly I thought jq was. Ironically enough, the two work well together.
Out of curiosity, why do you think jq is silly? I quite like it.
It lives in its own universe with its own pipelines, instead of relying on the tools you already use.
I mean, yes, but jq 'a | b' is the same as jq 'a' | jq 'b', only with less parsing and serializing in the middle. This doesn’t seem like a major problem.
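To make the "less parsing and serializing in the middle" point concrete, here is a small Python analogy rather than jq itself. The two stage functions stand in for arbitrary filters a and b; their names and the sample input are hypothetical, and the sketch assumes each stage produces exactly one output, which jq filters are not required to do.

```python
import json


def stage_a(doc):
    # Stand-in for filter "a": descend into the "data" object.
    return doc["data"]


def stage_b(doc):
    # Stand-in for filter "b": count the entries under "items".
    return len(doc["items"])


def single_process(text: str) -> str:
    # Like jq 'a | b': parse once, run both stages, serialize once.
    return json.dumps(stage_b(stage_a(json.loads(text))))


def two_processes(text: str) -> str:
    # Like jq 'a' | jq 'b': the intermediate value is serialized to text
    # and parsed again before the second stage sees it.
    intermediate = json.dumps(stage_a(json.loads(text)))
    return json.dumps(stage_b(json.loads(intermediate)))


TEXT = '{"data": {"items": [1, 2, 3]}}'
print(single_process(TEXT) == two_processes(TEXT))
```

The result is the same either way. The second form just does one extra dump and load per value passing through.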
[Comment removed by author]
For instance, jq 'a' | jq 'b' gives you “free” parallelism at the OS process level, as long as jq streams.
Just to waste it on parsing and serializing.
At least for the bigger JSON files I process, this is a major factor.
‘The tools I already use’ (grep, cut, sed, awk) typically work in one of two modes: “character-delimited columns and rows” or “crazy regex parsing.” None of them work well on recursive tree data (which most JSON feeds are). Most of the time I want something like “go three levels deep, pull out these two keys, and output them as a dict.” jq ends up being the best solution for this workflow (for me).
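As an illustration of the “go three levels deep, pull out these two keys” workflow, here is a minimal Python sketch. The helper `pick` and the `FEED` document are invented for the example. In jq the same idea would be a path followed by an object construction, something like `.response.user.profile | {name, id}`.

```python
def pick(doc, path, keys):
    # Walk down the nested structure one step at a time...
    node = doc
    for step in path:
        node = node[step]
    # ...then keep only the requested keys from the node we landed on.
    return {key: node[key] for key in keys}


# Hypothetical nested feed, three levels deep.
FEED = {
    "response": {
        "user": {
            "profile": {"name": "ada", "id": 7, "bio": "hello"},
        }
    }
}

print(pick(FEED, ["response", "user", "profile"], ["name", "id"]))
```

Doing this with grep, cut, sed, or awk would mean matching on the serialized text and not on the tree, which is the awkward part.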
I’ve worked with largish JSON datasets, several terabytes in size. JSON parsing is really expensive. Without JSON, processing a million messages/s is pretty easy. With JSON, it’s tough to beat 100k/s (on, say, an i7). Also, the standard Unix tools don’t deal well with nested data.
This looks pretty useful! Did you consider using the JSON Pointer notation for the key output? It’s not without its issues but it is “standard”.
Huh, I’ve never seen this RFC before.
You could easily change the separator to “/” in the current implementation. However, there is not (currently) an option to remove the wrapper around the keys. I think that the wrapper makes it all more easily grep-able. The RFC also does not address array indices, which makes things a little challenging.
However, adding the ability to remove the wrappers would be a simple enough change to support this kind of format if one chose.
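For anyone curious what JSON Pointer style keys would look like on flattened output, here is a rough Python sketch. It is not how the tool discussed here works; `flatten` and `escape` are hypothetical names. It follows RFC 6901 as I read it: tokens are joined with “/”, “~” is escaped as “~0” and “/” as “~1”, and array elements are referenced by a plain decimal index, so a pointer on its own does not tell you whether a token was an array index or an object key.

```python
def escape(token: str) -> str:
    # RFC 6901 escaping: "~" first, then "/", so "~1" is not double-escaped.
    return token.replace("~", "~0").replace("/", "~1")


def flatten(value, pointer=""):
    # Yield (pointer, leaf) pairs for every scalar in the document.
    # Empty objects and arrays yield nothing in this simple sketch.
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{pointer}/{escape(key)}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{pointer}/{index}")
    else:
        yield pointer, value


# Hypothetical document with a key containing "/" and a nested array.
DOC = {"a": {"b/c": [10, 20]}, "d": None}

for pointer, leaf in flatten(DOC):
    print(pointer, leaf)
```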