ndjson: Uses a newline character (\n) to separate each JSON object, and no whitespace is allowed between objects or values. Example: {"some":"thing\n"}
I don’t understand what the practical difference is between this and jsonl? If you split input on ‘\n’, then throw each value at a JSON parser, it’ll work for either Windows or UNIX line endings; the JSON parser will just ignore trailing ‘\r’ characters as they’re considered whitespace.
Not supporting ‘\r\n’ line endings would require more special consideration, and you’d have to specify something like, “each \n-separated line must be valid JSON, except the last character of the line must not be \r” (or, as ndjson does, add special rules around where whitespace is legal). Surely it’s simpler and better for everyone to just say, “each \n-separated line must be valid JSON”, which gives you \r\n support for free?
At best, ndjson might be a sensible formalisation of what a system emits (though even then, I’d want something more strict which prohibits whitespace outside of string literals entirely). But in terms of what to accept, I would need to see a very good reason to stray away from the simplicity of, “input is split on \n, each value must be valid JSON”.
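To make the "split on \n, parse each value" rule concrete, here is a minimal Python sketch. The helper name `parse_lines` and the choice to skip blank lines are my own assumptions, not something either spec mandates; the point is only that the same code handles both line endings, because a trailing \r is ordinary JSON whitespace.

```python
import json

def parse_lines(text):
    # Hypothetical helper: split on LF only, then parse each line as one JSON value.
    values = []
    for line in text.split("\n"):
        # Assumption: whitespace-only lines (e.g. after a trailing newline) are skipped.
        if line.strip(" \t\r") == "":
            continue
        # A trailing CR is JSON whitespace, so json.loads accepts it unchanged.
        values.append(json.loads(line))
    return values

unix = '{"a": 1}\n{"b": [2, 3]}\n'
windows = '{"a": 1}\r\n{"b": [2, 3]}\r\n'

# Both inputs yield the same parsed values.
print(parse_lines(unix) == parse_lines(windows))
```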
The descriptions of ndjson and jsonl on this web page don’t match my understanding of the canonical definitions, https://github.com/ndjson/ndjson-spec and https://jsonlines.org/
As far as I can tell the differences are:
Note: the ndjson domain expired at some point and is now squatted by a spammer, so don’t click the homepage links from the ndjson github pages.
The example should be {"some":"thing"}\n, right?
Both should be valid, right?
Here are some examples
It’s technically valid but the example does not show the newline character separating “each JSON object”
No, the example in the post is not valid JSON, and neither are your first two examples in the reply. JSON does not allow newline characters inside strings.
(It’s also weird that in the reply, the “Out” lines show that the newline has been replaced with a comma!)
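A quick way to check the raw-versus-escaped distinction, sketched with Python's standard `json` module (the helper `is_valid_json` is just an illustrative name). With default settings the parser rejects a literal newline character inside a string but accepts the two-character escape, and the serialiser always emits the escape, so a record never spans more than one line.

```python
import json

# A literal newline character inside the string value: not valid JSON.
raw = '{"some":"thing\n"}'
# A backslash followed by "n": the valid JSON escape for a newline.
escaped = '{"some":"thing\\n"}'

def is_valid_json(text):
    # Illustrative helper: True if the default (strict) parser accepts the text.
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json(raw), is_valid_json(escaped))

# Serialising escapes the newline, so the output stays on a single line.
print(json.dumps({"some": "thing\n"}))
```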
I find the existence of both ndjson and json-nl deeply confusing. In the spirit of XKCD 927 I wish someone would produce a standard which encompasses both of them, so I can use one format and feel confident that parsers for both of those older things will handle it the same.
Since they have a \n vs. \r\n thing going on this probably isn’t possible.
iirc ollama sends streaming chat replies as ndjson
That is exactly what the article is about, right?
Oops lol I didn’t read the article. You’ve caught me red handed
I don’t know when it started but we’ve been using ndjson for many years at $dayjob as it’s compatible with both Azure IoT Hub and Apache Spark.
Funny timing: I came across this older link about using ndjson in Go just last week.
Never heard of ndjson before, but jsonl is an old friend.