  2. 4

    Looks awesome. And given the looseness of the JSON spec, I won’t quibble about ‘correctness’… but is it secure?

    I see they do some fuzzing, but including the minefield suite sure wouldn’t hurt.
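
    If anyone wants to wire that in, here is a rough sketch of such a check against nst/JSONTestSuite (the suite from the “Parsing JSON is a Minefield” article), whose file names encode the expected outcome: y_*.json must parse, n_*.json must be rejected, i_*.json is implementation-defined. The parse argument is just a placeholder for whichever binding you are testing.

        import pathlib

        def run_minefield(suite_dir, parse):
            """Point suite_dir at JSONTestSuite's test_parsing/ directory."""
            failures = []
            for path in sorted(pathlib.Path(suite_dir).glob("*.json")):
                data = path.read_bytes()
                try:
                    parse(data)
                    accepted = True
                except Exception:
                    accepted = False
                if path.name.startswith("y_") and not accepted:
                    failures.append(("should accept", path.name))
                elif path.name.startswith("n_") and accepted:
                    failures.append(("should reject", path.name))
            return failures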

    1. 4

      If parsing JSON is the bulk of the work of a server, a more radical approach might be to reorganize the application:

      • splitting the software at a different boundary (if you control both the client and the server exchanging JSON)
      • avoiding exchanging data at all
      • exchanging raw data streams (does your JSON look like [ 1, 3, 98, 1, ... ]?)
      • using a more lightweight format specific to the task, like https://www.brandur.org/logfmt or just plain text (toy sketch at the end of this comment)
      • making use of the underlying protocol (JSON is usually carried over HTTP, which already has verbs, a path and query parameters, and those sometimes suffice entirely to describe your data). Do you return { "success": false, "code": 500 } just to signal an error? Do you send a POST whose body is only { "collection": "all" }?
      • avoiding wrapping JSON in base64 (which can go multiple levels deep given the omnipresence of JWT) or in envelopes like { "header": { "name": "test" }, "payload": [actual JSON content] }

      On the other hand, if your data looks like a key-value store where some of the values are themselves arrays of key-value stores, inventing another non-trivial format specific to the task is asking everyone who uses the software to learn yet another format for no big reason.
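
      As a toy illustration of the logfmt point above (a hand-rolled encoder for the sketch, not a real logfmt library), the same record in both shapes:

          import json

          record = {"level": "info", "status": 500, "msg": "upstream timeout"}

          as_json = json.dumps(record)
          # {"level": "info", "status": 500, "msg": "upstream timeout"}

          as_logfmt = " ".join(
              f'{k}="{v}"' if " " in str(v) else f"{k}={v}"
              for k, v in record.items()
          )
          # level=info status=500 msg="upstream timeout"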

      1. 3

        I know of one company that does indeed send binary PDFs as JSON arrays of numbers in [0, 255]. It looks horrendously inefficient but really it’s fine for their use case, because their API when called will enqueue a job to have a human being stuff a printout of the PDF you sent it into an envelope and mail it for you. That’s such an expensive operation that the inefficient data transfer is a rounding error. :)
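
        For anyone curious what that looks like on the wire, a toy sketch (made-up stand-in bytes, not their actual API):

            import base64
            import json

            pdf_bytes = b"%PDF-1.7 ..."  # stand-in for a real file

            as_array = json.dumps(list(pdf_bytes))            # "[37, 80, 68, 70, ...]"
            as_base64 = base64.b64encode(pdf_bytes).decode()  # "JVBERi0xLjcgLi4u"

            # Digits plus commas make the array form roughly 3-4x the raw size,
            # versus ~1.33x for base64. Next to a human stuffing envelopes,
            # neither matters.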

        1. 1

          Haha, finally a good candidate for experimenting with IPoAC (RFC 1149).

          That reminds me of that Knuth quote: “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%”

      2. 3

        The Python bindings look nice. Drop-in replacement with good speed, plus an additional API for more speed. Seems comparable to orjson without the interface change (orjson works in bytes). Seems like something to check out.

        https://github.com/TkTech/pysimdjson
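
        For reference, roughly what the two modes look like from a quick read of the README (exact names may have shifted between versions):

            import simdjson

            raw = b'{"user": {"name": "test"}, "ids": [1, 2, 3]}'

            # Drop-in mode: same call shape as json.loads, just backed by simdjson.
            obj = simdjson.loads(raw)

            # "More speed" mode: build one Parser and reuse it across documents,
            # so internal buffers are not re-allocated on every call.
            parser = simdjson.Parser()
            doc = parser.parse(raw)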

        1. 4

          Thanks for the mention. The library is still a work in progress, with C-land filtering coming soon (think jq). Would always love feedback and criticism :)

          1. 2

            This is super exciting! Its native API is exactly what I have been looking for — a way to inspect a specific field of a document without converting the whole thing into Python objects. Thank you for building this.
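
            In case it helps anyone else evaluating it, this is the kind of access I mean (a sketch from the version I tried, so method names may differ):

                import simdjson

                parser = simdjson.Parser()
                doc = parser.parse(b'{"meta": {"id": 42}, "rows": [1, 2, 3]}')  # pretend rows is huge

                # Only the path you actually touch gets turned into Python objects;
                # the rest of the document stays in simdjson's own buffers.
                row_id = doc["meta"]["id"]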

        2. 3

          https://lemire.me/blog/

          Daniel Lemire’s blog is worth following. He seems to come up with practical perf improvements often.