1. 22

    It took me a long time to understand, but I’ve become convinced that the reason this keeps happening is that it’s solving the wrong problem. It’s not serialization that’s hard, it’s schemas.

    I could elaborate at length, but I don’t really feel up to it right now; happy to go into it if people are curious.

    1. 9

      I agree, but I also think there will always be a split between software cultures: those who appreciate supportive tooling vs. those who want simplicity of solution.

      Do you look at the size of the JSON spec and feel joy at its simplicity, or dread that all semantic validation is left out of scope?

      1. 5

        Schemas are hard for the same reasons that schema changes are hard.

        1. 1

          It’s much simpler to change a schema when using a self-describing encoding. For one, you do not need to devise a versioning system unless you remove keys.

          Also, there can be a separate validation step, as with JSON Schema or XSD. Haskell appears to be able to parse JSON in a type-safe way as well…
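
          For instance, with the aeson library you can decode JSON straight into a typed record. A minimal sketch (the User type is invented for illustration):

          ```haskell
          {-# LANGUAGE DeriveGeneric #-}

          import Data.Aeson (FromJSON, decode)
          import GHC.Generics (Generic)
          import qualified Data.ByteString.Lazy.Char8 as BL

          -- A record type that the incoming JSON must conform to.
          data User = User { name :: String, age :: Int }
            deriving (Show, Generic)

          instance FromJSON User  -- decoder derived from the record's shape

          main :: IO ()
          main = do
            -- decode :: FromJSON a => ByteString -> Maybe a
            print (decode (BL.pack "{\"name\":\"Ada\",\"age\":36}") :: Maybe User)
            -- Ill-typed input fails safely with Nothing instead of blowing up.
            print (decode (BL.pack "{\"name\":42}") :: Maybe User)
          ```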

          1. 3

            Changing a schema when data already exists in the old schema requires either a new schema that is compatible with the old data, or migrating all the old data. There are ramifications to either decision.
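
            To make the first option concrete: in aeson terms, the new schema can add a field as optional, so documents written under the old schema still parse. A hedged sketch (the UserV2 type and field names are invented):

            ```haskell
            {-# LANGUAGE OverloadedStrings #-}

            import Data.Aeson (FromJSON (parseJSON), withObject, (.:), (.:?))

            -- v1 documents look like {"name": "Ada"}; v2 adds an "email" key.
            data UserV2 = UserV2 { name :: String, email :: Maybe String }
              deriving Show

            instance FromJSON UserV2 where
              parseJSON = withObject "UserV2" $ \o ->
                UserV2 <$> o .:  "name"
                       <*> o .:? "email"  -- absent in old data: decodes to Nothing
            ```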

            1. 3

              Any statically typed language can have a type-safe JSON parser: you just need a function of String -> Maybe SomeType (parsing can fail, so the result has to account for that). There is nothing special about Haskell in this regard.

          2. 4

            I’d probably read it.

          3. 17

            The Wikipedia comparison is pretty good, but it’s still lacking. Some things that are important to me in a serialisation protocol:

            • Binary: As soon as you open the door to text you get questions about encoding and whitespace, and it becomes very difficult to process efficiently. The edge cases will haunt you in a way they never will with a binary protocol (see the sketch after this list).
            • Self-describing: It should be possible for a program to read an arbitrary object (unlike XDR, Protocol Buffers, etc.)
            • Efficiency: JSON/MessagePack/XML are out, but so is DER (ASN.1), because its integers are variable-length
            • Explicit references/cycles: references should be spelled out in the data (as with plain JSON), not baked into the format as pointers (as in Cap’n Proto or PHP’s serialize)
            • Lots of types: ASN.1 has the right idea here, but it still falls short.
            • Unambiguous: Fuck MessagePack. Seriously.
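
            As a tiny illustration of the first bullet: the same JSON number has many textual spellings, so byte-level comparison, hashing, or signing of the text is unreliable. A sketch with Haskell’s aeson, though any JSON parser shows the same thing:

            ```haskell
            import Data.Aeson (Value, decode)
            import qualified Data.ByteString.Lazy.Char8 as BL

            main :: IO ()
            main =
              -- Five different byte sequences, one logical value: whitespace and
              -- number spelling are both sources of textual ambiguity.
              mapM_ (print . (decode :: BL.ByteString -> Maybe Value) . BL.pack)
                    ["10", "1e1", "1E1", "10.0", " 10 "]
            ```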

            On the subject of types: k/q supports booleans, GUIDs, bytes, shorts (16-bit), ints (32-bit), longs (64-bit), reals (32-bit), floats (64-bit), characters, symbols, timestamps, months, dates, timespans (big interval), minutes, seconds, times, all as arrays or as scalars. It also supports enumerated types, plain/untyped lists, and can serialise functions (since the language is functional). None of the blog poster’s suggestions can stand up to the kdb IPC protocol, so clearly we needed at least one more protocol, but now what?

            Something else I’m thinking about is capabilities/verified cookies. I don’t know if these can or should be encoded into the IPC (I tried this for a while in my ad server), but there was a clear advantage in having the protocol decoder abort early, so maybe a semantic layer should exist where the programmer can resolve references or cookies (however, if you do it as a separate pass over the data, you’ll have efficiency problems again).

            I think that if you can get away with an existing format, you should use it because you get to inherit all of the tooling that goes with it, but dogma that suggests serialisation is a solved problem is completely and obviously wrong.

            1. 9

              Cycles are hard to use safely. In my opinion, it’s much better to encode them explicitly when you need them (not that often) than to include them in the format itself. Other than that, I agree completely.
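
              One way to encode them explicitly, as suggested above: give every node an ID and store references as plain IDs, so the wire format itself stays acyclic. A hypothetical sketch:

              ```haskell
              -- A cyclic ring of three nodes, flattened for serialisation:
              -- references become integer IDs instead of in-memory pointers.
              data Node = Node { nodeId :: Int, label :: String, next :: Int }
                deriving Show

              ring :: [Node]
              ring = [ Node 0 "a" 1
                     , Node 1 "b" 2
                     , Node 2 "c" 0 ]  -- "c" points back to "a": the cycle is explicit

              -- Resolving a reference is an ordinary lookup, done after decoding.
              resolve :: [Node] -> Int -> Maybe Node
              resolve ns i = lookup i [(nodeId n, n) | n <- ns]

              main :: IO ()
              main = print (resolve ring =<< fmap next (resolve ring 2))  -- back to "a"
              ```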

              It is also important for the format to have a canonical form for when you deal with cryptography. Also, not having numeric types of many different widths that are all treated differently is a great boon for today’s scripting languages.

              Have you seen RFC 7049: Concise Binary Object Representation (CBOR)? It has JSON semantics with additional support for chunked transfers and sub-typing (interpret this string as a date or something).

              RFC 8152: CBOR Object Signing and Encryption (COSE) also sounds promising. I believe that it’s time for DER and the whole ASN.1 world to go.

              1. 7

                I was surprised not to see CBOR in the list of formats. It’s actually an incredibly elegant encoding that can be decoded efficiently while providing a huge amount of flexibility (and it is sensible enough to leave space to add more things if they become necessary). Haskell’s serialise and cborg libraries have adopted it, and I hope they will become the canonical serialisation format for Haskell data, replacing the rather ad hoc and less efficient encodings currently offered by the binary and cereal packages.
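
                For a taste of the Haskell side: with the serialise package, a Generic-derived instance is all it takes to round-trip a value through CBOR. A minimal sketch (the Tree type is invented for illustration):

                ```haskell
                {-# LANGUAGE DeriveGeneric #-}

                import Codec.Serialise (Serialise, serialise, deserialise)
                import GHC.Generics (Generic)

                data Tree = Leaf Int | Node Tree Tree
                  deriving (Show, Eq, Generic)

                instance Serialise Tree  -- CBOR encoder/decoder derived generically

                main :: IO ()
                main = do
                  let t = Node (Leaf 1) (Node (Leaf 2) (Leaf 3))
                  -- serialise produces a lazy ByteString of CBOR bytes.
                  print (deserialise (serialise t) == t)  -- True: lossless round trip
                ```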

                CBOR is a protocol done right, with standardisation and even an IANA registry for tags and other stuff. It’s also part of CoAP, the REST-for-IoT-but-efficient standard (or something like that; I’m not familiar enough with exactly what CoAP does).

                Edit: a video describing the protocol and how it’s likely to be used in Haskell: https://youtu.be/60gUaOuZZsE

                1. 0

                  What do you mean by “JSON semantics”? JSON has really terrible semantics, especially around numbers.

                  1. 5

                    I should have said JSON-compatible semantics.

                    Most of the types in CBOR have direct analogs in JSON. All JSON values, once decoded, directly map into one or more CBOR values.

                    The conversion from CBOR to JSON is lossy. CBOR supports limited-size integers, arbitrary-precision integers, and floats as distinct types, and it also has support for NaN and infinities.
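
                    Two quick examples of values that survive CBOR but would not survive a JSON round trip (again with the serialise package):

                    ```haskell
                    import Codec.Serialise (serialise, deserialise)

                    main :: IO ()
                    main = do
                      -- An arbitrary-precision integer: CBOR has bignum tags for
                      -- this, while most JSON decoders silently lose precision.
                      print (deserialise (serialise (2 ^ 200 :: Integer)) :: Integer)
                      -- NaN round-trips in CBOR; JSON has no literal for it at all.
                      print (deserialise (serialise (0 / 0 :: Double)) :: Double)
                    ```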

                    1. 1

                      By this definition, is there any serialization format that is not JSON-compatible?

                2. 1

                  Always great to see another k programmer around. Been several years but man that was a trip.

                3. 15

                  “Please stop doing X” posts are a dime a dozen and rarely offer actionable advice more prescriptive than the simple “stop doing this thing.”

                  Let’s get more posts that say, “Before creating something new, here’s how to evaluate what exists for your use case.”

                  1. 2

                    Agreed. There’s no concrete information in this post, and a lot of uncited or made-up narrative. And it’s condescending. I expect more from posts on Lobsters.

                    1. 2

                      Please stop writing new serialization protocols

                      also, why does the author feel the need to call people monkeys?

                  2. 7

                    There really should be a one size fits all minimal serialization protocol

                    I’m not sure that makes any sense. Serialization is just the least interesting part of the protocol or format. Just because I can “decode” your data in some abstract way, because it used a serialization I know about, doesn’t mean I can do anything useful with that data. To do that I need code that actually understands the data.

                    1. 2

                      Avro has a built-in protocol schema that includes a list of all possible messages you can send, and all possible return types.

                    2. 3

                      CapnProto (because he didn’t get it right the first time)

                      That’s a pretty good reason to do it again.

                      1. 5

                        Please stop writing “please stop X” blog posts.