1. 22

  2. 4

    What are lobsters’ thoughts on XML by the way? As a younger person looking back in this age of JSON, it seems like it actually had a bunch of neat ideas, wrapped up in a terrible syntax and some excessive feature bloat. Is there any work being done on salvaging some of it, or is it kind of a lost cause?

    1. 8

      Aside from the awful syntax, XML has terminal featuritis (and, as an immediate consequence, a bunch of useful features; they’re just mixed in with a bunch of less useful ones). Simpler formats are usually better. Personally, in the absence of other requirements, for human-readable data, I’d use JSON; for human-writeable, YAML (though it has its own featuritis issues); for human-unreadable (i.e., binary), protobuf, though there’s actually been a relative proliferation of these lately (others to consider: Cap’n Proto, Avro).

      At this point, I’d drop XML other than for backwards compatibility. There are enough widely-supported alternatives which hit a useful subset of its features that I don’t expect anyone to put a lot of effort into making a “better” XML.

      1. 1

        If JSON or YAML will suffice, then XML was probably a terrible tool for the job. But I haven’t found a good alternative to XMLNS yet.

      2. 6

        There are some good ideas in the XML ecosystem, but IMO the problem with it is that it doesn’t map to most programming language data structures. JavaScript, PHP, Python, Perl, Ruby all essentially have the JSON data model – dynamically typed dicts, lists, strings, numbers, booleans. JSON is the lowest common denominator between them all.

        Between those 5 languages, that’s probably 95%+ of web apps, so you can see why JSON is a better fit than XML for communicating structured data between processes (not to mention that one side of the wire is usually JavaScript).
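
        To make the "same data model" point concrete, here's a rough Python sketch. The payload is made up for illustration; the only library used is the standard `json` module:

        ```python
        import json

        # Hypothetical payload, invented for this example.
        payload = (
            '{"name": "lobster", "claws": 2, '
            '"tags": ["crustacean", "tasty"], "extinct": false, "owner": null}'
        )

        record = json.loads(payload)

        # Each JSON type lands directly on a built-in Python type,
        # with no mapping layer or schema in between.
        kinds = {key: type(value).__name__ for key, value in record.items()}
        print(kinds)

        # Serializing and parsing again gives back an equal structure.
        round_tripped = json.loads(json.dumps(record))
        print(round_tripped == record)
        ```

        The other four languages behave much the same way with their own built-in JSON support, which is really the whole argument.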

        The syntax of XML isn’t terrible; it’s only terrible if you use it for the wrong thing. The syntax of JSON is terrible if you’re, say, writing HTML with it:

        { "tag": "p",
          "content": ["This is my paragraph with text in ", {"tag": "b", "content": "bold" }, "\n"  ] }

        That is a horrible syntax for a document, just like XML is a bad syntax for structured data. I think people tend to overthink this. Use XML when you need to annotate text with attributes; use JSON when you have structured data.
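
        For comparison, here's roughly what the markup side looks like from Python, using the standard `xml.etree.ElementTree` module. The `class` attribute and the sentence itself are just illustrative:

        ```python
        import xml.etree.ElementTree as ET

        # Illustrative document: running text with one annotated span.
        doc = '<p class="intro">This is my paragraph with text in <b>bold</b>.</p>'

        p = ET.fromstring(doc)
        b = p.find("b")

        # ElementTree keeps the text before a child in .text and the text
        # after a child in that child's .tail, so the prose stays in order.
        print(repr(p.text))
        print(repr(b.text))
        print(repr(b.tail))
        print(p.attrib)

        # Dropping the markup recovers the plain sentence.
        plain = "".join(p.itertext())
        print(plain)
        ```

        The text stays the primary thing and the tags decorate it, which is the opposite of the JSON version above.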

        I haven’t used XML lately but I imagine it’s still good for book toolchains and so forth. I think there is a habit of overengineering those kinds of tools though. I use HTML a lot these days and it works well.

        Historically people DID try to abuse XML into the role of JSON, e.g. for database dumps (which JSON isn’t even great at). But people learned a lesson, I suppose. There is a tendency to try to make a particular technology “universal”, and apply it to domains where it doesn’t fit.

        1. 4

          I’ve commented here a bunch on the topic of self-describing data in general, either to explain my view that the real problem is not the data format, but rather the schema… or to plug my incomplete project Modern Data, which I conceived of in a manic episode, so I make no apology for its grand scope, but I’m not sure when, if ever, I’ll have time to finish it…

          The project will someday, maybe, be a self-describing schema format for dependently typed object graphs. To be useful it needs not only the basic typechecker, serializers, and deserializers, but also tools like diff and structure editors… The idea is that you could write a Modern Data schema for any existing file format (regardless of whether it’s based on something like XML or JSON, or it’s a low-level format more like PNG), and then gain the full benefit of the tooling ecosystem.

          It’s the kind of thing that I think many people would use if it were mature, but it’s challenging to get people interested in building it before it is.

          Anyway, the real problem is schemas. :) Every major serialization format I’m aware of has had major controversy over whether and how to support schemas, including formats such as JSON which originally tried to exist without any such support - it’s important enough that people tend to invent schema formats if they aren’t provided. There’s often a desire to make the schema format itself be something not-fully-general which maps naturally onto the kinds of data people want to represent in the underlying format. This is motivated by valid and important concerns about tooling support, but it never seems to actually get to the point where it meets everyone’s needs…

          And, of course, most self-describing data formats are more-or-less trees. That means they can’t handle certain situations where performance is important; for that you need arbitrary graphs, often implemented through some sort of intra-file pointer.
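
          A minimal sketch of the tree-versus-graph point, in Python. This is not how Modern Data does it; the `id`/`next` field names and the `Node` class are hypothetical, and string ids stand in for a real intra-file pointer:

          ```python
          import json

          # Hypothetical file: a -> b -> a is a cycle, so it cannot be written
          # as nested objects. The edges are stored as ids instead.
          raw = '{"nodes": [{"id": "a", "next": "b"}, {"id": "b", "next": "a"}]}'


          class Node:
              def __init__(self, ident):
                  self.ident = ident
                  self.next = None


          def load_graph(text):
              data = json.loads(text)
              # First pass: create every node, so references may point
              # forward or backward in the file.
              nodes = {item["id"]: Node(item["id"]) for item in data["nodes"]}
              # Second pass: replace each id with a real object reference.
              for item in data["nodes"]:
                  nodes[item["id"]].next = nodes[item["next"]]
              return nodes


          graph = load_graph(raw)
          a = graph["a"]
          print(a.next.ident)
          print(a.next.next is a)
          ```

          The format itself knows nothing about those references, so every consumer has to reimplement the second pass, which is part of why this ends up being a schema problem.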

          1. 1

            This sounds super interesting, have you written about it more anywhere else?

            1. 2

              Kind of, but I’ve never put together a good write-up of the motivation. See https://github.com/IreneKnapp/modern-data, and also somebody I met here once started an effort to redo the documentation, over at https://github.com/tinyplasticgreyknight/modern-docs.

          2. 3

            JSON and XML solve totally different problems. You can abuse one to do what the other does, but that way lies pain.

            JSON is an encoding for some common data structures and types: map, list, string, number, boolean.

            XML has none of these, but is a way to graft different kinds of data together in a single document such that parsers can use the parts they understand, and ignore the parts they don’t. Namespaces are the “eXtensible” part.
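
            A small Python sketch of what "use the parts they understand" can look like, using the standard `xml.etree.ElementTree` module. Both namespace URIs and the element names are invented for the example:

            ```python
            import xml.etree.ElementTree as ET

            # Hypothetical document mixing two vocabularies.
            doc = (
                '<order xmlns="http://example.com/order" '
                'xmlns:ship="http://example.com/shipping">'
                "<item>lobster trap</item>"
                "<ship:carrier>Example Freight</ship:carrier>"
                "<item>bait</item>"
                "</order>"
            )

            ORDER_NS = "http://example.com/order"
            root = ET.fromstring(doc)

            # ElementTree spells a qualified name as "{namespace-uri}local-name".
            item_tag = f"{{{ORDER_NS}}}item"

            # This consumer only understands the order vocabulary...
            items = [el.text for el in root if el.tag == item_tag]
            # ...and can skip foreign elements without knowing what they mean.
            ignored = [el.tag for el in root if not el.tag.startswith(f"{{{ORDER_NS}}}")]

            print(items)
            print(ignored)
            ```

            A shipping tool could read the same file and do the reverse, and neither side has to coordinate element names with the other.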

            1. 2

              It’s a markup language and it’s not terrible at markup.