1. 33
  1. 13

    > To date, the only XML schemas I have seen which I would actually consider a good use of XML are XHTML and DocBook.

    There are some others! Just not in tech: you want to be looking at things like digital humanities, library science, archiving. Things like MARC XML or EAD3, where the entire point is text with metadata.

    1. 7

      While technically true, it was (and largely still is) the only format with ready-to-use tools for automated schema verification and transformation.

      One case study I can offer: in VyOS we originally used a configuration command definition format inherited from the pre-fork Vyatta system. It was deceptively simple-looking but poorly designed and unspecified, so no one could be sure their definitions were correct before building a package and trying it.

      I made an XML layer on top of it to use until we have a new config backend ready. Now the definitions look like this: vrrp.xml. Definitely not a “document”, but the RelaxNG schema, together with a build step that checks definitions against it, makes a big difference for the “developer experience”: syntactically invalid definitions fail the build. Everyone can easily find out what a correct definition is, too.

      JSON Schema is promising, but its tools are only now becoming usable and popular, while tools for XML have been working fine for decades.

      1. 7

        Interesting piece, but sadly it doesn’t say why XML is a markup language, not a data format. Of course, it’s in the name, but I don’t think that means much. It’s (maybe) a statement about what the creators hoped for, but not what the tool is actually useful for.

        1. 12

          Yes, the punchlines of the document initially seem to revolve around intentionalism, i.e., you should not use this because it was not intended for that.

          It’s only in the last few paragraphs that the author turns to consequentialism, i.e., you should not use this because it causes these problems and there are better alternatives.

          I think it’s a bit back to front – to me the “bad things happen” arguments are much more important than the “not be used as originally intended” arguments. Using things in unintended ways is not inherently good or bad, but if the outcomes are bad then that is a concrete point to argue.

          (Otherwise: nice writeup, enjoyed reading it :)

        2. 3

          So what should people have used instead?

          1. 2

            S-expressions for data encoding. Seriously.
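            To make the suggestion concrete, here is a minimal Python sketch of what encoding structured data as S-expressions could look like. The function name `to_sexp` and the mapping it uses (dicts become lists of `(key value)` pairs, strings are double-quoted, dict keys are assumed to be valid bare symbols, booleans are written `#t`/`#f`) are hypothetical choices for illustration, not any particular S-expression standard.

            ```python
            def to_sexp(value) -> str:
                """Encode nested Python data as an S-expression string.

                Hypothetical mapping, for illustration only:
                  bool       -> #t / #f
                  int/float  -> bare number
                  str        -> double-quoted, backslashes and quotes escaped
                  list/tuple -> (item item ...)
                  dict       -> ((key value) ...), keys assumed to be bare symbols
                """
                # bool is a subclass of int, so it has to be checked first.
                if isinstance(value, bool):
                    return "#t" if value else "#f"
                if isinstance(value, (int, float)):
                    return repr(value)
                if isinstance(value, str):
                    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                    return f'"{escaped}"'
                if isinstance(value, dict):
                    # Each key/value pair becomes a two-element list: (key value)
                    pairs = (f"({key} {to_sexp(item)})" for key, item in value.items())
                    return "(" + " ".join(pairs) + ")"
                if isinstance(value, (list, tuple)):
                    return "(" + " ".join(to_sexp(item) for item in value) + ")"
                raise TypeError(f"unsupported type: {type(value).__name__}")


            print(to_sexp({"name": "eth0", "mtu": 1500, "addresses": ["192.0.2.1/24"]}))
            # prints: ((name "eth0") (mtu 1500) (addresses ("192.0.2.1/24")))
            ```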

            1. 1

              There’s plenty of standards to choose from:

              https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats

              1. 2

                The table in that article doesn’t list when the formats were conceived, but from eyeballing it, many of them are newer than XML, so they were not an option at XML’s inception.

              2. 1

                For structured data? JSON/CBOR/Msgpack is probably fine. The author mentioned JSON which is good enough most of the time.

                1. 3

                  The author was saying that people made the wrong choice with XML right from the start. JSON (etc.) hadn’t been invented yet. My question was what those folks should have done.

                    1. 1

                      How do netstrings handle Unicode data?

                      1. 1

                        > Any string of 8-bit bytes may be encoded as [len]":"[string]",".

                        It sounds like netstrings are an 8-bit-clean transport for arbitrary binary data. So while the netstring spec itself prescribes no encoding for its payloads, it can carry UTF-8 with no problems, as long as you remember that [len] counts bytes, not characters.
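                        A minimal netstring encoder and decoder, sketched in Python, shows why UTF-8 is a non-issue: the payload is opaque bytes, and the length prefix is the byte count. The function names `netstring_encode` and `netstring_decode` are hypothetical; rejecting leading zeros in the length prefix follows djb's description of the format.

                        ```python
                        from __future__ import annotations


                        def netstring_encode(payload: bytes) -> bytes:
                            """Wrap raw bytes as a netstring: [len]":"[payload]","."""
                            return str(len(payload)).encode("ascii") + b":" + payload + b","


                        def netstring_decode(data: bytes) -> tuple[bytes, bytes]:
                            """Parse one netstring from the front of `data`.

                            Returns (payload, remaining_bytes). Raises ValueError on malformed input.
                            """
                            colon = data.find(b":")
                            if colon <= 0:
                                raise ValueError("missing length prefix")
                            length_field = data[:colon]
                            # [len] must be ASCII decimal digits, with no leading zeros except "0" itself.
                            if not length_field.isdigit() or (length_field[:1] == b"0" and length_field != b"0"):
                                raise ValueError("invalid length prefix")
                            start = colon + 1
                            end = start + int(length_field)
                            # The payload must be followed by exactly one terminating comma.
                            if data[end:end + 1] != b",":
                                raise ValueError("truncated payload or missing trailing comma")
                            return data[start:end], data[end + 1:]


                        # "héllo" is 5 characters but 6 bytes in UTF-8, and [len] counts bytes.
                        encoded = netstring_encode("héllo".encode("utf-8"))
                        print(encoded)  # prints: b'6:h\xc3\xa9llo,'
                        payload, rest = netstring_decode(encoded + b"3:abc,")
                        print(payload.decode("utf-8"), rest)
                        ```

                        Because the decoder never interprets the payload, decoding the bytes as UTF-8 (or anything else) is left entirely to the application.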

                    2. 1

                      Oh. I have no clue.