I can always tell someone has probably not used XML for anything it is good at when they compare it to JSON. XML is all about extensibility (namespaces, mostly) and JSON is all about serialization of basic types. They do fundamentally different jobs.
JSON’s entry point into the programming space was replacing XML. On the web it was, at first, even transferred through an interface called XMLHttpRequest. For configuration, XML was also an accepted standard for a while. From a cultural perspective, the comparison is entirely fair.
I think my point is that JSON can only “replace XML” when the XML in question was super degenerate and should never have been XML in the first place. Comparing them in this degenerate case can be interesting, I suppose, but keeping them as something comparable in people’s minds prevents seeing them as the very different things that they are.
I spent a lot of years not understanding why XML existed. When Atom came out I actually thought their use of namespaces was “getting in my way” because the XML parsing library I used was utter garbage. It took a long time for me to get past the JSON comparison mindset.
But the post goes to great lengths to explain how XML is a much richer language, with wider and more standardised tooling like XPath and schemata, so I don’t know where the impression comes from that it’s comparing the two.
Or maybe they want to draw the parallel, since everyone knows JSON, using it in the places where XML has been used?
While I hate using XML for config files or other human-readable documents, I’ve been a big fan of using XML as an RPC serialization format (or as a way to interact with REST APIs). It’s easy to construct through string concatenation, it’s fairly easy to whip up a quick parser, and there are tons of high-quality, fast implementations out there. Along with schematization it makes it fast and easy to send/verify XML payloads.
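For what it’s worth, a minimal sketch of that workflow in Python, using only the standard library; the element names are invented for illustration, and text is escaped before interpolation:

```python
# Build an XML payload by string formatting, then parse it back with the
# standard library. The <transfer> element names are hypothetical.
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape

def make_payload(user: str, amount: int) -> str:
    # escape() guards against text that would break the markup
    return f"<transfer><user>{escape(user)}</user><amount>{amount}</amount></transfer>"

doc = ET.fromstring(make_payload("alice", 100))
print(doc.findtext("user"), doc.findtext("amount"))  # alice 100
```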
I hate XML plenty but I keep finding myself and people I work with reinventing basic features of XML like comments or namespaces or query languages on top of our JSON configuration files. Or people try to use TOML or YAML which become harder to understand or reason about as the complexity increases.
I don’t have an answer. It’s just an observation. We threw out the baby with the bathwater.
> Along with schematization it makes it fast and easy to send/verify XML payloads.
This is the big win for me. You can pass a set of XML schemas to any business partner and they can quickly and generically validate the message on any platform. And with facets and comments, the meaning and properties of the message can be conveyed implicitly and in great detail.
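As a concrete illustration, a hedged sketch of that validation step using the third-party lxml package; the schema and its maxLength facet are invented for the example:

```python
# Validate a payload against an XSD. A facet (here xs:maxLength) constrains
# the value, so any partner can check the message generically.
from lxml import etree  # pip install lxml

xsd = etree.XMLSchema(etree.XML(b"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="user">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:maxLength value="8"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>"""))

print(xsd.validate(etree.XML(b"<user>alice</user>")))        # True
print(xsd.validate(etree.XML(b"<user>waytoolongname</user>")))  # False, facet fails
```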
I really have trouble understanding this. Why use XML for serialization, especially in RPC or anything going over the network? It’s ludicrously inefficient for that (json is, too, but slightly less so). Just do yourself a favor and pick msgpack/cbor/bencode/protobuf/… or anything really that doesn’t require complicated escaping of the payload. If you want something easy to parse, bencode is much easier than XML anyway.
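To back up the “easier to parse” claim a little: bencode is essentially four rules, so even an encoder fits in a few lines. A minimal sketch (encoder only; bytes keys assumed):

```python
# Bencode: integers are i<n>e, strings are <len>:<bytes>,
# lists are l...e, dicts are d...e with sorted keys.
def bencode(v) -> bytes:
    if isinstance(v, int):
        return b"i%de" % v
    if isinstance(v, bytes):
        return b"%d:%s" % (len(v), v)
    if isinstance(v, list):
        return b"l" + b"".join(bencode(x) for x in v) + b"e"
    if isinstance(v, dict):
        return b"d" + b"".join(bencode(k) + bencode(x) for k, x in sorted(v.items())) + b"e"
    raise TypeError(type(v))

print(bencode({b"id": 42, b"tags": [b"a", b"b"]}))
# b'd2:idi42e4:tagsl1:a1:bee'
```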
In terms of verbosity, transport encoding (gzip or whatever) probably gets rid of most of the difference. The great thing about XML is that a lot has been invested in efficient implementations of encoders and decoders. Theoretically others could be more performant but are they? And there’s a proliferation of different XML codec implementations - do you want a DOM interface or streaming or something that maps to native objects? Being old and popular has a lot of upsides.
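A quick, hedged way to check the compression claim yourself; the record and its XML rendering are made up for illustration:

```python
# Compare raw and gzipped sizes of equivalent JSON and XML payloads.
import gzip, json

record = {"id": 42, "name": "widget", "tags": ["a", "b"], "price": 9.99}

as_json = json.dumps(record).encode()
as_xml = (
    "<item><id>42</id><name>widget</name>"
    "<tags><tag>a</tag><tag>b</tag></tags><price>9.99</price></item>"
).encode()

for label, payload in [("json", as_json), ("xml", as_xml)]:
    print(label, len(payload), "raw,", len(gzip.compress(payload)), "gzipped")
```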
XML is useful when both of the following are true:
- The sender and receiver are different organizations
- The payload is more like a document than a serialized data structure
In these cases, an XML schema of one sort or another is very useful for keeping both sides “honest.” The encodings you mention are not typically all that extensible, so you wind up versioning your data structures. You do more work up-front with the XML to save yourself some pain as the years drag on. The pain isn’t worth it if your data structures are small and simple. But sometimes you have one or many external parties that want to do data interchange with you, and defining a common schema in XML gives you a lingua franca that is both richer and harder to screw up than IDL-like binary encodings or ad-hoc JSON or its binary analogs.
It may seem like this never happens, but it may be that there is a document-like object being served out piecemeal by a family of nested REST APIs. If the REST calls are almost always performed in a certain order (get the main thing, get the pieces of the thing, get the pieces of the pieces…) then efficiency might be improved by just doing one call to get the complex thing. You might be able to improve the robustness of the handling on both sides by using XML in cases like that because it’s just easier to extend it without changing the shape in a way that will break the existing parsers.
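The forward-compatibility point can be shown in a few lines: a consumer that asks only for the elements it knows keeps working when new ones appear. A sketch with hypothetical element names:

```python
# A v1 consumer reads only <id>; the producer later adds <priority>
# without breaking this parser, since unknown elements are ignored.
from xml.etree import ElementTree as ET

V1 = "<order><id>7</id></order>"
V2 = "<order><id>7</id><priority>high</priority></order>"  # later extension

def read_order(xml: str) -> str:
    return ET.fromstring(xml).findtext("id")

print(read_order(V1), read_order(V2))  # both print 7
```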
All this said, if I had my druthers, I’d still probably use XML for a new system once or twice a year, versus using REST+JSON on a weekly basis.
That’s a good point, thanks. XML makes a lot of sense for content that is more document-like. Someone on IRC mentioned DocBook as an example where XML is adequate.
For REST APIs, I would just use JSON. Sure, the format itself is inefficient, but if you’re using the REST API from inside a web browser (and if you expect other people to use this API, then you ought to be using it yourself) it’s hard to beat the efficiency of having a JSON codec already included.
You might be able to design your server to use HTTP content negotiation to simultaneously support JSON and Msgpack. Their data models are pretty similar.
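A hedged sketch of that negotiation using Python’s standard http.server; the handler shape is invented, and msgpack is a third-party assumption:

```python
# Serve the same data as JSON or msgpack depending on the Accept header.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
import msgpack  # pip install msgpack

DATA = {"status": "ok", "items": [1, 2, 3]}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        accept = self.headers.get("Accept", "")
        if "application/msgpack" in accept:
            body, ctype = msgpack.packb(DATA), "application/msgpack"
        else:
            body, ctype = json.dumps(DATA).encode(), "application/json"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# HTTPServer(("", 8080), Handler).serve_forever()
```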
> It’s ludicrously inefficient for that (json is, too, but slightly less so). Just do yourself a favor and pick msgpack/cbor/bencode/protobuf/…
Have you done any measurements to come to this conclusion? Especially compared to using EXI envelopes. SOAP is standardized and widespread, so you’d need a very good reason to use anything else.
When you get into fields like HPC, where RPC performance actually matters, you don’t actually use any of these formats.
Indeed not; I didn’t know about EXI. Is that… a binary encoding for XML?! It does seem less inefficient. But also note how a lot of “modern” RPC is done via Thrift, gRPC, Finagle, etc., all of which rely on underlying binary encodings to be efficient. And even then they optimize for variable-length integers and 0-copy decoding.
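For readers unfamiliar with the variable-length-integer trick those formats use: small numbers take a single byte on the wire. A minimal LEB128-style encoder, as an illustration only:

```python
# Encode an unsigned int 7 bits at a time; the high bit of each byte
# says whether more bytes follow. This is the protobuf varint scheme.
def encode_varint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set
        else:
            out.append(byte)
            return bytes(out)

print(encode_varint(1).hex(), encode_varint(300).hex())  # 01 ac02
```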
I can’t even articulate my point properly. In big companies using SOAP, I’m sure there’s tons of good tooling around XML. But if you’re not already using it, it seems to have very little appeal for RPC compared to, say, thrift. Thrift will be faster, smaller on the wire, and also comes with a schema.
The point is that you need to justify, using actual numbers, why picking anything other than the established standard (SOAP) is a good idea. Not using SOAP smacks of junior dev-ness. SOAP and XML are going to be around way longer than whatever flavour of the month always crops up in threads like these.
Wow. Great way to weaken one’s argument by incorporating completely unnecessary cheap shots.
If EXI is not enough, I’m sure someone has figured out how to use ASN.1 with SOAP. This would enable using for example uPER as the wire format.
XML absolutely has its uses and places, but its reputation is still suffering/recovering from the industry trying to jam it into every space imaginable. It has a problem similar to Java’s: for the first decade or so of Java’s life, it was the magic bullet we should use everywhere.
Both have had recoveries in their reputations, but they both still have a way to go.
He didn’t list my primary gripe with XML, which is that writing schemas, especially nested schemas, is absurdly difficult. My leads always looked sideways at me when I insisted on parsing XML with a schema, and slowly I came to agree with them: all our code used annotations to manage transformations, while schemas just produced bugs. The stricter the schema, the more useless it becomes.
I think these discussions would be a lot cleaner if people could debate XML separately from schemas, separately from doctypes/validation, separately from XPath. These are obviously related technologies, all vaguely “XML”, but you don’t have to buy all of them to buy one of them. XML with namespaces and no schema or doctype or XPath is very useful in its own right, and is what I try to mean when I say “XML”.
TIL you can process JSON with XSLT. I guess I never really wondered if it were even possible. Neat-o. Not sure it’ll come up in production, but it makes me want to brush off that XSLT skilltree.
> This section describes facilities allowing JSON data to be processed using XSLT.