It’s worth looking back at some of the commentary from people who were on the outside looking in at the XHTML2 development process.
David Baron (https://dbaron.org/log/20090707-ex-html):
The working group became hostile to Web browser implementors, even those implementors who were trying to move the Web forward. They discouraged those implementors from participating, and didn’t think about what might get those implementors back into their fold.
This showed when they didn’t care whether it was possible to implement XHTML2 in the same software that already implemented other Web technologies. For example, they required different processing for “new” documents, but provided no mechanism to distinguish them from “old” ones. (And they did this twice, once with XHTML1 and again with XHTML2.)
Jeffrey Zeldman (http://www.zeldman.com/daily/0103b.shtml#skyfall):
The W3C seemed to have abandoned the notion that the web could move forward without breaking what we already know and use. Standards had been a lie. The sky was falling.
One day I realized XHTML 2 was not coming soon to a browser near me.
Mark Pilgrim (http://web.archive.org/web/20060516121516/http://diveintomark.org/archives/2003/01/13/semantic_obsolescence/):
Let’s not even talk about the dropping of img and applet, the deprecation of br, and the proposed deprecation of h1 through h6. Oh, and forms, which are now XForms.
I know, I know, XHTML 2.0 isn’t meant to be backwardly compatible. But damn it, I’ve done everything the W3C has ever recommended. I migrated to CSS because they told me it would work better with the browsers and handheld devices of the future, then the browsers and handheld devices of the future came out and my site looked like shit. I migrated to XHTML 1.1 because they told me to use the latest standards available, and it bought me absolutely nothing except some MIME type headaches and (I am not making this up) Javascript incompatibilities. I migrated to semantic markup that has been around for 10 fucking years and they go and drop it. Not deprecate it slowly over time, mind you, but just fucking drop it. Which means that, after keeping up with all the latest standards, painstakingly marking up all my content, and validating every last page on my site, I’m still stuck in a dead end.
Speaking personally as someone who was around and actively followed the web standards community in the early-to-mid 00s, I think nowadays we forget just how much trouble W3C’s XHTML really was. It was never particularly friendly to implementations or to document authors, at times was downright hostile to them, and most real-world use of “XHTML” only worked because browsers provided the fallback option of not actually enforcing XML semantics on you if you served XHTML as text/html. Take Evan Goer’s infamous “XHTML 100” survey, for example. This involved subjecting web sites of 100 well-known (at the time) web designers/developers, people who presumably had the necessary knowledge and skills to use XHTML, to three tests:
Does their home page validate as XHTML?
Do three arbitrarily-chosen other pages of their site validate as XHTML?
To user-agents which support it, do they serve XHTML as application/xhtml+xml?
Of those 100 sites, 88 failed the first test; 18 passed but failed the second test; and only one site passed all three. Choice quote:
As for the rest, the results speak for themselves. Even among the elite of the elite, the savviest of the savvy, adherence to standards is pretty low. Note that this survey most likely overestimates adherence to XHTML standards, since you would expect the Alpha Geeks to rate high on XHTML standards comprehension.
One of the major issues with XHTML in practice was its requirement for strict error handling, which it inherited from XML and which was generally observed by the browsers. If an XHTML web page didn’t validate, the browser showed you only an error message. As reported in other comments, most ‘XHTML’ web pages didn’t (and were saved from this draconian fate only because they were served in a way that caused browsers to interpret them as HTML instead of XHTML).
The direct problem with this is that it is both a terrible user experience and directed at the wrong person; it is directly punishing you (the browser user), while only the page authors and site owners have the power to correct the problem. If their entire site is broken, they may get punished indirectly by a traffic volume drop, but if it’s only some pages, well, you get screwed.
The indirect problem is the implications for this. Because the consequences of invalid XHTML are so severe, the W3C was essentially demanding that everyone change how they created web pages so that they only created valid XHTML. In a world where web pages are uncommon and mostly hand written, perhaps this looked viable. In a world where a very large number of web pages are dynamically generated on the fly, it is not. Major sites with dynamically generated pages were never going to rewrite their page generation systems just to produce assured-valid XHTML, when XHTML gave them essentially nothing in practice except a worse user experience for visitors if something ever went wrong. And even by the mid 00s, the web was far more like the latter than the former.
(How well people do even today at creating valid XML can be seen by observing how frequently Atom format syndication feeds in the wild are not fully valid XML. Every feed reader that wants to do a really good job of handling feeds does non-strict parsing.)
If an XHTML web page didn’t validate, the browser showed you only an error message. As reported in other comments, most ‘XHTML’ web pages didn’t (and were saved from this draconian fate only because they were served in a way that caused browsers to interpret them as HTML instead of XHTML).
This is worth elaborating on a bit. People now mostly think of it in terms of obvious errors, like you forgot to close a tag or quote an attribute. But XHTML had some truly nasty hidden failure modes:
Using &copy; for a copyright symbol? You’re now at the mercy of whatever parses your site; a tag-soup HTML parser or a validating XML parser will load and understand the extra named entities in XHTML, but a non-validating XML parser isn’t required to and can error on you (and remember, every error is a fatal error in XML) for using any named entity other than the base five defined in XML itself.
Using inline JavaScript (which was common back then)? Well, the content of the script element is declared in the XHTML DTD as PCDATA. Which means you now have to wrap your JavaScript in an explicit CDATA block or else risk well-formedness errors if you use any characters with special meanings. You know, like that < in your for loop.
Oh, and speaking of JavaScript: the XHTML DOM is not the same as the HTML DOM. Methods you’re used to for manipulating the HTML DOM will not work in an XHTML document parsed as XHTML, and vice-versa. But you still have to support processing as HTML because not all browsers can handle XHTML-as-XML. Good luck!
And while we’re on the subject of content types: did you know the Content-Type header can suddenly make your XHTML documents not be well-formed? Turns out, if you serve as text/html or text/xml, and don’t also specify the charset in your Content-Type header, the consumer on the other end is required to parse your document as ASCII. Even if your XML prolog declares UTF-8. Really. So better not have any bytes in your document outside the ASCII range or else you’ll get an error.
And that’s just some of the stuff I still remember a decade and a half later. XHTML was a mess.
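To make the script and charset pitfalls above concrete, here is a rough sketch of the defensive boilerplate an XHTML 1.0 page of that era typically needed; the response header is shown above the document, and the script body is just a placeholder:

    Content-Type: application/xhtml+xml; charset=utf-8

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head><title>Example</title></head>
      <body>
        <script type="text/javascript">
        //<![CDATA[
          // Without the CDATA wrapper, the bare "<" and "&&" below would be
          // fatal well-formedness errors once this page is parsed as XML.
          var total = 0;
          for (var i = 0; i < 10 && total < 100; i++) {
            total += i;
          }
        //]]>
        </script>
      </body>
    </html>

The explicit charset matters because RFC 3023 made the HTTP charset parameter authoritative for text/xml, with a us-ascii default when it was omitted, regardless of what the XML prolog declared; serving application/xhtml+xml with an explicit charset sidesteps that trap.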
XHTML DOM is not the same as the HTML DOM. Methods you’re used to for manipulating the HTML DOM
Personally, client-side mangling of the DOM was one of those places where the www truly jumped the shark. Then client-side mangling and animation of CSS…
Shudder.
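As a rough illustration of that DOM mismatch, this is the kind of thing that tripped up scripts when the same markup was parsed as XML rather than HTML; exact behaviour varied by engine, so treat it as a sketch, not a spec:

    // In a document served as application/xhtml+xml:

    // 1. document.write() is not available in XML documents; depending on the
    //    engine it threw or was ignored, so snippets built around it broke.

    // 2. Element names keep their case, so the classic HTML-DOM check fails:
    var div = document.createElement("div");
    var parsedAsHtml = (div.tagName === "DIV"); // false in XHTML-as-XML, where it is "div"

    // 3. Older engines wanted namespace-aware element creation, and innerHTML
    //    (where it worked at all) had to be given well-formed XML:
    var p = document.createElementNS("http://www.w3.org/1999/xhtml", "p");
    p.textContent = "created the namespace-aware way";
    document.getElementsByTagName("body")[0].appendChild(p);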
In a world where web pages are uncommon and mostly hand written, perhaps this looked viable. In a world where a very large number of web pages are dynamically generated on the fly, it is not.
On the contrary, I would expect that any dynamically-generated site should be able to quite easily generate valid XML, while any sufficiently-complex hand-written XML will likely have at least one validation error.
If it’s really that difficult to generate well-formed XML … maybe we should have just dumped it and stuck with S-expressions?
Correctness matters, particularly with computers which handle people’s sensitive information.
People who hand write web pages do so in small volume, and can reasonably be pushed to run validators after they save or switch to an XHTML-aware editing mode or web page editor. Or at least that is or was the theory, and somewhat the practice of people who did produce valid XHTML.
Software that produces HTML through templates, which is extremely common, must generally be majorly rewritten to restructure its generation process to always produce valid XHTML. At scale, text templating is not compatible with always valid XHTML; the chance for mistakes, both in markup and in character sets, is too great. You need software that simply doesn’t allow invalid XHTML to be created no matter what, and that means a complete change in template systems and template APIs. Even if you can get away without that, you likely need to do major rewrites inside the template engine itself. Major rewrites are not popular, especially when they get you nothing in practice.
It does, but the presentation layer is not the right place to enforce it.
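A tiny sketch of why plain text templating and always-valid XHTML don’t mix; the template string and data are invented for illustration:

    // Classic string templating: looks fine until the data misbehaves.
    var name = "Tom & Jerry <3";
    var page = "<p>Hello, " + name + "!</p>";
    // Result: <p>Hello, Tom & Jerry <3!</p>
    // The bare "&" and "<" make this ill-formed XML, so an XHTML parser
    // aborts the whole page with a fatal error instead of rendering it.

    // A generator that can only emit well-formed markup has to escape at the
    // API boundary (or build a tree and serialize it), rather than trusting
    // every template author to remember:
    function text(s) {
      return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
    }
    var safePage = "<p>Hello, " + text(name) + "!</p>";
    // <p>Hello, Tom &amp; Jerry &lt;3!</p>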
I drank the Kool-Aid and really, really tried to do it well. I’ve even used XSLT and XML serializers to generate proper markup. But even when I did everything right, it was undone by proxy servers that “optimized” markup or injected ads (those were dark times for HTTPS). First-party ads didn’t work with the XHTML DOM. Hardly anything worked.
So in the end users were unhappy, stakeholders were unhappy, and I could have used simpler tools.
it was undone by proxy servers that “optimized” markup or injected ads (those were dark times for HTTPS). First-party ads didn’t work with the XHTML DOM.
Well, to be honest, you weren’t serving up XHTML then, so you can’t blame XHTML for that.
If there was a flaw in the XHTML design, it was the inability to insert standalone sub-documents, i.e. like <img src="foo.png">: no matter what was inside there, your document rendered, maybe with a “broken image” icon… but your outer document rendered. What you needed for what you’re talking about is a <notMyShit src="…"> tag that would render whatever was at the other end of that URL in a hermetically sealed box, same as an image. And if the other shit was shit, a borked-doc icon would be fine.
You mean an iframe?
Ok, my memory is fading about those dark days… I see it was available from HTML 4 / XHTML 1.0, so basically he had no excuse.
/u/kornel’s problems didn’t arise from XHTML; they arose from his service providers doing hideous things. So don’t blame XHTML.
I was serving application/xhtml+xml, but the evil proxies either thought they supported it or sniffed content. HTML5 actually added <iframe srcdoc="">, but it’s underwhelming due to iframe’s frameness.
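For what it’s worth, the nearest modern equivalent of that hermetically sealed box is an iframe with the HTML5 sandbox/srcdoc attributes, roughly like this; the URL and the inline fragment are placeholders:

    <!-- Isolate third-party or untrusted markup; if it is broken, only the frame breaks. -->
    <iframe src="https://example.com/not-my-markup" sandbox></iframe>

    <!-- Or inline the untrusted fragment itself (the attribute value must be escaped): -->
    <iframe srcdoc="&lt;p&gt;a fragment that might be garbage&lt;/p&gt;" sandbox></iframe>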
The blame that can be placed squarely on XHTML is, I think, that of being an unrealistic match for its ecosystem. Hideous behavior from service providers may have occasionally been part of the picture, but a small one compared to a lot of what’s been brought up in this thread.
It’s clear from your other comments that you view the existence of any type of scriptable interface to an HTML or XHTML document as a mistake, but the simple fact is that it was already a baseline expected feature of the web platform, which consisted of:
Markup language for document authoring (HTML/XHTML)
Style language for document formatting (CSS)
An API for document manipulation (DOM)
Ad networks, and many other things, already made use of the DOM API for the features they needed/wanted.
And then XHTML came along, and when served as XHTML it had a DOM which was different from and incompatible with the HTML DOM, which meant it was difficult and complex to write third-party code which could be dropped into either an HTML document, an XHTML-served-as-HTML document, or an XHTML-served-as-XHTML document.
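A sketch of the kind of branching such drop-in code ended up doing; the detection trick shown here (tag-name case) was a common heuristic of the time, not an official API:

    // Crude detection of "am I in an XML/XHTML DOM or an HTML DOM?"
    var isXmlDom = (document.createElement("div").tagName !== "DIV");

    function makeParagraph(textValue) {
      var p;
      if (isXmlDom) {
        // XHTML-as-XML: be explicit about the namespace, and avoid document.write().
        p = document.createElementNS("http://www.w3.org/1999/xhtml", "p");
      } else {
        // Plain HTML, or XHTML served as text/html (which is parsed as HTML anyway).
        p = document.createElement("p");
      }
      p.appendChild(document.createTextNode(textValue));
      return p;
    }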
Ad networks, and many other things, already made use of the DOM API for the features they needed/wanted.
Yup. Ad networks, as well as those features, were never things I needed or wanted…
API for document manipulation
Never needed or wanted that except as a rubber crutch given to a cripple to overcome the fact that HTML as a standard had completely stalled and stopped advancing on any front anybody wanted.
difficult and complex to write third-party code which could be dropped into either an HTML document
And the 3rd-party code was written as a hideous kludge to overcome the stalled HTML standard.
It’s gobsmacking what they have achieved with, say, d3.js… but that is despite the limitations rather than because of them. If I look at the code for d3 and at the insane kludges and hacks they do, compared to other, better graphics APIs… I literally cry for the wasted time and resources.
It was ahead of its time. I think strict validation will maybe be an option in… let’s say 2025 or 2030. We still have a long way to go before people consistently use software that produces HTML the same way as JSON—which is to say, via serialization of a data structure.
We’re slowly, slowly getting there.
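The “serialize a data structure” idea, sketched with nothing more than the browser’s own DOM and XMLSerializer; the data object is invented for illustration:

    // Instead of concatenating strings, build a tree and let the serializer
    // worry about escaping and well-formedness, the same way JSON.stringify
    // frees you from hand-writing JSON.
    var data = { title: "Results", items: ["a < b", "Tom & Jerry"] };

    var ul = document.createElement("ul");
    data.items.forEach(function (item) {
      var li = document.createElement("li");
      li.textContent = item; // escaping is handled at serialization time
      ul.appendChild(li);
    });

    var markup = new XMLSerializer().serializeToString(ul);
    // e.g. <ul xmlns="http://www.w3.org/1999/xhtml"><li>a &lt; b</li><li>Tom &amp; Jerry</li></ul>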
I don’t think it’s ever going to happen for HTML, because there’s no benefit. At all.
XML was meant to solve the problem of unspecified and wildly different error handling in HTML clients, but HTML5 solved it by specifying how to parse every kind of garbage instead (including what exactly happens on the 511th nested <font> tag).
XML parsers were supposed to be simpler than parsers that handle all the garbage precisely, but we’ve paid that cost already. Now it’s actually easier to run the html5ever parser than to fiddle with DTD catalogs to avoid XML choking on &nbsp;.
We have some validation in some template engines and JSX, but that’s where it should be — in developer tooling, not in browsers.
HTML is a semantic document markup language that has now become a retained-mode graphical user interface toolkit.
We just need to port GTK+ to wasm and have a “document web” and an “app web”…
GTK already has an official web backend. It’s called Broadway. It may even work for some apps you already have installed. Alexander Larsson has done some work improving it in GTK 4.
I’ve yet to hear a good use case for everyone using XHTML.
The format broke Postel’s law when it came to browsers. The iterative, hack-it-until-it-works ethos of HTML built the web. It utilized comparative advantage by making a few browser developers work to accommodate the millions of HTML authors, instead of turning that on its head and forcing millions to write to a draconian and unforgiving standard.
W3C screwed it up so badly that for half of the Web’s lifetime they did not control the HTML/DOM standards.
No, they screwed up by not providing better means to get the job done than by querying and mangling the DOM.
Querying and mangling the DOM client-side is a hack upon a kludge created by someone with more time than sense.
That is a completely separate issue from the WHATWG/W3C kerfuffle. HTML5 took over despite not doing anything major about the DOM either. And now they’re all digging deeper with Web Components, which baffles me.
I think there were a few of us :)
Most people hailed the introduction of the <video> and <audio> tags as positive due to the fact that browser support for these did not require codec plugins of questionable quality and reliability, but was instead built into the browser.
This has always been the most confusing part of HTML5 to me, as though we needed new tags to get off of plugins when, in fact, you could implement <video> and <audio> just fine with plugins even today (though no one does) and, as the article points out, you could implement <object> without a plugin (as at least Firefox does today for supported MIME types).
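For reference, the two approaches being contrasted look roughly like this; whether the <object> form plays without a plugin depends on which MIME types the particular browser handles natively, and the file names are placeholders:

    <!-- The HTML5 way: built-in playback, no plugin involved. -->
    <video src="clip.webm" controls></video>

    <!-- The pre-HTML5 way: the browser picks a handler for the MIME type,
         which may be native support or may be a plugin. -->
    <object type="video/webm" data="clip.webm" width="640" height="360">
      Fallback content if nothing can handle it.
    </object>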