As an even more extreme example of this, look at the “urls” that are allowed in httpie compared to curl. I think these shorthands make perfect sense in interactive developer tools and are a terrible idea in libraries. (I’m disappointed curl can’t be more lax about urls but also I get Daniel’s position.)
This is disappointing. I did not know there were competing standards for URLs now.
The WHATWG standard isn’t much of a standard - the document changes out from under you, as it’s a “living standard”. AFAIK there are no stable URLs for particular versions. And there’s no BNF anymore, either; they ripped it out. I complained about that before, but they seem uninterested in fixing it, which also gives me little confidence that if someone added it as a contribution, it would be maintained.
It only changes under you if you don’t pay attention ;-)
I guess it’s good that I just always go with the IETF standard then. I wonder how the WHATWG has diverged though? What things can break if I send a valid IETF standard URL to a browser?
One example is that WHATWG URLs allow \ as a path separator, while IETF URLs only allow /. This means that if you use an IETF URL parser to get the hostname of https://evil.com\@good.com/ it will parse as good.com, but with a browser’s URL parser you get evil.com.
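This parser differential is easy to reproduce. A minimal sketch using Python’s standard library, whose `urlsplit` follows the RFC-3986-style rules (the browser behavior described in the comments is what a WHATWG parser such as `new URL()` would report; it’s noted here only in a comment, not exercised):

```python
from urllib.parse import urlsplit

# An RFC-3986-style parser does not treat "\" as a path separator,
# so "evil.com\" is read as userinfo and the host becomes good.com.
url = "https://evil.com\\@good.com/"
parsed = urlsplit(url)
print(parsed.hostname)  # good.com

# A WHATWG parser (e.g. a browser's `new URL(url).hostname`) treats
# "\" like "/", ends the authority there, and reports evil.com instead.
```

If one component of a system uses the first kind of parser for an allowlist check and another component uses the second kind to make the request, the two disagree about which host is being contacted.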
I think the WHATWG standard is laxer than the IETF standard, as it allows for “auto-correcting” URLs with problems (search for “spaceAsPlus” in the text, for example - that allows for optionally converting spaces in the URL to plus signs, depending on a parser mode flag). I guess this means a browser will (should?) accept IETF URLs, but there will be URLs accepted by browsers which won’t be considered URLs by other parsers.
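The parser-mode idea is visible even in Python’s standard library, which exposes the two behaviors as separate functions - a rough analogue of a spec-level flag like spaceAsPlus, not the WHATWG algorithm itself:

```python
from urllib.parse import unquote, unquote_plus

# Plain percent-decoding leaves "+" alone...
print(unquote("a+b%20c"))       # a+b c

# ...while the "plus means space" mode (used for form-encoded
# query strings) also turns "+" into a space.
print(unquote_plus("a+b%20c"))  # a b c
```

Two consumers of the same string that pick different modes will decode it differently, which is exactly the kind of divergence the comment describes.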
But of course, without a proper BNF, who knows exactly what’s different? There might be some edge cases where an IETF URL would be rejected by a browser. This verbose step-wise description is less formal and more wordy, which makes it more difficult to figure out what grammar is being described.
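For comparison, RFC 3986 does provide a formal grammar, and even a reference regular expression (Appendix B) for splitting a URI into components - a sketch of using it:

```python
import re

# The component-splitting regex from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

m = URI_RE.match("https://example.com/path?q=1#frag")
scheme, authority, path = m.group(2), m.group(4), m.group(5)
query, fragment = m.group(7), m.group(9)
print(scheme, authority, path, query, fragment)
# https example.com /path q=1 frag
```

Note this only splits components; it deliberately does not validate them, and it says nothing about WHATWG-style error recovery - which is the gap the comment is pointing at.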
I am really disappointed in Firefox for having helped Google do this. URLs are the foundations of modern web, and if there is no stable definition (as you said, there is no formal grammar definition any more) then what exactly are we building on?
These replies are especially sad. If one requires Turing complete steps, what can you guarantee about how a given URL is processed?
Despite those comments I would be extremely surprised if the url format was actually Turing complete. There’s an obvious preference to express the parsing algorithm using a Turing complete language, but that doesn’t mean that the url grammar is actually Turing complete
Why are you disappointed? Mozilla and Google folk were trying to ensure that the standard matched reality.
The WHATWG wants to claim they own everything, but then ignore all the interesting cases since they only need things to work in Chrome…
That’s incorrect. Changes in a whatwg spec require at least two implementations (e.g., one of WebKit, Blink, Gecko) to agree for any changes.
That’s barely a difference, since it’s still just a list of large web browser engines. And in practice, if Google wants something, do you really think Mozilla and Apple could block it? Chrome would implement it and force the issue in the name of compatibility 🤷‍♂️
And in practice, if Google wants something, do you really think Mozilla and Apple could block it?
Considering there have already been multiple things that Google implemented but Apple and Mozilla refused to (citing security/privacy concerns), I think it is in fact possible for Google to implement something and for Apple and Mozilla to refuse.
What other browsers should matter? There aren’t any that have sufficiently complete implementations to provide meaningful info regarding implementation issues, and there aren’t any others that have sufficient users to be relevant.
For URLs/URIs especially mostly things that are not browsers or web related at all. cURL sure, but also XMPP clients, Telecom equipment, git, lots of stuff that relies on this syntax!
They don’t rely on the url format that browsers have to support
In fairness, it’s worth asking whether that is because the IETF was not willing to consider the needs of the browsers and/or moved too slowly, or just because they are bad actors.
The IETF barely exists as an “entity”, it’s made up of the people who show up to do the work. I don’t think the WHATWG are “bad actors” but rather I think they’re only interested in documenting whatever Google was going to do anyway rather than improving anything.
Don’t get me wrong, having public documentation on what Google was going to do anyway is great! Way better than reverse engineering their nonstandard stuff. But what’s dangerous is when non-Google people treat it as a standard and implement it instead of the actual standard.
I think this misrepresents and underplays both the historical purpose and current value of WHATWG. When half of the web is violating e.g. the html standard in one way or another, and it all just so happens to work because of browser bugs or misfeatures from IE6, it is very useful to have a spec that fully describes error recovery. It is not useful to have a standard with ebnf that describes an ideal world but doesn’t really tell you how to write a real parser that actually works on 99% of websites.
I’m not aware of any inherent reason those specs couldn’t live within the IETF body, but I know of and have experienced the editorial process of the IETF. It is absolutely not frictionless and I can imagine the writing style of WHATWG would not pass their editorial guidelines.
I think you should take a look in WHATWG’s public issue tracker before asserting that IETF is fundamentally a more open venue than the WHATWG. I feel like half of the assertions you make are coming from your public perception of what the Chromium team is doing, not from actual problems observed with the spec body.
When it comes to something that is web specific, I agree. And in general, as I said “having public documentation on what Google was going to do anyway is great!”
But for things that are used far beyond the web and in all sorts of contexts, like URLs/URIs, having web browsers write their own incompatible “spec” for how these should work is… strange at best. And again, having something written to document what they do is still useful. But it’s not a standard, and it should not be implemented by anyone not trying to be compatible with browser behaviour for a specific reason.
“whatever Google is going to do” is exactly the misrepresentation I am talking about. This is not how it started and this is not how it works today. The fork did not just happen due to ideological resentment of existing spec bodies and corporate greed, the fork happened in practice much sooner in browsers like IE6. And the earliest whatwg spec was written at Opera.
I can appreciate your other points and the complaint in the article. The html spec in particular is written with a very particular audience (browser implementors) in mind.
It’s great that Opera started documenting the insanity that is browser URL parsing.
So your perspective is, if Google wanted google!/blahblah as a valid URL, implemented it in Chrome and pushed it to production, WHATWG wouldn’t accept it?
Note: I’m not suggesting Google would ever want such a thing, and I just randomly made up URL nonsense on purpose :)
My guess is, and I imagine @singpolyma’s perspective also, WHATWG would accept the change, perhaps with some token amount of GRRR.. but it would happen.
I’m not going to speculate as to what might happen in such a scenario. Can you point out a comparable scenario where Google did something like this and the standardization process was as you described? Otherwise we’re talking about projected fears, not reality.
What I have seen is that Google does indeed push forward with feature impls, but the standardization process in WHATWG is not as lackluster as you describe.
It’s happened in other aspects of web browsing, but I’m not currently aware of it in URL parsing, specifically.
I can recall a few cases of the opposite: Google went on to implement something, others refused, so it was not standard, and google rolled back its implementation.
What needed to change in the URL spec that you think the IETF was so slow about?
I don’t know the history here, thus my question. I’ve always used IETF-compliant parsers and was surprised there were two competing standards. I was merely musing on the potential reasons for another standard having been started.
Ah, it sounded like you were making a claim about the IETF, not asking a question. Got it.
The WHATWG predates chrome by many many years, and is part of the reason chrome was possible. Having an actual accurate specification meant webkit was able to resolve a number of compatibility problems, so when chrome came out it was compatible with the majority of the web.
Besides the deficiencies of WHATWG, you can always have bugs in any one parser, which is another reason not to mix parsers.
The core problem is that the IETF specification does not reflect the reality of what browsers have to handle. It doesn’t matter how much you might say “this is an invalid url”: If it works in some browsers but not others, users will point to those browsers as evidence that your browser is wrong.
The same happened with the W3C. Instead of attempting to define a standard that actually matched reality, they went off into the xhtml weeds, and never tried to address the issues that actually mattered: the actual content browsers had to support. That is the primary reason the WHATWG ended up being responsible for the actual html standard.
It does not matter how much you may want to be the sole definition, if that spec fails to match actual content it is irrelevant.
<!-- This starts a multiline HTML and is a single line JS comment
alert("this is JS, still in the HTML comment tho")
// time to end the HTML comment! -->
Oh I know. I worked on the portions of the spec that standardized it :D
I was never entirely aware of the why of it existing, it honestly seemed to line up more (at the period I was dealing with it) with “xhtml” that was being served as html. Sidenote: more or less every single “xhtml” document I ever encountered was not treated as xhtml due to various response errors, and more or less none of them would actually validate as xml :D
Yet another reason to not use URLs when you can avoid them. In many cases, it’s better to use a simple key=value format, or even JSON. Either beats trying to fit some random data structure into a URL’s protocol, host, path, query string, and fragment.
Agreed, but JSON has its own problems. IIRC there have been analogous security problems caused by using multiple JSON parsers. (There’s an Apple one I’m thinking of, where the kernel used a different parser than userland, but I’m not 100% sure it was JSON.)
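The multiple-parser hazard with JSON can be sketched with duplicate keys, which RFC 8259 leaves underspecified: different parsers (or different configurations of one parser) can disagree about which value wins. A minimal illustration in Python - the `"user"` document here is made up for the example:

```python
import json

doc = '{"user": "alice", "user": "admin"}'

# Python's json module keeps the *last* duplicate key...
print(json.loads(doc))  # {'user': 'admin'}

# ...but a parser that keeps the first (or surfaces both) sees
# something different. object_pairs_hook exposes the raw pairs:
pairs = json.loads(doc, object_pairs_hook=lambda p: p)
print(pairs)  # [('user', 'alice'), ('user', 'admin')]
```

If a security check runs on one parser’s view and the privileged action runs on another’s, the same document authorizes as "alice" and executes as "admin" - the same shape of bug as the URL examples above.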
That was PsychicPaper you’re thinking of, and it was an issue in a PLIST XML parser.
This is another reason why the mantra Parse, don’t validate is so important.
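A minimal illustration of that mantra in Python (the `ParsedUrl` type and `parse_url` helper are hypothetical names for this sketch, not from any library): parse once at the boundary into a structured value and pass that around, instead of re-validating and re-parsing a raw string in several places with potentially different parsers.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

@dataclass(frozen=True)
class ParsedUrl:
    scheme: str
    host: str
    path: str

def parse_url(raw: str) -> ParsedUrl:
    """Parse once, at the boundary; reject anything ambiguous."""
    parts = urlsplit(raw)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an acceptable URL: {raw!r}")
    return ParsedUrl(parts.scheme, parts.hostname, parts.path or "/")

# Downstream code takes ParsedUrl, so it can never see an unparsed
# string - and therefore can never re-parse it differently.
def is_same_origin(a: ParsedUrl, b: ParsedUrl) -> bool:
    return (a.scheme, a.host) == (b.scheme, b.host)
```

The point is not this particular validation logic but the type boundary: once a `ParsedUrl` exists, every consumer agrees on what the host is, because only one parser ever ran.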