1. 73
  2. 22

    I used wget to download all 1,217 of the W3C specifications which have been published [..] The total word count of the W3C specification catalogue is 114 million words at the time of writing.

    This calculation is wrong. For example, searching for HTML reveals different versions of the same document, various informative notes (“HTML5 Differences from HTML4”, “HTML/XML Task Force Report”), things no one uses like XForms, and other documents that really shouldn’t be counted.

    I looked at the full URL list and it includes things like the HTML 3.2 specification from 1997. A quick spot-check reveals many URLs that shouldn’t be counted.

    I’m reminded of the time in high school when I mixed up some numbers in a calculation and ended up with a doorbell drawing 10A of current. The teacher, quite rightfully, berated me for blindly trusting the result of my calculations without looking at it and judging whether it’s vaguely in the right ballpark. This is the same; 114 million words is a ridiculously large result, and the author should have known this and investigated further to ensure this number is correct (it’s not) before writing about it.

    I wouldn’t be surprised if the actual word count is two orders of magnitude smaller; perhaps more.
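
    As a rough illustration of the kind of filtering this objection implies, here is a minimal sketch that keeps only the most recent dated snapshot per specification and skips informative notes before anything is counted. It assumes dated URLs of the form `/TR/<year>/<STATUS>-<shortname>-<YYYYMMDD>/`; the function name, the `skip_statuses` parameter, and the example URLs in the test are hypothetical, and this is not the method used in the article.

```python
import re

# Assumed URL shape: .../TR/<year>/<STATUS>-<shortname>-<YYYYMMDD>/
DATED_TR = re.compile(r"/TR/(?:\d{4}/)?([A-Z]+)-(.+?)-(\d{8})/?$")


def latest_versions(urls, skip_statuses=("NOTE",)):
    """Return {shortname: url}, keeping the newest dated snapshot of each spec."""
    latest = {}
    for url in urls:
        match = DATED_TR.search(url)
        if match is None:
            continue  # undated or unrecognised URL: review these by hand
        status, shortname, date = match.groups()
        if status in skip_statuses:
            continue  # informative notes are left out of the count
        # YYYYMMDD strings sort chronologically, so string comparison is enough
        if shortname not in latest or date > latest[shortname][0]:
            latest[shortname] = (date, url)
    return {name: url for name, (date, url) in latest.items()}
```

    This only merges snapshots that share a shortname, so renamed or superseded specifications would still need manual review.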

    1. 8

      I wrote a rebuttal to this comment when it was cross-posted to HN:

      Those specifications from 1997 are still relevant. That’s why we end up with things like quirks mode:

      https://quirks.spec.whatwg.org/

      And on the subject of WHATWG, all of its specs were excluded from the word count, as were the JavaScript spec and nearly all of the JavaScript APIs browsers are implementing. Things omitted include WebGL, Web Bluetooth and Web USB, the native filesystem API, WebXR, Speech APIs… And the informative notes you mentioned (1) are a rounding error when compared to the specs, and (2) are also included in the word counts for POSIX, C11, and so on.

      And the word count I gave in the article is half of the real count I ended up with, and I didn’t even finish downloading all of the specs to consider.

      My full write-up on the methodology is here:

      https://paste.sr.ht/~sircmpwn/13c1951014a256e9f551296a129bf6d10e9303dc

      Anyone who thinks that the web isn’t hundreds or thousands of times more complicated than almost anything else out there is lying to themselves.

      1. 14

        Quirks mode is its own document, and from memory HTML5 also includes a section of it. No one writing a browser now needs to look at the HTML 3.2 doc, or HTML 4, 4.01, CSS1, etc. etc. etc. Never mind all the “notes” that are essentially just weblog posts and not part

        That you omitted some specifications (much of it is rarely used, or only used for stuff like Electron) doesn’t mean your calculations are correct. You can’t just say “oh, I didn’t bother counting some stuff so who cares about some bogus data”. That’s not how it works.

        Anyone who thinks that the web isn’t hundreds or thousands of times more complicated than almost anything else out there is lying to themselves.

        That’s not the point. The point is that the data you used is wrong. Wrong data is wrong, whether the underlying point is correct is another matter.

        1. 6

          That’s not the point. The point is that the data you used is wrong. Wrong data is wrong, even if the underlying point is correct.

          It’s an approximation, and by my reckoning a pretty fair one. And, more to the point, if you agree that the conclusions are correct even if the data isn’t, then what the heck are you complaining about? The margin of error could be huge and all of the points would still be valid, with room to spare, because the web is so complicated that it outclasses everything by an obscene margin.

          1. 16

            And, more to the point, if you agree that the conclusions are correct even if the data isn’t, then what the heck are you complaining about?

            This concept is called confirmation bias and is a wildly insidious way of thinking.

            It’s the equivalent of saying, “Ignore the data and process because you agree with the result”.

            No thank you. @arp242 is making a reasoned point about the import of good data and process, and you would do well to suppress your arrogance and ruminate on it.

            1. 20

              Including 12 different versions of the same document is not an “approximation”, it’s just wildly wrong.

              I never said I agreed with your point: I don’t. I’d argue the web isn’t significantly more complex than, say, the Java or Python ecosystems (JVM/CPython, common libraries, etc.) We get a lot in return, as well, such as the ability to file my taxes on my OpenBSD machine, instead of having to muck about with some Windows tool like before.

              I just pointed out, or “complained”, about some (IMHO pretty obvious) concerns about the quality of the analysis. Just a conclusion without sound argumentation is rather useless in most cases; I actually wrote a thing about that a few years ago.

              1. 8

                the ability to file my taxes on my OpenBSD machine, instead of having to muck about with some Windows tool like before

                Oh god yes! I certainly agree that the web is needlessly complex, but I also have to admit that it’s obsoleted a lot of crappy and untrustworthy desktop software. Most web sites/applications are also untrustworthy but at least they’re sandboxed and can access only what little data you put in (for the most part).

        2. 1

          You are free to do the same calculations yourself, but I believe two notes must be considered:

          And to put any remaining doubts to rest, I took about 100 million words off of the number I gave in my article. The real sum I ended up with is over 200 million.

          And I didn’t even let wget finish downloading all of the specs.

          I believe if you discard all of the informative notes, and old specifications, your number will be close.

        3. 13

          There’s definitely a “market” (1) for a web browser that just renders HTML only and no JavaScript. I’m using Firefox (with uMatrix / uBO) in this way to browse news sites without a problem. On Android, I’m using PrivacyBrowser from F-Droid for this purpose. Unfortunately NetSurf is just not daily driver material yet but I would like to switch to it eventually.

          Essentially you’re using the web in a “read-only” sort of way since most interaction requires JavaScript. Obviously it doesn’t work out for all sites so I keep a separate browser around for when I need a richer experience with JavaScript.

          In fact in some ways such a minimal “read-only” web browser offers a better experience on news articles from the latimes / nytimes website. It also works perfectly on this site, HN, old.reddit, and for viewing tweets (Twitter has a non-js version of their site).

          (1) When I say “market” I may only be describing a market with a size of 1. 😀

          1. 9

            Essentially you’re using the web in a “read-only” sort of way since most interaction requires JavaScript. Obviously it doesn’t work out for all sites so I keep a separate browser around for when I need a richer experience with JavaScript.

            Author here. SourceHut actually works pretty well with Netsurf, both read and write :)

            I hope more of the web becomes compatible with more conservative browser implementations. I think that’s the only way out of this mess.

            1. 4

              SourceHut actually works pretty well with Netsurf, both read and write

              Even when the site you’re using functions with the features Netsurf offers, it still doesn’t provide anything beyond the most trivial of keyboard shortcuts. Not only does it have a long way to go on catching up to being a functional browser, it is far behind on the much more easily-attainable table stakes of being a reasonable desktop application.

            2. 9

              The major problem I encounter blocking JS is that a huge percentage of “read-only” sites simply don’t render at all, which is lazy development on their part, but it’s hard to fault them for not catering to such a small slice. And with accessibility APIs replacing DOM parsing for screen readers, that slice contains fewer and fewer users with a11y needs, leaving only us ideologically anti-JS folk.

              1. 7

                Make that a market of 2! When running nested VMs for my Advanced Operating Systems class projects, I desperately wanted a browser that had basically 0 functionality other than just rendering the HTML. With only 8GB of RAM, my computer was stretched pretty thin between a VM with 8 nested VMs under it and an instance of Chrome with my assignment resources. I’ve often thought about building this type of minimalist browser once I’ve graduated and have more time on my hands for an extended project. In memory-tight situations like the one I was in, it could be pretty handy!

                1. 22

                  once I’ve graduated and have more time on my hands

                  I hate to be the one to have to tell you this, but that’s really not how life works ;)

                  1. 6

                    Life works the way you choose for it to work. There are a lot of possible life-paths that would result in you having less free time to work on personal programming projects compared to an ordinary American college lifestyle; but there are also lifestyle choices you could make that would result in you having more free time for this. In the extreme, you could decide to live as the coding equivalent of a starving artist, working just enough at irregular jobs to have the money to keep yourself barely fed and clothed and buy a laptop with, so you can focus on your coding. I’m not necessarily advocating this kind of lifestyle, but it’s important to be aware that if doing something is important enough to you, you have the power to restructure your life around doing that thing, even if it goes against social norms.

                  2. 3

                    Have you ever tried Dillo?

                    1. 2

                      I hadn’t! I just built and ran it and was pretty impressed with what they’ve done so far. It’s definitely not a finished product (or maybe it’s just finicky on macOS), but it would be fun to contribute to and mess around with.

                  3. 4

                    You can also use Firefox+uBO+uMatrix on Android! It really makes a huge difference on battery life for me.

                    1. 2

                      µMatrix works on mobile?

                      1. 3

                        Yes, the new WebExtensions-style addons work on Firefox for Android. Unfortunately addons don’t work on Firefox for iOS (which is basically a reskinned Safari, not the Gecko engine).

                        The uMatrix UI shows up in a new tab rather than a popup, but it looks and acts the same as on desktop.

                      2. 1

                        For now… Mozilla will replace the current Android version of Firefox with Fenix (Firefox Preview), which currently only supports uBO. I don’t know what the timeline for full WebExtension support is, but I believe there will be a period without addons :/

                    2. 12

                      The same can be said about operating systems, which web browsers are essentially becoming (or have been). Casey Muratori calls it the thirty-million-line problem.

                      1. 13

                        I think this is an often overlooked but crucial fact of the modern Web. The Web started off as a document browsing system, a way to read text content stored on a network. It has quickly become a distributed system with browsers being the client OSs. Browsers provide memory management, hardware management, isolation between processes, and generally expose lower-level capabilities and resources through an API. All of these tasks are things an OS does.

                        I think that the Web is a classic case of computers moving too fast. While I’m glad that the modern Web has changed the world in the ways it has, I think the points the author makes are an indication that we often run into the limitations of trying to build a distributed operating system on top of a document sharing system.

                      2. 12

                        Which future should we take guys?

                        1. Create a de-facto HTML only browser
                        2. Adopt Gopher
                        3. Create a new way to discover content

                        No time is better than now to start doing this.

                        1. 5

                          The thing is that webapps are useful and convenient. Maybe we need to separate webapps from web documents, but for web documents HTML is still the best format! (EPUB is HTML also) Hyperlinks are a great idea! Gopher is inferior in every way to WWW and it’s popular in some circles because it’s a dead protocol and short to implement (but not as easy or clean as WWW 1.0 in my opinion).

                          1. 1

                            Having written both a gopher server and client, not being as clean as HTTP/1.0 I can see, but harder than HTTP/1.0? No, the gopher protocol is way easier to implement and because of that, it’s very limited in what it can do.
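
                            For a sense of scale, here is a minimal sketch of the client side as described in RFC 1436: a request is just a selector followed by CRLF on port 70, and each menu line is a type character plus tab-separated fields. The `MenuItem`, `parse_menu_line`, and `fetch` names are made up for illustration, and error handling and the terminating `.` line are left out.

```python
import socket
from typing import NamedTuple


class MenuItem(NamedTuple):
    item_type: str  # e.g. "0" for a text file, "1" for a menu
    display: str
    selector: str
    host: str
    port: int


def parse_menu_line(line: str) -> MenuItem:
    """Parse one menu line: <type><display> TAB <selector> TAB <host> TAB <port>."""
    line = line.rstrip("\r\n")
    item_type, rest = line[0], line[1:]
    # Ignore any extra trailing fields some servers append
    display, selector, host, port = rest.split("\t")[:4]
    return MenuItem(item_type, display, selector, host, int(port))


def fetch(host: str, selector: str = "", port: int = 70) -> bytes:
    """Send a selector and read the response until the server closes the connection."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(selector.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

                            There are no headers, status codes, or content negotiation to handle, which is also why the protocol is so limited in what it can do.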

                          2. 7

                            Create a de-facto HTML only browser

                            I’d be tempted to add “contribute to existing efforts like netsurf and dillo so they can be more polished and usable”, but given that they’re both implemented in C I think there’s something to be said for avoiding building on a foundation of sand.

                            1. 3

                              I think there’s something to be said for avoiding building on a foundation of sand.

                              And this is why the wonderfully smart people on this page (among whom I include myself) will never overtake the browser vendors. It is much easier to succeed at business if you have fewer rules, like they do.

                              1. 2

                                Didn’t know about NetSurf so thanks for mentioning it. Going over to their webpage I see the following in the “Why choose NetSurf?” section:

                                Despite a myriad of standards to support, NetSurf makes surfing the web enjoyable and stress-free by striving for complete standards compliancy. As an actively developed project, NetSurf aims to stay abreast of new and upcoming web technologies.

                                Isn’t this exactly the opposite of what is being said here?

                                1. 4

                                  Yes, that statement is not even close to being true. A charitable reading would say it’s … aspirational.

                                2. 2

                                  There’s absolutely nothing wrong with software implemented in C.

                                  1. 8

                                    That person said web browsers: highly-complex software that processes malicious data on a regular basis. Writing such things in C usually guarantees preventable vulnerabilities, esp code injection. Folks that don’t want them advise against using C.

                                    The performance of modern hardware, the need to just do HTML/CSS, and modern languages should make it easier to avoid C.

                                    1. 2

                                      Writing such things in simple, modern C is less likely to cause preventable vulnerabilities than using existing mammoth C++ codebases full of super legacy code.

                                      1. 2

                                        Probably true. I’d argue against using C++, too. Strange reply, though, since my comment was pushing safe alternatives to C. That would mean languages such as Ada or Rust, not C++.

                                        1. 1

                                          The existing browsers are all written in C++. They’re what actually exists and thus the only reasonable comparison to make. Nobody has demonstrated it’s possible to write software like this in Rust.

                                          1. 4

                                            Other than what @agent281 wrote in the sibling comment, AFAIU Rust was in fact basically born out of one guy’s frustration with having to write a browser in C++… thus if anything, Rust’s primary goal and focus is for it to be useful in writing browsers, not to mention that for a long time it is/was financed by a company mostly known for writing a browser. Also, uh, isn’t Servo a living and kicking demonstration of “software like this”?

                                            1. 2

                                              I think the fact that Rust was explicitly designed for writing web browsers but there are no web browsers written in Rust is actually evidence that Rust is not a good language for writing web browsers in.

                                              Rust has now been around for more than 10 years, and for five years post-1.0, and still there aren’t any pure-Rust web browsers. Google made Go for writing web services, do you think 10 years after its creation they still had no web services written in Go? It doesn’t take that long or that much effort to actually produce software.

                                              Servo, last time I looked, doesn’t actually work yet. It isn’t a web browser, for one thing, but it also doesn’t implement even enough of a layout engine to lay out web pages from the 90s. It was a while ago that I checked, but not that long ago.

                                              1. 2

                                                Hm; agree to disagree, then, I guess… :) I seem to remember reports of Servo correctly rendering Reddit :P but I know my memory is sometimes flaky… shrug I don’t care that much…

                                            2. 1

                                              If Drew is to be believed then no one will.

                                3. 18

                                  Web browsers long ago left the job of rendering hypertext behind in favor of becoming a virtual machine for cross-platform, cross-form-factor applications. It’s obvious that some people dislike that. If we were to do it all over again and create a program that was designed from the start to run applications instead of display linked text pages, it probably would be far less complex and work better (or maybe it would have turned out like Java GUIs, who knows?).

                                  I think we can agree on that, but until everyone realizes that browsers aren’t just used for displaying text with a few images anymore, we’ll keep getting into ideological arguments like this about complexity. Sure, web browsers are too complex for what they do, but they also let you write code that can run consistently on nearly every device out there, from an Apple Watch to a huge server.

                                  That’s a really hard thing to do with a huge amount of resources, let alone with a few people hacking something together themselves. I don’t think that it’s too big of a problem that you need a lot of resources to be able to implement something like a web browser. I’ll admit that some of the new web features are silly, but since they’re not used for much more than demos, you really shouldn’t worry about implementing them. The real challenge of making a new web browser is creating a performant alternative to v8 and hardware-accelerated rendering. Word counts have no concept of importance; getting a web browser that runs 99.9% of websites out there is far easier than implementing all the W3C specs.

                                  1. 2

                                    getting a web browser that runs 99.9% of websites out there is far easier than implementing all the W3C specs.

                                    Unfortunately, many of those websites rely on implementation details (i.e., Blink) to run properly. It’s C’s claims of “portability” all over again: the tool can only stay true to its selling points to the extent the builder allows it to.

                                  2. 19

                                    Firefox is filling up with ads, tracking, and mandatory plugins.

                                    This is a disingenuous exaggeration. Mozilla is held to a different standard than everyone else, so whenever they make the slightest mistake (like a branded extension in their pilot program), or the tiniest deviation from an absolutist position (negotiating the least invasive, default-off, explicit-opt-in sandboxed DRM plug-in that allows them to open Netflix), they’re immediately treated the same as other vendors who never even tried, or even actively and openly worked on the opposite (e.g. Google, who created and heavily lobbied for the browser DRM).

                                    1. 11

                                      I agree. The web platform is terrifyingly complex, and the sheer centralisation of power in the hands of the Chrome and Firefox teams should be a concern to every user, every company with a website, and every government (perhaps not the most pressing concern, but it should be on the list somewhere).

                                      However, describing Firefox as ‘free to stop being the “user agent” and start being the agents of their creators instead’ is unfair. The whole Free Software/open standards worldview is based on the idea that cooperation and communication produce the most resilient and sustainable ecosystem in the long-term, even though they add a lot of overhead in the short term.

                                      So what do you do when somebody comes along who’s willing to sell out the long-term so they can provide bread and circuses in the short term?

                                      • You can stick to your guns, keep doing things slowly, and give up on any influence or relevance. This is the Elinks/Dillo/NetSurf path.
                                      • You can abandon your principles, and join in the gold rush. This is the Edge/Vivaldi path.
                                      • You can get tactical - some of your principles will get squeezed, but if you can restrain the beast or at least steer it a little, maybe it’ll be worth it in the long run? This is the Firefox path.

                                      It seems like a lot of people on forums like this one expect a fourth option, along the lines of “stick to my guns, everybody comes to their senses and realises I was right all along, all websites are legally required to be designed for Netscape 3”. While that would be lovely, and would certainly address the “reckless, infinite scope” issue, realistically that’s never going to happen and it’s not fair to blame Mozilla for being unable to bring it about.

                                      1. 15

                                        Ads:

                                        https://www.ghacks.net/2018/12/31/firefox-with-ads-on-new-tab-page/

                                        https://www.zdnet.com/article/firefox-60-will-show-sponsored-stories-but-you-can-disable-them-says-mozilla/

                                        Tracking:

                                        https://gist.github.com/0XDE57/fbd302cef7693e62c769

                                        https://www.zdnet.com/article/firefox-tests-cliqz-engine-which-slurps-user-browsing-data/ (ads, too)

                                        Mandatory plugins:

                                        https://news.ycombinator.com/item?id=9667809

                                        There are more cases of each, but these are the ones I thought of off-hand. Setting up Firefox today still requires you to manually go to about:config and turn off a whole bunch of crap. A stock install of Firefox has ads and sends telemetry, searches, and more to both third- and first-party network services.

                                        1. 7

                                          These are totally bullshit things blown out of proportion. These are trials that never went live, and/or weren’t even nearly as bad as the uproar makes them out to be.

                                          Come on, the Pocket hysteria? It’s a bit of JS and one button you can turn off with two clicks. Pocket is now owned by Mozilla. It’s a Firefox feature now, and not any more of a “mandatory plugin” than Sync or the Add-On Store are. I thought you were at least flipping out about EME, which is 3rd-party code and actually a plug-in.

                                          Your central point is valid. There’s no need to embellish it with clickbait backed by sources that are clickbait themselves.

                                          1. 13

                                            All of these things went live to end-users, without informed consent. Some of this is still live today. I’ll kindly ask you to quit the bullshit.

                                            1. 3

                                              These are trials that never went live

                                              Ads have been on the mobile firefox web browser for over 3 years. There is no way to remove them.

                                        2. 7

                                          The article is missing the nail a bit. W3C encompasses a lot more than technologies intended for web browsers; it covers web services as well.

                                          You have the entire department of semantic web technologies (SPARQL and JSON-LD, to scratch the surface), along with the recent inclusion of decentralized technologies as well, ActivityPub and ActivityStreams, which I assure you are never going to be something a browser has to implement. And I’m confident there are plenty more examples in there which are not relevant for web browsers at all.

                                          Yes, web browsers are complicated. But counting words in W3C specifications is a terrible measure.

                                          1. 7

                                            One thing that I find beautiful (though maybe it’s not good) is that the web is very much nature-like: it evolves the way nature does, as if it were a living being. Its mess is like the real world; look at the thousands of different species of spiders. An intelligent design would never have created so many species that are so difficult to tell apart. It has more in common with natural languages, with all their exceptions and shorthands. For math people this is a disaster. But I find it beautiful somehow.

                                            1. 3

                                              I don’t see what’s beautiful about the average news website requiring megabytes of tracking javascript and shitty ads to be downloaded before I can read simple news articles.

                                              1. 2

                                                I like how you appreciate the complexity of the web. In any case, spiders are a natural resource: producing materials (e.g. silk), engineering inspirations (exoskeletons), medicine (venom), food (e.g. for birds), bug population control and so forth, in a wide range of environments. We don’t fully understand. We can barely make a web browser. Some may find it odd there are millions of different restaurants. Or software companies. Or Unix versions. But I agree, the Chrome Browser randomly evolved from a bash script without intelligent intervention. And Unix formed by chance after an explosion in Bell Labs. Joking aside, it is beautiful.

                                                1. 1

                                                  And another thing they both have in common: humanity’s involvement in their development will be their downfall.

                                                2. 6

                                                  This post… probably doesn’t really mean anything. Firstly, judging a web browser’s complexity by judging the spec catalogue is already unfair. The catalogue is basically a dump of things related to the web, for example the specs of JSON-LD. (Which probably has almost no relationship to implementing web browsers, since that’s just a data format of JSON.)

                                                  Also, word count doesn’t correlate with complexity. That’s like…. saying that a movie will be more entertaining than another one because its runtime is longer. Web-related specs are much, much more detailed than POSIX-specs because of their cross-platform nature: we’ve already seen what happens if web-related specs look like POSIX: anybody remember trying to make web pages that work both in IE, Safari, Firefox in the early 2000s? (Or, just try to make a shell script that works on macOS, FreeBSD, Ubuntu, and Fedora without trying it out on all four OSes. Can you make one with confidence?)

                                                  Really, it’s just tiring to hear the complaints about web browsers, especially the ones about ‘It wasn’t like it in the 90s, why is every site bloated and complex? Do we really need SPAs?’ Things are there for a reason, and while I agree that not every web-API is useful (and some are harmful), one should not dismiss everything as ‘bloat’ or ‘useless complexity’.

                                                  1. 8

                                                    Man, there’s a lot of copying and pasting comments from HN going on in this thread. Just to save the effort of copying and pasting the rest of this thread…

                                                    https://news.ycombinator.com/item?id=22617536

                                                    1. 4

                                                      Please do not copy and paste your comments from HN to here (or vice versa). HN and Lobsters have different cultures, different purposes and different users but there still is significant overlap in users. It annoys everyone, and I imagine especially the author, to have to read the same comment twice.

                                                      Also, word count doesn’t correlate with complexity. That’s like…. saying that a movie will be more entertaining than another one because its runtime is longer.

                                                      No, it really isn’t. It’s like saying that a movie will be longer than another one because its script is longer. Maybe there are some movies with really detailed screen directions in the scripts that mean they have a longer script than a longer, less-detailed movie does but it’s still a really strong correlation.

                                                      I think the same is true here. Word count in a specification really is highly correlated with complexity. You need more words to describe more complex behaviour.

                                                      Web-related specs are much, much more detailed than POSIX-specs because of their cross-platform nature: we’ve already seen what happens if web-related specs look like POSIX: anybody remember trying to make web pages that work both in IE, Safari, Firefox in the early 2000s?

                                                      The problem was not the specifications but the implementations of those specifications. Today, people aim to implement things in a compatible way, back then they aimed to implement them in an incompatible way.

                                                      Really, it’s just tiring to hear the complaints about web browsers, especially the ones about ‘It wasn’t like it in the 90s, why is every site bloated and complex? Do we really need SPAs?’

                                                      It’s way more tiring to have to use bloated, overly complex websites that waste my mobile data limit.

                                                      Things are there for a reason, and while I agree that not every web-API is useful (and some are harmful), one should not dismiss everything as ‘bloat’ or ‘useless complexity’.

                                                      “Things are there for a reason” is the most non-answer answer I’ve ever seen, holy crap. Yeah, they’re there for a reason: a fucking bad one.

                                                    2. [Comment removed by author]

                                                      1. 4

                                                        I agree, this is very tinfoil hat.