1. 40
  1. 11

    This isn’t what I’d think of as an outage. Services have outages. This was just a bug that they happened to be able to mitigate the effect of by changing how an external service behaved.

    1. 8

      Story from during the outage

      1. 7

        As part of the incident response process, we quickly discovered that the client was hanging inside a network request to one of the Firefox internal services.

        Maybe someone can explain to me why I use a browser on my desktop to visit a third party, yet some of my traffic goes through Mozilla, and worse, it’s blocking. Is this true of Chrome/Google and Edge/Microsoft as well? Why does the web, a distributed system, have such a single point of failure?

        1. 18

          First of all, your traffic to a website isn’t going through Firefox internal services. Firefox needs to talk to a variety of services to function, as explained in the blog post. Secondly, and this is also explained in the blog post, network connections are complicated. If a thread keeps reading (infinite loop) because it got the Content-Length header wrong, then you’ll have a bad time.

          1. 10

            Why does Firefox need to talk to a variety of services to function? It is a web browser. The only thing it should be talking to is the web site I’m visiting.

              1. 10

                While I appreciate the article, I’m questioning three things.

                1. Why are these features required for Firefox to function? None of these are critical to the point of being a web browser.

                2. Why are these enabled by default? Again, critical functionality is functionality that you need to remain functional. A web browser doesn’t need a majority of this to remain functional.

                3. Why were these added in the first place? As a very long time Firefox user, a majority of these weren’t present when the browser launched or hit its popularity spike. It functioned perfectly fine and was a refreshing change from IE.

                I don’t have reasonable answers for any of these, and I suspect they don’t exist.

                1. 33

                  I hear that last sentence as: “I don’t work on this project, and yet I believe I understand it just as well as people who do; well enough to lecture them on it.”

                  I would say that, at a minimum, the security features are required for a browser to function. You can’t securely trust X.509 certificates without a means of knowing whether a cert has been revoked. And checking for updates is important for mitigating security problems soon after they’re discovered and fixed.

                  1. 7

                    I wouldn’t perceive a list of questions as a lecture, just that I don’t have answers as to why “Mozilla Content” and “Diagnostics” (more accurately, active telemetry), are required, enabled by default, and were added in the first place.

                    I can make some arguments for diagnostics (crash reports are helpful, for instance, provided they’re actually acted on). I can also agree that update checks are good. That does not qualify as a “majority of features”, and these features don’t exactly speak to the message of “Firefox needs to talk to a variety of services to function” if they can be disabled and the browser still functions fine.

                    If only the people who work on this project can ask questions about it, or disagree as to what’s required for basic functionality, then the project isn’t meant for external use. Firefox is clearly meant for external use. Shutting down conversation under the guise of “you don’t work on it, therefore you can’t comment on it” doesn’t do anybody any good, developers or users.

                    1. 21

                      Your conversation is getting shut down because your questions are thinly-veiled statements: these features shouldn’t be required to function, these features shouldn’t be enabled by default, these features shouldn’t have been added in the first place. The fact that you’re framing these statements as questions (“should they be required/enabled by default/have been added?”) doesn’t really change things, since you’re literally following up with “I have no idea, so I suspect not”.

                      1. 4

                        I’m following up with “I don’t know, I suspect not, but I’m open to being proven wrong”.

                        Asking questions and then providing your current opinions should not be framed as stating anything, but I can understand why it would be perceived that way.

                        1. 21

                          We’ve had so many discussions about that. And in the end it all boils down to

                          • yes mozilla needs auto update, that’s correct for 99% of the users, your distro may change that
                          • yes mozilla may want to get user statistics, it can help a ton with decisions for hardware support and features
                          • yes fetching a blacklist of known malicious websites and revoked certificates as well as the current trusted root certs is very reasonable and also valid for 99% of the userbase
                          • yes you’re already trusting mozilla, whether it’s an auto update, a manual one or a first installation
                          • yes you can ask again and again why that’s the case and tell us that this shouldn’t be necessary, instead of simply switching browsers / disabling these options when asked by firefox / disabling it via usersettings file / disabling it in about:config and moving on
                          • yes some features (mozilla experiments) aren’t really required (and I dislike them), but you can also disable them
                          • no it’s not on-topic to ask why they had stats enabled anyway, after receiving links to the why+how; this could have also happened with auto updates if not for the flag preventing http3 switching - it’s an http3 handling bug after all

                          1. 1

                            There are so many people that desperately want moz://a to be something it is never going to be because:

                            • They market themselves as being it
                            • There doesn’t seem to be any other contender

                            At this point though it’s just delusion to argue: they’ve killed Servo and doubled down on things like Pocket. Either you customize it carefully and study every update or you just stop using it and move to Nyxt (or something else that is trying to represent your interests).

                            1. 1

                              I don’t need to study every update. Some stuff comes up here or on HN early enough and otherwise I simply disable the default search + experiments, that’s it.

                      2. 10

                        It’s a list of questions-followed-by-definitive-statements, like “None of these are critical to the point of being a web browser.” To dig into just one of those: I’ve already pointed out that, if you think cert revocation is not a critical part of a web browser, you don’t know enough about security to be critiquing one.

                      3. 2

                        My browser being dependent on decisions made at Mozilla and software running on Mozilla-controlled servers in order to work is itself a security problem.

                        1. 10

                          There will always be security problems. If you think Mozilla is a bigger security problem than the malware, phishing, compromised certs, etc. that many of these socket connections are there to help with, then we live in different realities.

                          Can’t you just check out your own copy of Firefox or Chromium and turn off all the parts you object to?

                      4. 14

                        That’s your opinion and you’re entitled to an opinion and I’m not on the internet to convince you. My opinion is different and I’ll spend a brief amount of time to explain why:

                        I believe many, if not all, of those features ARE critical for web browsers. Fundamentally, your browser is not a document reader for hypertext. Firefox, like any other browser, is an application platform for multimedia apps, like Netflix, YouTube, Google Maps and whatnot. Whether you like it or not. For those to work well, you’ll need codecs, DRM, security updates and so on and so on. And I’m not even talking about PKI (which @snej pointed out beautifully. Thanks.).

                        1. 5

                          If I may add: Firefox gained popularity by being “better than IE”. That’s great when there’s only one browser you need to compete with. Firefox is probably still better than IE, but, you know, the ecosystem has evolved :-)

                      5. 2

                        There’s a lot more going on there than I’d have guessed, and many of those services are very desirable to me, so thanks for providing them. Nevertheless, it seems like the browser should maybe time out on them and try to come back later. Of course you probably thought all that through already.

                        1. 5

                          It does in normal circumstances. That’s not the cause of the bug here; the cause was an infinite loop caused by a regular old bug.

                2. 3

                  Tangential and noob, but the discussion on a sibling thread made me wonder: how do smaller clients deal with cert revocation? Eg, curl? Since TLS is not restricted to browsers, I’d expect OSes to manage this by themselves on some layer. Is that correct? If so, is it desirable for browsers to duplicate those efforts?

                  I’m on my phone right now, but I’m pretty sure my local FF installation maintains its own list of root CAs. Come to think of it, I don’t know why…

                  1. 7

                    Most operating systems have a built-in root store that’s vetted by the operating system. Mozilla has its own root store and a root store policy, which is generally considered more restrictive and only used for HTTPS/WebPKI, whereas other OS root stores also have to deal with use cases like code signing. Most clients rely on the OS root store, whereas Firefox does not (you can use some preference or an enterprise setting if you want to control your roots for your organization and make sure Firefox is aligned though). I actually do not know the exact backstory, but an overview of different OS root stores seems to be in this Wikipedia article about CAs.

                    The root store is also a great service, such that most Linux distributions use a copy of the Mozilla certificate bundle; in Debian-based distributions, this is the “ca-certificates” package. I hope that answers some of your questions.

                    1. 4

                      This page on the Mozilla Wiki has a busload of interesting links. https://wiki.mozilla.org/CA

                      1. 2

                        The OS usually has a root store, but AFAIK none handle revocation. In many cases having a browser bundle certs is useful, especially if your device isn’t getting updates and you can’t manually update the system root store. (See the Let’s Encrypt issues with old Android: Firefox users were never affected, though IIRC they did find some solution to avoid breaking other browsers.)

                    2. 2

                      Just another reason telemetry should be disabled by default

                      1. 7

                        I’m not sure I follow your logic since this bug was not specific to telemetry, definitely not specific to the functionality of telemetry, and in fact could have been triggered by a regular website even with telemetry disabled.

                        Can you elaborate?

                        1. 4

                          “Could have been triggered by a random website so (low) X% of users would have experienced it” vs

                          “WAS triggered by telemetry, so (high) Y% of users HAVE experienced it”.

                          No, it wasn’t specific, but in this case if a system in the critical path triggered it, it triggered for a lot of people.