1.
  1.

    I am seeing some chatter on Twitter about CenturyLink/Level3 advertising bad BGP routes. But I always like to see the source. Does anyone know where that discussion happens (i.e., among network operators working across ASes)?

    1.

      The NANOG mailing list is my go-to.

      1.

        Thank you, that’s a gold mine.

    2.

      Just to update this post: the problem was CenturyLink. They did “something” wrong with their BGP route reflectors. Dunno if they will post the full details.

      1.

        “We are able to confirm that all services impacted by today’s IP outage have been restored. We understand how important these services are to our customers, and we sincerely apologize for the impact this outage caused.”

        https://twitter.com/CenturyLink/status/1300089110858797063

        1.

          (Full disclosure: I work for Akamai.) I appreciate how open Cloudflare is about their incidents. For those who use them: do you get direct contact about mitigation, or is it all through their status site?

          1.

            I have their cheapest plan, so no notifications. I don’t particularly expect them at $20/mo, though.

          2.

            Maybe it’s time to say goodbye to Cloudflare… Every time there is a hiccup in their service, half the internet goes down.

            1.

              I don’t think the root cause is Cloudflare, just a symptom of a larger problem. Some of my coworkers couldn’t route to our production IPs.

              1.

                Yeah, it’s almost certainly not Cloudflare. More likely it’s an issue with one of the ISPs they peer with, probably L3, as mentioned above.

                1.

                  Whether the actual fault is Cloudflare’s or some underlying thing they depend on, OP still has a point: they are a single point of failure/control for a vast part of the internet.

                  1.

                    I don’t think “some underlying thing they depend on” adequately describes the role of tier 1 ISPs in general, or their role in BGP in particular. These major problems almost always trace back to the deployment of an improper BGP configuration somewhere. The best information I’ve seen so far (https://mailman.nanog.org/pipermail/nanog/2020-August/209382.html) points to a problem with the BGP configuration for one of CenturyLink’s ASNs (the timing and nature of the outage are also soft indications that it was caused by the deployment of a new BGP configuration).

                    The ongoing problems with BGP are far more extensive than the concentration of capacity in a single provider (such as Cloudflare). It seems that even major ISPs can’t adequately manage BGP configurations or recover from errors within reasonable time frames. This doesn’t appear to be a problem with any given provider, or even a general over-centralization of the Internet’s networks, but rather an issue with BGP itself, with its implementation in actual networks, or with the lack of a sufficiently large technical cadre that can reliably work with the protocol. Given the history of BGP-related incidents, I suspect it’s all three.
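To illustrate why a mistake at a BGP route reflector (the CenturyLink issue mentioned earlier in the thread) can have such a wide blast radius, here is a minimal Python sketch of the basic reflection rules from RFC 4456. It is a deliberate simplification: it ignores best-path selection, ORIGINATOR_ID/CLUSTER_LIST loop prevention, and all routing policy, and the peer names (`edge-a`, `core-rr2`, `transit`, etc.) are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class PeerKind(Enum):
    CLIENT = "client"          # iBGP peer configured as a route-reflector client
    NON_CLIENT = "non_client"  # ordinary iBGP peer (e.g. another reflector)
    EBGP = "ebgp"              # external peer in another AS


@dataclass
class RouteReflector:
    peers: dict[str, PeerKind] = field(default_factory=dict)

    def reflect_targets(self, learned_from: str) -> set[str]:
        """Peers that a route learned from `learned_from` would be advertised to.

        Simplified RFC 4456 rules, with no policy and no best-path selection:
        - from a client: advertise to all other clients and all non-clients
        - from a non-client: advertise to clients only (plus eBGP peers)
        - from an eBGP peer: advertise to all iBGP peers (and other eBGP peers)
        """
        source_kind = self.peers[learned_from]
        targets = set()
        for name, kind in self.peers.items():
            if name == learned_from:
                continue  # never send a route back to the peer it came from
            if source_kind is PeerKind.NON_CLIENT and kind is PeerKind.NON_CLIENT:
                continue  # normal iBGP rule still applies between non-clients
            targets.add(name)
        return targets


# Hypothetical topology: one reflector serving three edge routers.
rr = RouteReflector({
    "edge-a": PeerKind.CLIENT,
    "edge-b": PeerKind.CLIENT,
    "edge-c": PeerKind.CLIENT,
    "core-rr2": PeerKind.NON_CLIENT,
    "core-rr3": PeerKind.NON_CLIENT,
    "transit": PeerKind.EBGP,
})

# A single bad route from one client fans out to every other router.
print(sorted(rr.reflect_targets("edge-a")))
# ['core-rr2', 'core-rr3', 'edge-b', 'edge-c', 'transit']
```

Because every client depends on the reflector for routes it would otherwise learn over a full iBGP mesh, one bad advertisement or bad config change at the reflector reaches all of them at once, which is consistent with the outage-wide symptoms described above.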