1. 30
  1.  

  2. 10

    It makes me unreasonably happy that Wikipedia hasn’t succumbed to the trend of using Cloudflare, CloudFront, or one of the other huge CDNs.

    Also, just for fun, I used curl -I to find out what headers Wikipedia returns for a successful request. The returned headers include a GeoIP cookie that did a pretty good job of identifying the region I’m in, including the system’s guess at my country, state, city, and approximate latitude and longitude. I wonder how much it costs Wikipedia to get that information.
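
    In case anyone wants to poke at this themselves, here’s a rough Python equivalent of the curl -I check (using the requests package). The cookie name and the country:region:city:lat:long layout in the comments below are just what I saw on my end, so treat them as illustrative rather than a guarantee of what Wikipedia sends you.

    ```python
    # Rough equivalent of `curl -I`: do a HEAD request and print the headers.
    # Requires the third-party `requests` package.
    import requests

    resp = requests.head("https://en.wikipedia.org/wiki/Main_Page", allow_redirects=True)

    for name, value in resp.headers.items():
        print(f"{name}: {value}")

    # The GeoIP cookie (if set) carries the server's guess at your location;
    # on my end it looked roughly like "XX:YY:City:12.34:56.78:v4", i.e.
    # country:region:city:latitude:longitude. Illustrative only.
    geoip = resp.cookies.get("GeoIP")
    if geoip:
        print("GeoIP cookie:", geoip)
    ```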

    1. 4

      Indeed. I’m definitely looking forward to the next installments, especially information about Apache Traffic Server (and any alternatives that were considered). Everyone seems happy offloading their CDN workloads to other companies, so I haven’t seen much public content about running your own. How many people really need to pay some company for access to 200+ PoPs and all that fanciness? Clearly not Wikipedia.

      Re: location data, it looks like MaxMind offers databases at that granularity for $100 a month. IANAL, but my reading of the licensing info suggests that Wikipedia would not need a commercial license and could use the MaxMind database at that $100/mo price point (rough lookup sketch below the quote):

      you may use Geolocation Functionality to customize and target your own ads for your own products and services, surveys, and other content but may not use Geolocation Functionality in connection with a service that customizes or targets any content on behalf of your customers, users, or any third party
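
      To give a sense of what that buys you, here’s a rough sketch of a lookup against a MaxMind city-level database with their official geoip2 Python package. The .mmdb path and the IP address are placeholders, and I’m assuming the city-level product, since that’s the one that includes latitude/longitude.

      ```python
      # Sketch of a city-level lookup using MaxMind's official `geoip2` package.
      # The database path and the IP address below are placeholders.
      import geoip2.database

      with geoip2.database.Reader("/path/to/GeoIP2-City.mmdb") as reader:
          r = reader.city("203.0.113.7")  # replace with a real client IP
          print(r.country.iso_code)                 # e.g. "US"
          print(r.subdivisions.most_specific.name)  # state / region
          print(r.city.name)                        # city
          print(r.location.latitude, r.location.longitude)
      ```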

      1. 5

        location data, it looks like MaxMind offers databases at that granularity for $100 a month

        That is correct. We also use the netspeed stuff, so that’s ~190 USD a month. See our maxmind puppet configuration and the documentation by the Analytics team for details about what the information is used for.

    2. 3

      I realize you can’t possibly answer everything (while doing the job itself!), but the peek at the architecture makes me curious about adjacent stuff: the hardware, and what Wikipedia’s particular load looks like to the CDN. Just to fire off random questions:

      Does a PoP have many relatively cheap boxes, or fewer bigger ones? Is a PoP server’s network/disk/CPU/RAM balance far off from a typical app server’s? Is the filesystem layer SSD or HDD? (I’d very weakly bet on lower-end SSDs, e.g. SATA: no more worrying about IOPS, but cheap for an SSD.) Is the size public for any of the PoPs?

      Also, given that images, etc. tend to be larger than text but easier in other ways (e.g. they don’t normally need to expire quickly), I wonder how much your cost/complexity is driven by big media files (needing huge storage, etc.) vs. articles (needing more origin fetches?). (That’s not quite even a well-formed question.) I also wonder about how those 10% of uncacheable hits break down, e.g. relaying logged-in users’ uncacheable pages vs. actual long-tail article fetches.

      Again, I don’t really expect answers, much less complete ones. I hope you at least take the peppering of questions as an indication people find all this stuff interesting. :) And of course much appreciation for what you’re working for as well!

      1. 4

        Is the filesystem layer SSD or HDD?

        We’ve got a mix of cheap SSDs for the OS and good NVMes (Samsung PM1725a/PM1725b) for the on-disk cache. See https://wikitech.wikimedia.org/wiki/Traffic_cache_hardware for the details. That page should indirectly answer some of your qualitative questions too.

        Is the size public for any of the PoPs?

        Pretty much everything is public. :) On-disk caches are 1.6T per host; see for instance ATS cache usage on this Amsterdam node. We have 16 servers per PoP, except for San Francisco and Singapore, which have 12.

        given that images, etc. tend to be larger than text but easier in other ways (e.g. they don’t normally need to expire quickly), I wonder how much your cost/complexity is driven by big media files (needing huge storage, etc.) vs. articles (needing more origin fetches?).

        Very good question; I’ll keep it in mind for the next article. In brief: we do have two logically distinct cache clusters, one for larger files like images and videos and another for everything else, including html/css/js and the like. The former is called “upload”, the latter “text”. Their VCL configuration is slightly different (see upload vs text), but most importantly the in-memory frontend caches are kept separate, given the different access/expiration patterns you’ve mentioned.
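
        Just to make the split concrete, here’s a toy sketch in Python. This is not our actual configuration (the hostname rule and the caching logic are made up for illustration); it only shows the shape of the idea: requests get bucketed into one of the two clusters up front, and each cluster keeps its own frontend in-memory cache.

        ```python
        # Toy illustration of the two-cluster split described above; NOT our
        # real setup, just the shape of the idea. The hostname rule is made up.

        def cluster_for(host: str) -> str:
            # Large media goes to "upload", everything else (HTML/CSS/JS, APIs)
            # goes to "text".
            return "upload" if host.startswith("upload.") else "text"

        # One frontend in-memory cache per cluster, tuned independently because
        # the access and expiration patterns differ so much between the two.
        frontend_caches = {"upload": {}, "text": {}}

        def frontend_fetch(host: str, path: str, fetch_from_backend):
            cache = frontend_caches[cluster_for(host)]
            key = (host, path)
            if key not in cache:
                cache[key] = fetch_from_backend(host, path)  # miss: ask the backend layer
            return cache[key]
        ```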

        I also wonder about how those 10% of uncacheable hits break down, e.g. relaying logged-in users’ uncacheable pages vs. actual long-tail article fetches.

        Big difference between text and upload. Traffic for logged-in users, as you guessed, isn’t cacheable and forms the bulk of the ~7% “pass” you see in the breakdown here. For upload, on the other hand, the hitrate is as high as ~96%.

        I hope you at least take the peppering of questions as an indication people find all this stuff interesting

        This is very useful feedback for the next article, thank you!

        1. 2

          Thank you! I would have guessed, from the sheer number of views WP gets, that each PoP would need an even bigger pipe than you could fill with 10GbE from 12-16 cache nodes. But poking at the public Grafana for the SFO PoP, it looks like you’re actually plenty well provisioned on that front. Neat!

      2. 2

        Looking forward to subsequent articles in this series (on the specifics of the Apache Traffic Server migration). Very cool!

        1. 1

          The big question for me is why they upgraded Varnish if they had a working setup with Varnish v3 and knew that their preferred backend had been moved to the proprietary version in v4.

          1. 9

            Hey! We had to upgrade because v3 was no longer supported by the Varnish development team, so no more bugfixes. Being a team of 2, we certainly did not have the capacity to maintain a Varnish fork on top of everything else. :)

            1. 1

              Sure, I get that, but it had been working for years. Were you seeing any vulnerabilities or bugs?

              1. 7

                Frequently, yes. https://phabricator.wikimedia.org/T133866 is just one example, but if you dig into our phab you’ll find plenty more! Plus, of course, the idea is that when a security vulnerability is discovered you want to already be running a supported version. Take into account that upgrading from v3 to v4 was a project that took many months and involved porting hundreds of lines of VCL code; it wasn’t a matter of apt dist-upgrade.

                1. 2

                  Commiserations, that does sound like a bit of a pain. I hope that the new system serves you well :)

                2. 2

                  Personally, I’ve seen Varnish segfault on non-malicious input more than once. Given that, I think it’s implausible to hope that serious security bugs won’t sometimes turn up.

              2. 2

                Varnish v4 was released in 2014 and v3 went EoL a year after that in 2015. It hasn’t had any security patches etc from upstream since then.

                1. 2

                  Varnish v4 was released in 2014 and v3 went EoL a year after that in 2015

                  Correct, and we upgraded in 2016 (one year too late!).

                  It hasn’t had any security patches etc from upstream since then.

                  Right, if it’s unsupported, upstream does not provide fixes.

              3. 1

                The article mentions Direct Routing, which I’ve used to great success with Linux in the past (Direct Server Return is basically the same thing, IIRC).

                Is there any good way to implement DR/DSR with FreeBSD or OpenBSD?