1. 2

    Am I the only one who thinks that all these Netflix things are extremely over-engineered? The bulk of their content is not even served from AWS, but from boxes that are close to the eyeballs.

    I am not saying I could build one in a weekend or anything like that, but what do all these servers do? There is hardly any user interaction except search and maybe giving a rating. The search is also not that big, given the size of the catalog they serve per country. The traffic comes from local caches. What is all this for, except keeping engineers in the Bay Area busy?

    1. 19

      Just a PSA: I don’t work for Netflix and never have, so all of this is mostly conjecture from experience.

      Sure, I think that microservice bloat is probably a problem they have. Many of the FANG companies also suffer from NIH (not-invented-here) syndrome, in some cases because of (IMO) broken promotion processes that require engineers to ship “impactful” work at all costs, and in others just because they have an unlimited amount of money to spend on engineering time.

      That being said, even the most trivial problems become quite difficult at the scale they’re working at: they have 125 million subscribers worldwide, which means peak time is almost all of the time. In addition, maybe you only use search and ratings, but what about admin UIs? What do customer service teams use? What tooling do content creators use to get materials onto the platform, and what do they use to monitor metrics for content once it’s uploaded? What about ML and BI concerns, SOC2 concerns, GDPR concerns? I could go on forever. It’s very difficult to reconstruct all of the reasons a platform evolved the way it did without getting a historical architecture overview. But! Their service is very reliable and their business is profitable, so they must be doing something right. (Not that there isn’t always room for improvement.)

      1. 15

        There was a good presentation at Strange Loop last year: Antics, Drift, and Chaos. The short version is “Netflix is a monitoring company that, as an interesting and unexpected byproduct, also streams movies.”

        1. 1

          This is great! Thanks for the link. I’ve got to get to Strange Loop next year.

          1. 1

            What kind of monitoring do they do, do you know?

            1. 3

              We use Atlas for monitoring.

          2. 1

            The result and the press are not as important as the journey. Being able to fail over such a huge infrastructure that quickly is impressive, but the most important part is how they managed to achieve this and improve their workflow, resiliency, and many other things along the way!

            1. 1

              I assume these other boxes are Very Important™ for authorization and provide the search/indexing functionality of their service. The CDN boxes they ship out do nothing but host the videos, and not all videos exist on each box, so something has to handle directing you to the correct node.

              You can’t stream the videos if you can’t get authorization, so…

              1. 1

                Those boxes they ship to ISPs only hold a subset of content. They still have to deal with routing a request to the closest node that has the content the customer wants, and with updating the ISP cache box when there’s a spike in demand for something that isn’t cached locally. If your AWS nodes are down and nobody on the ISP requested Star Trek in the last N hours, you’re up shit creek with the customer requesting it unless you have a good failover strategy.

                I doubt those ISP cache nodes do local authentication or billing, either.

                1. 1

                  Do you know where the movie content lives though? I’d be surprised if any of it was served from AWS hosts; I’d expect it to be on a CDN somewhere. I don’t think @fs111 is saying that Netflix doesn’t do anything, but rather asking whether their architecture actually makes sense given what they do.

                  My two cents is that it is probably overengineered, and that this probably happened organically because nobody really knew what they were doing. With hindsight we could probably say some things are needed or could be done more simply.

                  1. 2

                    The video content, at least as of a couple of years ago, is encoded by EC2 instances into a bunch of qualities/formats (some on demand, I believe?), which live in S3 and are shuttled around to various ISP cache nodes as needed.

                    Netflix doesn’t use a CDN, they are a CDN.
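
To make the routing problem discussed in this thread a bit more concrete, here is a minimal sketch of "send the request to the closest node that has the title, and fill the local box once demand spikes". Everything in it is an assumption made for illustration: the `CacheNode` class, the `route` function, the `distance` cost, the `fill_threshold` value, and the `"origin"` fallback are hypothetical names, not anything Netflix has published, and the real system is certainly more involved.

```python
from collections import Counter
from dataclasses import dataclass, field

ORIGIN = "origin"  # hypothetical stand-in for the central store (e.g. the S3 copies)


@dataclass
class CacheNode:
    name: str
    distance: int  # hypothetical network cost to the viewer; lower is closer
    titles: set = field(default_factory=set)
    misses: Counter = field(default_factory=Counter)


def route(title, nodes, fill_threshold=3):
    """Return the name of the node that serves this request for `title`.

    The closest node serves it if it holds the title. Otherwise the next
    closest holder (or the origin) serves it, and the miss is counted on the
    closest node. Once a title has missed `fill_threshold` times, it is
    copied onto the closest node so later requests are served locally.
    """
    if not nodes:
        return ORIGIN
    by_distance = sorted(nodes, key=lambda n: n.distance)
    closest = by_distance[0]
    if title in closest.titles:
        return closest.name

    # Local miss: fall back to the next closest holder, or to the origin.
    server = next((n.name for n in by_distance[1:] if title in n.titles), ORIGIN)

    # Count the miss and fill the local box when demand is high enough.
    closest.misses[title] += 1
    if closest.misses[title] >= fill_threshold:
        closest.titles.add(title)  # stands in for copying the file over
        del closest.misses[title]
    return server


isp = CacheNode("isp-box", distance=1, titles={"popular-show"})
regional = CacheNode("regional", distance=5, titles={"popular-show", "star-trek"})

# Three local misses are served by the regional node; the fourth is local.
print([route("star-trek", [isp, regional]) for _ in range(4)])
```

Even this toy version needs something outside the cache boxes to know which node holds what and to count demand, which is roughly the point several replies make: if that control plane is down, the boxes alone cannot decide where to send you.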