1. 3

    I really want to know what their video processing pipeline is like, since they generate clips and varying video quality levels for what I assume is every device in existence. There were some nice nuggets here: I didn’t know about the Beacon API or the Intersection Observer. Seems like a mostly boring stack, but considering they’ve been around for about 10 years and the site hasn’t slowed to a crawl on my intentionally crappy test laptop, they’re doing something right.

    Did anyone pick up on whether they’re running all of their infra on AWS or just the Vertica part? I thought the bandwidth costs would be killer.

    1. 4

      Why would they need to generate so many different quality levels? They probably have just 2 or 3, which is enough to cover most devices out there. Using ffmpeg it’s trivial to generate these videos, though you need the infrastructure and processing power behind it.
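      For a sense of what “2 or 3 quality levels” means in ffmpeg terms, here’s a minimal sketch of a rendition ladder. The heights, bitrates, and filenames are my own illustrative guesses, not anything the site has published; the script prints the commands rather than running them, so you can inspect the ladder (pipe the output to sh to actually transcode):

```shell
#!/bin/sh
# Illustrative rendition ladder as "height:video-bitrate" pairs.
# The numbers are guesses, not the site's actual settings.
SRC="input.mp4"
CMDS=""

for rendition in 240:400k 480:1000k 720:2500k 1080:5000k; do
  height=${rendition%%:*}
  bitrate=${rendition##*:}
  # scale=-2:<height> keeps the aspect ratio and rounds the width
  # to an even number, which H.264 requires.
  CMDS="$CMDS
ffmpeg -i $SRC -c:v libx264 -b:v $bitrate -vf scale=-2:$height -c:a aac -b:a 128k out_${height}p.mp4"
done

# Print the commands; pipe this script's output to sh to run them for real.
printf '%s\n' "$CMDS"
```

      Even a small ladder like this means several full transcodes per upload, which is where the infrastructure and processing power come in.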

      1. 2

        When you do it live, constantly, on terabytes of data, the infrastructure and processing power become the big problems.

        Edit: upon rereading it, they actually sound like they put a big emphasis on quality and compatibility too. So their question is, “if we can make this content incrementally better for X market segment, is it worth it?” Start from the biggest X’s and work your way down like any other priority list!

        1. 2

          There’s absolutely no way they’d do live transcoding; these sites usually only have two versions, it’d be much cheaper to simply store both at all times.

          It’s actually a very simple thought experiment: you obviously cannot re-create the high-res version from the low-res one, and the low-res one takes so little storage compared to the high-res one that spending minutes re-creating it from the high-res one on demand would make very little sense. The versions are probably transcoded once on upload and cached pretty much forever.

          BTW, I’d suggest you read Designing Data-Intensive Applications (the DDIA book), which explains a lot of these things. It has many insights into how popular applications are actually designed nowadays, including Twitter’s actual implementation, which answered my own question of why it often takes so long to post a Tweet.

          1. 2

            They might only have two versions from your perspective (SD and HD), but having worked in video development, it’s likely they have 3–4 variants of each of those two versions for compatibility. The web has converged on a few technologies in the last few years, making it less cumbersome, but if they want to cover “most” devices, then I still expect them to have at least 2–3 sets of files.
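            As a hypothetical illustration of what multiple “sets of files” per quality level can look like, each rung of the ladder may be stored in more than one codec/container combination. The codec pairings below are common ones, not anything confirmed about this site; as above, the commands are printed rather than executed:

```shell
#!/bin/sh
# Hypothetical compatibility matrix: each quality level stored in two
# codec/container combinations. Codec choices are common defaults, not
# the site's actual matrix.
SRC="input.mp4"
VARIANTS=""

for height in 480 1080; do
  # H.264/AAC in MP4 plays almost everywhere, including older devices.
  VARIANTS="$VARIANTS
ffmpeg -i $SRC -vf scale=-2:$height -c:v libx264 -c:a aac ${height}p.mp4"
  # VP9/Opus in WebM gives better compression on modern browsers.
  VARIANTS="$VARIANTS
ffmpeg -i $SRC -vf scale=-2:$height -c:v libvpx-vp9 -c:a libopus ${height}p.webm"
done

# 2 quality levels x 2 codec sets = 4 stored files per video.
printf '%s\n' "$VARIANTS"
```

            The matrix multiplies quickly: every extra codec set doubles both the transcode work and the storage per video, which is why the two-version view from the player hides a lot of files behind it.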

          2. 1

            Do you think they do live transcoding? I’m certain they have multiple copies of the media transcoded to different qualities. It’s really not that much processing power when you have things like Ryzen boxes and GPUs which can rip through this in no time.
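            For illustration, the CPU-vs-GPU difference is mostly a different video encoder flag in ffmpeg. h264_nvenc needs an NVIDIA GPU and an ffmpeg build with NVENC support, so this sketch only prints the two commands; the filenames and presets are illustrative:

```shell
#!/bin/sh
# Same 720p transcode expressed as a CPU encode and a GPU encode.
# h264_nvenc requires an NVIDIA GPU and an ffmpeg build with NVENC
# support, so the commands are printed here rather than executed.
SRC="input.mp4"

CPU_CMD="ffmpeg -i $SRC -vf scale=-2:720 -c:v libx264 -preset medium out_cpu.mp4"
GPU_CMD="ffmpeg -i $SRC -vf scale=-2:720 -c:v h264_nvenc -preset p4 out_gpu.mp4"

printf '%s\n%s\n' "$CPU_CMD" "$GPU_CMD"
```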

          3. 2

            At this point, they almost certainly don’t. But in the not-too-distant past they would have needed a multiplicity of encodings, because browsers and devices varied so much in which codecs they supported.

          4. 3

            This is tangential, but I have really enjoyed learning about how Netflix handles encoding and processing its videos.

            Although Pornhub must process much more video than Netflix does. I wonder what trade-offs PH makes compared to Netflix’s approach, based solely on the amount of content they have.

            Here is a brief article from the Netflix Engineering blog about encoding. But I first started thinking about it when I watched a system design video from Gaurav Sen.

            1. 2

              Although Pornhub must process much more video than Netflix does

              Are you sure about this? I don’t remember where, but at some point I read that one of the adult sites (likely this one) determined that the most common viewing behaviour is to watch a bit at the beginning and then skip forward to about 80% of the way through the video. Netflix consumption [I’m guessing] would look very different, i.e., watching a film start to finish.

              I would have thought that this site could optimise videos for certain behavioural patterns.

            2. 3

              Self hosted, I’ve seen their servers in the datacenter.

              Porn industry giants usually self-host as much as possible.

              1. 3

                Self-hosted using Level 3 as the network provider per Rusty.

              2. 2

                Although idk about processing, I do remember that Rusty said in a Reddit AMA that they use Limelight for the video CDN.

              1. 2

                This was really cool! Sounds like the team has a lot of empathy for users, and is able to leverage their technical knowledge in a way that actually helps those users, without getting lost in the weeds of overcomplicating the solution. It’s also funny that this would even be something out of the ordinary, or worth mentioning, but I’ve seen so many smart teams completely miss the mark when it comes to solving challenges like this. Often the “serious” programmers would consider this the UI/UX folks’ problem, and think that a mathematical solution would be too much, or the mathematical solution wouldn’t take into account the end user’s experience and frustration.

                I also thought the article was really easy to read (coming from someone who is interested in technical/algorithmic solutions to problems but lacks the background to really grok most articles I see like this). So kudos to the author, and thank you!