1. 19

  2. 8

    “To quote publicly available data, by 2020, we had around 2000 engineers with 20M lines of hand-written code (10x more including the generated code) in the monorepo alone.”

    Every time I read stats like this I think - surely there must be a better way to write software!

    1. 7

      I agree! I hear a lot of good things about Twitter’s culture (before Musk took over, that is). A kernel team, a culture of excellence, etc. But honestly, the actual service they offer is hosting a bunch of tweets, pictures, and videos. Their site fails to work very regularly.

      Surely there must be some challenges with scaling, but from the outside it just seems very crappy.

      1. 21

        I haven’t experienced a significant Twitter outage in years.

        I think you underestimate the challenges of running a near real-time many-to-many messaging service, to be honest.

        1. 12

          I think you underestimate the challenges of running a near real-time many-to-many messaging service, to be honest.

          And an analytics platform, a (very) high-volume data API platform for data customers, multiple search systems (some are paid-access only), an advertising platform, a content moderation system (across many countries, in compliance with local laws), probably a copyright infringement system, anti-abuse systems, internal monitoring and reporting, mobile applications, internationalization and localization systems, …

          People have this incredibly reductive view of what Twitter, in particular, actually does, that I find really frustrating. It’s not simple.

          1. 4

            People have this incredibly reductive view of what Twitter, in particular, actually does

            I have a vague memory of, possibly, a JWZ thing where he points out that browsers are huge now because, whilst you only use 10% of the functionality, user B uses a different 10%, user C uses a different 10% again, etc., and that leads to a complex system which is, by necessity, big.

            (But I cannot currently find the article / rant itself.)

          2. 4

            Nothing should require 20 million lines of code to accomplish.

            1. 8

              Why not?

              1. 2

                Because small things are exponentially easier to work with and understand than large things.

            2. 3

              Not an outage but we’ve all experienced breakage.

              1. 1

                I’ve been having weird issues on my “Latest Tweets” timeline for a few days now (on mobile).

                1. 1

                  I am excluding issues after the Musk takeover.

            3. 5

              So happy you said this. It seems like FAANGs get praised for their scale, when really it’s a completely pathological case of inefficiency that requires that many engineers to begin with. There is a better way, we can’t give up on that.

              What’s interesting is that this works out to 10,000 LOC per engineer, which doesn’t sound like much, but realistically, how much code can a single human actually comprehend? LOC is not useful in many ways, but there is certainly an upper bound on how many LOC a human brain can reasonably work with.
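
              For what it’s worth, the arithmetic behind that figure checks out, using only the numbers quoted upthread (just a sanity check, nothing Twitter-specific):

              ```python
              # Figures quoted upthread: ~20M lines of hand-written code, ~2000 engineers;
              # the "10x more including generated code" multiplier is also from the quote.
              hand_written_loc = 20_000_000
              engineers = 2_000

              print(hand_written_loc // engineers)       # 10000 hand-written LOC per engineer
              print(hand_written_loc * 10 // engineers)  # 100000 per engineer including generated code
              ```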

              1. 5

                You can definitely write something that provides similar functionality in much fewer lines of code. I guarantee you won’t enjoy the on-call rotations, though.

                1. 1

                  This is confusing - are you saying that more lines of code implies better maintainability and reliability? That would go against any study where bugs are found to be very directly related to lines of code.

                  What I’m saying is that there’s an upper bound on how much code a human being can physically handle, and knowing what that limit is would be a good thing. I’m not suggesting that we play code golf, but we should learn how to more efficiently use groups of people.

                  1. 1

                    It’s nothing to do with bugs per line of code. It’s how automated your procedures are for interfacing with the inherent complexity of hosting things in the real world. I’ve spent some years inside Azure dealing with this - the amount of effort it took to turn building & bringing online a new datacenter from a highly manual process to an even partially automated process was staggering.

                    1. 1

                      I see. Sure, if you want to solve problems, that comes with added logic. My criticism of large companies is that they can afford to have hordes (by which I mean ~tens of thousands) of humans add this logic at the abstraction level of our current tools, which hides the fundamental issue: I wish we’d be able to do equal or more with way less effort.

                      I understand that sounds pie-in-the-sky, but I’ve been at least experimenting with model-driven code generation a lot, and it feels slightly promising. Essential complexity can’t be avoided, but how much of the Azure datacenter code is essential?

                2. 2

                  We just had a post from someone who has a game with 58,000 LOC, so 10,000 is likely too small: https://lobste.rs/s/lsspr7/porting_58000_lines_d_c_jai_part_0_why_how

                  1. 1

                    Sure, but it’s very interesting that it’s still in the same order of magnitude. It’s also very likely that individual productivity goes down on a multi-person team, because of the coordination necessary between people, and also because you have to work within parameters that you didn’t create.

                3. 4

                  I think the first step is to understand why it’s 20M lines in the first place. Is it lots of test code? Sad-path handling? Boilerplate? Features? Regulatory compliance? Maybe most of it actually is necessary!

                  1. 1

                    They had a kernel team and their own Linux fork. I would bet on them having multiple MLoC of other forked / vendored deps too.

                  2. 4

                    It seems so, but I think it’s largely an illusion. Of course at 20M there’s probably a few M lines of code that could be cut, but I don’t think you could easily reduce it by an order of magnitude without losing something important.

                    Just like institutions grow bureaucratic processes out of past pains and failures, they grow code out of past pains and reaching breaking points with simple or out-of-the-box solutions.

                    For example, the original Twitter may have used off-the-shelf memcached, but grew to the point where limitations around cache fragmentation and eviction strategies, which don’t matter for most users, did matter to Twitter. Suddenly it’s a big step: from “just install memcached” it becomes “develop and maintain a custom cache cluster tuned to your workload”. Repeat that a few times, and you have millions of lines of code that seem like they could be replaced with “simple” solutions, but actually everything would fall over if you tried.

                    Apart from scalability/performance, resilience is also a code multiplier. You not only develop feature X, but also error handling, lots of error handling, automated tests, healthchecks, fallbacks for every failure scenario, error handling for the fallbacks, rate limiting and circuit breakers for errors to avoid cascading failures, monitoring and logging, aggregation and alerting for the monitoring, and supporting infra for all of the extra code and tooling.
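
                    As a toy illustration of that multiplier (a sketch of the general circuit-breaker pattern, not anything from Twitter’s codebase; the names and thresholds are invented), even the smallest breaker around a single downstream call adds a pile of state and branches that the feature itself never needed:

                    ```python
                    import time

                    class CircuitBreaker:
                        """Toy circuit breaker: trips open after repeated failures,
                        fails fast while open, then allows a retry after a cooldown."""

                        def __init__(self, max_failures=3, reset_after=30.0):
                            self.max_failures = max_failures
                            self.reset_after = reset_after
                            self.failures = 0
                            self.opened_at = None  # None means the circuit is closed

                        def call(self, fn, *args, **kwargs):
                            if self.opened_at is not None:
                                if time.monotonic() - self.opened_at < self.reset_after:
                                    raise RuntimeError("circuit open: failing fast")
                                self.opened_at = None  # cooldown elapsed: allow a trial call
                            try:
                                result = fn(*args, **kwargs)
                            except Exception:
                                self.failures += 1
                                if self.failures >= self.max_failures:
                                    self.opened_at = time.monotonic()
                                raise
                            self.failures = 0  # any success resets the failure count
                            return result
                    ```

                    And that’s before the per-dependency configuration, monitoring, alerting, and tests that each breaker needs in production.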

                    1. 3

                      There is definitely an aspect of Dunning-Kruger to some of the complaints, driven primarily by a lack of understanding of the sheer scale of Twitter, its backend (Twitter wasn’t all GCP, which was mostly analytics data; especially on the frontend it ran largely on on-premises data centre hardware), and all of the analytics tools, machine learning crap, ancillary code and such that comes with it. Anyone can write a lookalike Twitter frontend and backend given enough time. At the scale of actual Twitter comes a lot of hidden work.