1. 10
    1. 1

      In the last year PyPI served 235.7 billion downloads for the 448,941 projects hosted there.

      This is an impressive number, yet it feels a bit wrong. Why do people have to download the same packages over and over again? Is it because CI sets up each project from scratch and then runs the tests?

      1. 3

        Seems likely. The average CI setup requires additional actions to cache the dependency-install step, so I’d expect that to only be done if it’s a notable performance or reliability hit, or if it’s somebody’s bugbear in the community and they straight up go around proposing CI fixes. (A quick sketch of what that caching actually persists is at the end of this comment.)

        Recently on reddit a user was complaining that a minor lint step was their longest CI job, at several minutes. It turned out they were using a gigabyte-scale uberlint image to run just one of the dozens of linters it includes.

        They only realised when one commenter checked and saw that the lint itself took just seconds and it was the setup that took ages, and another expressed surprise at pulling a 1.4GB Docker image to run a 3kLOC lint tool.
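        For what it’s worth, pip already keeps a local wheel/HTTP cache; the extra CI action is really just about persisting that directory between runs. A minimal sketch of how you’d find what to save and restore (assumes pip >= 20.1 on the runner):

        ```python
        import subprocess

        # Ask pip where its cache lives. A CI cache step would save/restore this
        # directory between runs so wheels aren't re-downloaded from PyPI every build.
        cache_dir = subprocess.run(
            ["pip", "cache", "dir"], capture_output=True, text=True, check=True
        ).stdout.strip()
        print(f"Persist this path between CI runs: {cache_dir}")

        # Rough idea of how much the cache is currently holding.
        print(subprocess.run(
            ["pip", "cache", "info"], capture_output=True, text=True, check=True
        ).stdout)
        ```

        Most CI systems have a cache step that can persist that directory; it just isn’t on by default.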

        1. 2

          Wow, this neatly exemplifies everything that’s wrong about modern development…

          1. 2

            This morning I noticed that a house ad on our site was a shocking 4MB. It was fast for the person who uploaded it, so why would they notice? 😣 Now on my todo list is adding a little box next to the house ad picker that warns them about the size of the ad.

            But everything is like this. O(N^2) algorithms are everywhere because they’re fast right up until they meet a real-world amount of data. The only way to combat this is to benchmark at every step to make sure things are in the right ballpark, but that’s just not realistic.
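            To make that concrete, here’s a toy version of the pattern (the dedupe helpers are made up for illustration, not from any real codebase): the list-based loop is quadratic and feels instant on test-sized data, while the set-based one stays linear.

            ```python
            import random
            import time


            def dedupe_quadratic(items):
                # `seen` is a list, so every membership check scans it: O(n^2) overall.
                seen = []
                for x in items:
                    if x not in seen:
                        seen.append(x)
                return seen


            def dedupe_linear(items):
                # A set gives constant-time membership checks on average: O(n) overall.
                seen, out = set(), []
                for x in items:
                    if x not in seen:
                        seen.add(x)
                        out.append(x)
                return out


            for n in (1_000, 30_000):
                data = [random.randrange(n) for _ in range(n)]
                for fn in (dedupe_quadratic, dedupe_linear):
                    start = time.perf_counter()
                    fn(data)
                    print(f"{fn.__name__:>16}  n={n:>6}: {time.perf_counter() - start:.3f}s")
            ```

            At n=1,000 both finish in a few milliseconds, which is exactly why the quadratic version survives testing; at n=30,000 it’s already seconds versus milliseconds.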

            1. 1

              I saw a remark recently - I’m afraid I don’t remember the source - that O(n^2) is kind of the sweet spot for inefficient code. It’s fast enough that developers testing it on smallish inputs will leave it in, but slow enough to cause real problems.