1. 30
    1. 15

      I hope they look at torrents. People like myself are dying to find some socially valuable way of using symmetric gigabit home fiber connections and the abundant storage/compute in the homelab. I’ve tried hosting Linux ISOs but my sharing ratio never goes above 1. Torrents could be a first-line cache before hitting the S3 bucket or whatever else, and I think they have an extremely cool intersection with the idea of reproducible builds itself. Heck, you could configure your PC to seed the packages you’ve installed, which would create a nice feedback loop between package popularity and availability!

      1. 8

        Hmmm, looking at the cost breakdown they link, they use Fastly as a CDN/cache so most of their S3 costs are for storage, not transfer. They cite 30 TB of transfer a month out of the storage pool, vs 1500 TB of transfer per month served by the CDN. Looks like Fastly gives them the service for free (they estimate it would be €50k/month if they had to pay for it), so their bottleneck is authoritative storage.

        Backblaze would be cheaper for storage, but it’s still an improvement of like 50%, not 500%.

        1. 4

          Okay, maybe this is getting a bit too architecture-astronaut, but why do you need authoritative storage for a reproducible build cache? If no peers are found, then it’s your responsibility to build the package locally and start seeding it. Or there could even be feedback loops where users can request builds of a package that has no seeders, and other people who have set up a daemon on their server will have it pull down the build files, build the package, and start seeding it. The last word in reproducible builds: make them so reproducible you can build & seed a torrent without downloading it from anybody else first!

          1. 8

            but why do you need authoritative storage for a reproducible build cache

            This thread isn’t about a reproducible build cache. It includes things like older source dists which are no longer available upstream, in which case the cache isn’t reproducible anymore.

          2. 4

            This seems highly insecure, letting random people populate the cache. It would be easy for a malicious actor to serve a different build.

            1. 5

              They could host an authoritative set of hashes.

              1. 2

                This only works for content-addressed derivations, which are a tiny minority. The rest are hashed on their inputs, meaning the output can change while the hash stays the same, so what you propose wouldn’t work at the moment, not until CA derivations become the norm (and even then, it still wouldn’t work for old derivations).
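                To make the distinction concrete, here’s a toy sketch. This is deliberately not Nix’s real hashing scheme (just truncated SHA-256 over made-up strings), but it shows why a published hash set can’t catch a swapped output when the hash is computed over the inputs:

```python
import hashlib

def digest(data: bytes) -> str:
    # Truncated SHA-256, purely for readability in this toy example
    return hashlib.sha256(data).hexdigest()[:12]

# Input-addressed: the hash is computed over the *build recipe*,
# so two different outputs of the same recipe share one hash.
recipe = b"builder=gcc-13; src=hello-1.0.tar.gz; flags=-O2"
good_output = b"\x7fELF...legitimate binary"
evil_output = b"\x7fELF...backdoored binary"

path_for_good = digest(recipe)
path_for_evil = digest(recipe)          # same recipe => same hash
assert path_for_good == path_for_evil   # an authoritative hash list can't tell them apart

# Content-addressed: the hash is computed over the *output* itself,
# so a swapped build necessarily gets a different hash.
assert digest(good_output) != digest(evil_output)
```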

            2. 3

              You can’t corrupt a torrent’s contents: every piece is verified against hashes carried in the trusted torrent metadata; see https://en.wikipedia.org/wiki/Magnet_URI_scheme
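              For what it’s worth, the integrity check is per piece: the .torrent metadata carries one SHA-1 per fixed-size piece, and clients discard any piece that doesn’t verify. A minimal sketch with made-up data and an unrealistically tiny piece size:

```python
import hashlib

PIECE_SIZE = 4  # tiny pieces for the example; real torrents use 16 KiB and up

def piece_hashes(data: bytes) -> list[bytes]:
    # What the .torrent metadata stores: one SHA-1 digest per fixed-size piece
    return [hashlib.sha1(data[i:i + PIECE_SIZE]).digest()
            for i in range(0, len(data), PIECE_SIZE)]

original = b"hello world!"
trusted = piece_hashes(original)  # obtained via the trusted .torrent/magnet chain

# A peer sends us data with a tampered middle
received = b"hello EVIL!!"
for idx in range(len(trusted)):
    chunk = received[idx * PIECE_SIZE:(idx + 1) * PIECE_SIZE]
    ok = hashlib.sha1(chunk).digest() == trusted[idx]
    print(f"piece {idx}: {'ok' if ok else 'rejected'}")
# piece 0: ok
# piece 1: rejected
# piece 2: rejected
```

Rejected pieces are simply re-requested from another peer, so a malicious seeder can slow you down but can’t silently hand you different bytes for the same torrent.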

              1. 4

                It’s not corruption of the file in the torrent that’s the problem, but swapping in a malicious build output while the nix hash would stay the same. This is possible in the current build model (see my comment above), and what we rely on is trust in the cache’s signing key, which cannot be distributed.

                There are projects like trustix which try to address this, but they’re dormant as far as I can tell.

      2. 3

        I wouldn’t even bother trying to build NixOS in a CI/CD context with torrent as a primary storage backend.

        1. 4

          Why not? Would it be fine with you if http mirrors were still available?

          1. 5

            Maybe because there would be increased latency for every single file/archive accessed? The idea of a community-provided mesh cache is appealing though, if the latency issue is mitigated.

          2. 1

            Then I’d use only http, changing nothing for the project in terms of costs.

            CI/CD (and NixOS) should be reproducible, but taking a hard dependency on torrents throws that out of the window and makes it unpredictable. Yes, it will probably be fine most of the time, and sometimes even faster than everybody hitting the same HTTP cache.

            But also, firewalling the CI/CD pipeline using torrents? That’s hard enough with CDNs…

            1. 1

              There are different topologies for torrents. BitTorrent became popular for illegal file sharing (in spite of being a terrible design for that) and a lot of the bad reputation comes from that. In this model, all uploaders are ephemeral users, typically on residential connections. This introduces a load of failure modes. All seeders may go away before you finish downloading. One block may be present on only a slow seed that bottlenecks the entire network (Swarmcast was a much better design for avoiding this failure mode). Some seeders may be malicious and give you blocks slowly or corrupted.

              In contrast, this kind of use (which, as I understand it, was the use case for which the protocol was originally designed) assumes at least one ‘official’ seed on a decent high-speed link. In the worst case, you download from that seed and (modulo some small differences in protocol overhead), you’re in no worse a situation than if you were fetching over HTTP. At the same time, if a seed is available then you can reduce the load on the main server by fetching from there instead.

              For use in CI, you have exactly the same problems as with any dependency where you don’t have a local cache (Ubuntu’s apt servers were down for a day a couple of months back, which really sucked for CI: they don’t appear to have automatic failover, so the only way to get CI jobs that did apt updates to pass was to add a line to the scripts that patched the sources file to point at a different mirror). At the same time, it makes it trivial to provide a local cache: that’s the default behaviour of a BitTorrent client.
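              A toy model of that topology (all names here are invented for illustration; a real client speaks the BitTorrent wire protocol, not a dict lookup):

```python
# Official-seed topology: volunteers offload the main server when present,
# and the always-on seed makes the worst case equivalent to plain HTTP.
OFFICIAL_SEED = {"abc123": b"package contents"}  # always-on, high-bandwidth seed
swarm_peers: dict[str, bytes] = {}               # volunteer seeders, may be empty

def fetch(info_hash: str) -> bytes:
    # Prefer the swarm, to take load off the main server...
    if info_hash in swarm_peers:
        return swarm_peers[info_hash]
    # ...but in the worst case fall back to the official seed,
    # which is no worse than fetching over HTTP in the first place.
    data = OFFICIAL_SEED[info_hash]
    # A BitTorrent client keeps what it downloads and seeds it back:
    # this is the 'trivial local cache' behaviour mentioned above.
    swarm_peers[info_hash] = data
    return data
```

The first `fetch("abc123")` hits the official seed and populates the local cache; every later fetch of the same hash is served locally/from the swarm.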

    2. 2

      Maybe they should look into https://wasabi.com/

    3. 1

      Cheaper S3 alternative first, like Backblaze B2 or Cloudflare R2.

      Then something like a couple of beefy servers from Hetzner or some such, self-hosting an S3-compatible service themselves.