1. 23
    1. 3

      Absolutely excellent article!

      It discusses an alternative (to Nix) way of building Linux OS trees hermetically.

      There is one thing that feels wrong in this article, namely this:

      This uses fakeroot and bubblewrap to sandbox the command so that it can’t access anything outside of the input tree. Bubblewrap is a tool born from Red Hat’s Project Atomic, and used by Flatpak among others, that allows unprivileged users to create secure sandboxes. Here bubblewrap is not used for security, but as a convenient way of ensuring correct, “hermetic” builds. Our version of fakeroot is heavily patched so that the build command sees the file permissions that are stored by OSTree; this allows us to run the build as an unprivileged user but still modify root-owned files

      This feels very hacky. “fakeroot” is a very hacky tool; go read the “LIMITATIONS” and “BUGS” sections of its man page: https://manpages.debian.org/bullseye/fakeroot/fakeroot.1.en.html . So I propose something else instead. For example:

      • just bubblewrap without fakeroot (bubblewrap has options for switching UIDs)
      • unshare -r

      These variants can be combined with UID mapping and/or mount-point ID mapping. A rough sketch is below.
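
      For example, a minimal sketch (the input-tree path and the build command are made up; exact flags depend on your bubblewrap and util-linux versions, and a truly hermetic setup would bind only the declared inputs rather than the whole host root):

        # bubblewrap alone: enter an unprivileged user namespace and appear as UID 0,
        # with the host read-only and only the input tree writable
        bwrap --unshare-user --uid 0 --gid 0 \
              --ro-bind / / \
              --bind ./input-tree /build \
              --proc /proc --dev /dev \
              --chdir /build \
              /bin/sh -c 'id -u && ./build-step.sh'   # id -u prints 0 inside the sandbox

        # util-linux alternative: map the current user to root in a new user namespace
        unshare -r id -u   # prints 0; files created inside stay owned by the real user outside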

      P. S. I posted this article even though it had already been posted before, because it is really good. I also added tags, which were not added that time.

      P. P. S. I added “nix”, because this story presents an alternative to Nix. I added “vcs”, because it is somewhat VCS-related. I added “virtualization”, because this is about hermetic builds, which are strongly related.

    2. 1

      I like this comment: https://lwn.net/Articles/821765/ . It essentially “proves” that Nix is wrong and that we need something different

      1. 2

        There’s absolutely no “proving” going on here. He starts off saying:

        I can’t speak about Nix with any authority because I haven’t used it

        Hah, not having used a system is ALWAYS a “good” way to start out with criticisms of it! That becomes pretty clear later, when he states some things that are either blatantly inaccurate or outright false:

        it serialises the entire directory including all its files’ contents (like tar) and then calculates a checksum of this.

        That’s false. It calculates a checksum of the checksums of the inputs, not their contents, AFAIK. A subtle difference that is nevertheless less stupid-sounding. No need to throw nonexistent shade at a good technology, you see.

        Secondly, the author claims that there’s no sharing or deduplication of individual files across different packages. This is not entirely accurate. While it’s true that Nix stores each build output in its unique directory identified by its hash, there are mechanisms at the file system level (like hard links) that can help reduce the storage cost of duplicate files, which you can enable with a simple configuration change (and running a process on a schedule). Additionally, Nix’s binary cache mechanism allows shared use of built binaries among users, which reduces the need for duplicate builds.
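
        For reference, the knobs I mean are roughly these (assuming a standard Nix installation):

          # one-off pass: replace identical files in /nix/store with hard links
          nix-store --optimise

          # or set it once in nix.conf so deduplication happens automatically:
          # auto-optimise-store = true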

        “It also means that you can’t share/reuse any work done by intermediate build steps within a package”

        This isn’t universally accurate. Some advanced users of Nix do exploit its capabilities to cache and reuse intermediate build artifacts.

        While the author’s point about deep constructive traces from the “Build Systems à la Carte” paper is valid, it’s essential to understand that Nix’s approach has the advantage of ensuring reproducibility. The potential disadvantage of not having early cutoff is a trade-off for this reproducibility.

        The reference to Chapter 6 of the Nix thesis regarding content addressability is outdated. Nix has since evolved (see: https://www.tweag.io/blog/2020-09-10-nix-cas/), and content-addressed storage (CAS) is now a part of Nix. This means that Nix can identify outputs based on their content, not just their derivation. This adds another level of potential sharing and deduplication. (Granted, the criticism was written in 2020, and CAS is, as the article I linked, which was also from 2020, suggested, relatively new.)
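
        If you want to try it, the opt-in looks roughly like this (the feature is experimental, so the exact names may have changed since that post):

          # nix.conf
          experimental-features = ca-derivations

          # then mark a derivation with:
          #   __contentAddressed = true;
          #   outputHashMode = "recursive";
          #   outputHashAlgo = "sha256";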

        Anyway, we don’t need something different. What we need is for people to understand that if you want reproducible builds that don’t conflict with anything else that is already installed, there’s simply no other way around it than declaring every dependency explicitly and (using hashes) uniquely. Which is to say, treating every build like a pure function with inputs and outputs. Which is exactly what Nix does.
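
        As a minimal illustration of that “pure function” view (assuming a working Nix install with <nixpkgs> on the search path; the package name is made up):

          # Build a trivial derivation; the output lands at a store path whose hash
          # is derived from every input (builder, stdenv, the build command itself, ...).
          nix-build --expr 'with import <nixpkgs> {}; runCommand "demo" {} "echo hello > $out"'
          # Re-running with identical inputs reuses the same store path;
          # changing any input yields a different path, so builds never collide.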

        When something comes along that achieves that BUT is easier to use, then you might have an argument. Right now you have nothing.

        1. 2

          I haven’t used Nix either, so I have little to say. Still, I have a question.

          The potential disadvantage of not having early cutoff is a trade-off for this reproducibility.

          How does early cutoff conflict with reproducibility?

          1. 2

            Good question, and upon doing a bit of googling I think I wasn’t sufficiently clear. If the hashes of the inputs are the same as in a previous build, and the result of that previous build is in the binary cache, Nix can skip building the package and pull down the cached result instead. This is essentially “early cutoff”, although not technically the same thing: true early cutoff would also stop rebuilding downstream packages when a changed input happens to produce an identical intermediate output, which input-addressed Nix can’t detect.

            1. 2

              (It seems you completely rewrote your message, i.e. I got one version in my email and saw a different version on the site.) (Okay, I deleted the email version without reading it.)

              The very same article you just gave me ( https://www.tweag.io/blog/2020-09-10-nix-cas/ ) says that the content-addressed store solves the cutoff problem. So it seems it is solved. Cool!

      2. 2

        In addition to the sibling comment, it’s worth noting that the linked rant is specific to C linkage and C-oriented compilation. With effort, it’s possible to do incremental module-at-a-time compilation with Nix; here is how I did it for Monte.

    3. 1

      One thing I find odd is that someone there mentioned having ostree take about a day to garbage collect. It’s strictly acyclic. Could they have refcounts? :)
