1. 22

  2. 3

    meta: I think you should have either linked to the “about” page, or added the release tag.

    1. 3

      Added release tag now, thanks for the hint. Still new around here :)

      1. 3

        Welcome and nice to meet you! :)

    2. 3

      Thank you for posting this, Michael. Very interesting!

      I am glad to see people who commit time and development in trying to advance the state of the art even for “plumbing” like package management.

      Do you have specific goals with Distri? E.g. do you think this will be adopted by another distro, or influence the design of other package managers?

      1. 3

        Glad you like it!

        Yeah, ideally other distributions and package managers would pick up ideas and run with them. If they don’t think this is worth it or realistic, hopefully at least newly built package managers will consider the architectural observations I’m providing here :)

        So if you know anyone working in Linux distributions who’d be up for championing such a change, or people working on package managers specifically who might be interested, please share! :)

        1. 1

          Cool!

          I’m afraid I don’t have any personal connections within this area.

          However, as you might know there are some smaller distros out there which seem more “agile” and less tied up by legacy. The ones that come to mind are Void and Alpine.

          So it might be worth reaching out to them.

      2. 2

        I like the reference to one of the most classic Dutch cannabis strains.

        1. 2

          Hey Michael – this is great! I work with Debian packaging at work and in my leisure time, and it really does feel like the tooling has not made a lot of progress. Stability can be a double-edged sword. I’ll check it out sometime :-)

          1. 2

            I like the focus on debug symbols! That was always a big mystery to me in Debian.

            And I learned that debug symbols are “optional” for Debian packagers. That shouldn’t surprise me, but I don’t see any reason why a good package shouldn’t have them. Although it is sort of awkward that debug symbol packages “pollute” the regular package namespace.

            Are there any distros that do better than Debian in that area? Fedora? Arch?

            1. 3

              Yeah, it has been a pain point for a long time, pretty much across the board :(

              I’m currently on Arch Linux, and I lack symbols all the time. In my experience, it’s worse than it was on Debian.

              In comparison, it’s easier to develop on distri, at least for me.

              Fedora has better tooling than Debian: their package manager has a subcommand to install the required packages (which seem to be broadly available), and (after confirmation) it can automatically configure the required repositories for you. I haven’t used Fedora in a while, so correct me if I’m wrong.

              distri goes one step further, and makes debug infos and sources available by default, without any extra steps. Behind the scenes, this is done by automatically fetching the required SquashFS images from the repository (a static file HTTP server).

              distri’s overlay directories are available (without requiring the images themselves) to the debugfs and srcfs services via the repository metadata.

              Metadata is transferred in bandwidth-efficient gzip-compressed binary protobuf, as opposed to the XML and text based formats of other distributions. More importantly, metadata in distri is targeted to what really needs to be there, whereas other distributions often just have one type of metadata, an ever-growing grab bag of things.

              Targeted and wire-efficient representation are two pieces of low-hanging fruit for many distributions. A lazy-loading read-only FUSE file system for debug and source packages should be a reasonable project to implement.

              Hopefully the other distributions pick up some of these goals :)

              Edit: forgot to mention: https://developers.redhat.com/blog/2019/10/14/introducing-debuginfod-the-elfutils-debuginfo-server/ also seems pretty cool

              1. 1

                So the debug info lives at a well-known path, but is lazily fetched? If so that makes a lot of sense.

                Does distri do any differential compression between versions of the same package? One downside of SquashFS is that you may lose some structure that could be useful for that.

                For example these pairs of images should all be very similar (I would guess 90-99%):

                • sources for Python 3.8 vs 3.9
                • binaries for Python 3.8 vs. 3.9
                • debug symbols for Python 3.8 vs 3.9

                and even more so for 3.9.0 vs 3.9.1.

                I forget if I mentioned that I tried (and failed) to write a binary-centric / hermetic package manager around 2014… And one thing that was important for my use case was package updates that are much more rapid than Debian. To prevent disk space from exploding, and to save on network time, I felt that differential compression was important.

                It’s a long story, but an important use case was running R packages, which move extremely quickly – much faster than distros.


                Actually I could just copy from a conversation I had with @ac last year, who at the time was also working on a binary-centric package manager like Nix.

                Here are some more concrete examples of the problem I would describe as “apps are pyramids with big shared bases”.

                • I was dealing with 30 or so R apps, R packages, and R itself. The R code is 500 lines, but the whole app bundle is 500 MB x 30 apps.
                • tiny scripts depending on Python, Pandas, NumPy, which are large. Another recent huge dependency is TensorFlow
                • Compilers using LLVM: Clang, Rust, Julia, etc.
                • Apps using Electron: VSCode, Atom, etc., and I think Slack
                • All dynamic web stacks: Python and Django, Ruby and Rails, etc.
                • GUI apps and associated frameworks. Actually I believe this is why dynamic linking was invented in Unix.

                So another way to think of it is that I think you should be able to install like 10 versions of Clang and Rust and Julia on the same machine, and not have 30x the space of LLVM. It would probably be north of 30 GiB, and you would pay that as disk and network space.

                So anyway I’m not sure if this is in your design goals for distri, but it’s a problem I have had in the back of my mind. I think fine-grained versions are useful to develop and deploy software quickly, but it gets expensive.

                Oil was definitely motivated by distros, e.g. the relatively bad mish-mash of languages and macro-processing that distros use to express their package configuration:

                http://www.oilshell.org/blog/tags.html?tag=linux-distro#linux-distro

                1. 1

                  So the debug info lives at a well-known path, but is lazily fetched? If so that makes a lot of sense.

                  That’s correct!

                  Does distri do any differential compression between versions of the same package? One downside of SquashFS is that you may lose some structure that could be useful for that.

                  Not right now, and it’s not something that’s on my list either. I wanted to give http://zsync.moria.org.uk/ a shot for differential compression for the download step, but haven’t tried it out yet.

                  In practice, the large disks we’re used to nowadays, and the fact that most packages are present in one version only (with a few exceptions), make this largely a non-issue in my day-to-day.

                  I appreciate your description, though, and it sounds like in your environment differential compression is a lot more useful!

                  1. 1

                    OK, interesting. I didn’t know about zsync, so I just downloaded and built it. It passes a couple of tests, but it doesn’t seem like a mature project otherwise (i.e. does anyone use it in production?)

                    Although you need to generate a .zsync delta between pairs, I think it could mostly work if you go on the assumption that most people will be at the latest versions. Package versions could be kept forever, while you would need, say, 3*N zsync deltas for N old versions and 3 new versions (one delta from each old version to each of the newest three). So it scales linearly rather than quadratically.

                    I like that it works with a plain HTTP server.
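
                    The reason it works with a plain HTTP server is the rsync-family algorithm underneath: the client hashes the blocks it already has, matches them against the server’s block list, and fetches only the missing byte ranges. A simplified fixed-block sketch (real zsync uses rolling checksums so matches don’t have to be block-aligned):

```python
import hashlib

BLOCK = 4096

def block_delta(old: bytes, new: bytes):
    """Plan a zsync-style transfer of `new` to a client holding `old`.

    Blocks of `new` whose hash matches a block of `old` become local
    copies; only unmatched blocks would be fetched, e.g. as HTTP
    Range requests against a static file server.
    """
    have = {hashlib.md5(old[i:i + BLOCK]).digest(): i
            for i in range(0, len(old), BLOCK)}
    plan = []
    for i in range(0, len(new), BLOCK):
        blk = new[i:i + BLOCK]
        if hashlib.md5(blk).digest() in have:
            plan.append(("copy", have[hashlib.md5(blk).digest()]))
        else:
            plan.append(("data", blk))  # the only bytes on the wire
    return plan
```

                    For two package versions that are 90–99% identical, almost every block resolves to a local copy, so the download shrinks accordingly.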


                    I have a fairly concrete use case I could try it out with: Oil’s continuous builds which are currently on Travis, but eventually need to be ported to other platforms for non-Ubuntu builds.

                    The dev dependencies are big and need to be sync’d every time.

                    Someone contributed Nix support, which is not fast: https://github.com/oilshell/oil/issues/513

                    But this doesn’t pass tests now, because the package versions are different from the ones on Ubuntu, which I develop on. And it doesn’t seem easy to create Nix packages. I already have shell scripts that build the right versions of all my dev dependencies, but that’s VERY far from a Nix package (while I don’t think it would be too far from an Arch package).

                    Nix seems to require a lot of weird patches to get packages to work, and that compounds as you move “up”. I also have many Python dependencies, and the contributor didn’t have a real idea about how to tackle that in Nix.

                    It’s mostly because of the /nix/store thing I believe. I think FUSE probably allows you to avoid too many upstream changes. I think mostly relying on --prefix is a good idea.

                    So what I end up doing is using the cache: feature in .travis.yml to avoid 10 minutes of building dependencies before every continuous build.

                    That mostly works fine, although I ran into a bug a couple of weeks ago. Builds were incorrectly failing, and I had to turn off the cache for a few builds and then turn it back on. In my experience, that’s not surprising with ad hoc cache mechanisms. It works OK because the dev dependencies don’t change very much, but if they changed, it would be a bigger hassle. You have to manually delete the cache with the travis command line tool.


                    Anyway, long story short, I think continuous builds are a good use case for a performance-oriented distro. The way Travis works is that all builds start from a clean slate, and I think they have a lot of one-off local caches in AWS of Debian, Python’s pip, node.js, etc. to make it reasonably fast. I think it works fine because it’s centralized in an AWS data center, with fast networking. I imagine the bugs I ran into were probably some production issue about migrating clusters – i.e. the cache state wasn’t correct.

                    But I think it would be nicer for a distro to support this use case out of the box – booting a known set of dependencies from scratch for a fast build. And it shouldn’t rely on running in the cloud close to caches.

                    Since I already use the oilshell.org static web server to fetch tarball sources when the cache doesn’t exist, it seems like it could be easy to plug in zsync! So there is a path to optionally trying it out.

                    And from there I could port off of Travis onto other platforms. Anyway I’m on the distri mailing list, so if I ever actually try this, I can report back some results :)

                    (FWIW here is the site with builds: http://travis-ci.oilshell.org/jobs/ , it’s doing a lot of work now, which I’m happy with)


                    edit: although it occurs to me now that FUSE is not a good dependency for a lot of continuous build platforms, because of kernel support… hm I will have to think about this. Right now I only have one “layer” of dependencies really. That is, I just avoid make install and run all my devtool binaries out of the source dir. But if there are transitive dependencies then you need the equivalent of make install.

                    Filed bug to keep track of it, not a very high priority: https://github.com/oilshell/oil/issues/756