1. 24
  1.  

  2. 18

    Packaging software that has such crazy dependency graphs and giving in to “vendoring” would be rewarding developers for building throw-away software.

    One does not build stable software that has to last 3 to 5 years in this manner, as the costs of “vendoring” would be very high, not just for distributions, but especially for upstream. All of a sudden they will have to deal with potentially hundreds of dependencies that are either abandoned, have serious security issues, got upgraded and broke their API, or got replaced by other dependencies altogether. Unsustainable…

    Better let upstream wrap this software in a Flatpak/Snap/Docker and call it a day. In my book this is the clear cut-off for software that should not be relied on for production environments, as it will become a day job to nurse these kinds of applications. Before you know it you need “devops” for your Kubernetes cluster for something that should be an apt install away from running for the next 3 to 5 years without any oversight…

    1. 33

      ripgrep isn’t “throw away” software. I’ve been maintaining it for 5.5 years already. It has on the order of ~80 dependencies from the Rust ecosystem. I challenge you or anyone to propose how I could reduce that number in a way that’s both practical and sustainable.

      Many of those dependencies I have myself created, so that others can reuse that code. And that’s not just some vacuous appeal to code reuse. Several things that were once inside of ripgrep core have since been factored out into their own projects and are now used inside Cargo and rustc itself, among other popular programs such as tokei and fd. And this pressure to factor out code into distinct dependencies applies recursively. For example, the regex crate (which I maintain) factors things out into distinct crates such as the regex parser and Aho-Corasick, both of which are used independently by others in the ecosystem.
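
      To illustrate that kind of reuse (a minimal sketch assuming the aho-corasick 1.x API, not code taken from ripgrep itself), any program can pull in the aho-corasick crate directly for multi-pattern search:

        use aho_corasick::AhoCorasick;

        fn main() {
            // Build a multi-pattern matcher once, then reuse it across searches.
            let patterns = &["apple", "maple", "snapple"];
            let haystack = "Nobody likes maple in their apple flavored Snapple.";
            let ac = AhoCorasick::new(patterns).unwrap();

            // Each match reports which pattern matched and its byte offsets.
            for mat in ac.find_iter(haystack) {
                println!("pattern {} at {}..{}", mat.pattern().as_usize(), mat.start(), mat.end());
            }
        }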

      Comparable C (or C++) programs either don’t provide the same functionality as ripgrep or re-implement these things on their own, often with many bugs.

      1. 11

        It’s not just about software with crazy dependency graphs. A while ago someone wanted to package my project with 40 dependencies (+ 21 used only for tests and lints) and they ran into problems too.

        For example, I split some parts out to their own repo/package, just because that’s a bit easier to work with for me, and because at some point in the future it might be useful for others, too. But none of this is semver; and why should it be? Introducing it would greatly increase the maintenance burden for me with no benefit for anyone.

        In some other cases I use forks of libraries to patch some specific issue; sometimes the PR/patch just doesn’t get merged for a long time (or ever) because of upstream time constraints or lack of interest. Sometimes packages are unmaintained or “semi-maintained” at best.

        While packaging all of this in Debian is certainly doable and hardly an insurmountable effort, it also strikes me as quite a lot of effort with very little tangible benefit. It’s no surprise that the packager kind of gave up on it (for the time being anyway).

        Debian’s biggest issue IMHO is that they have a “one-size-fits-all” approach based on how people worked with C/C++ in the 80s and 90s, and this doesn’t really map all that well to pretty much any environment developed since. This isn’t really a new “2020s problem”, because people have been side-stepping package systems like Debian for over a decade, and people have been complaining about it for longer.

        1.  

          based on how people worked with C/C++ in the 80s and 90s

          Reusing any C code back in the 80s and 90s was horrendously difficult. Maybe part of the problem is that the bar for how much pain it is assumed to be okay to dump on people who want to use libraries was set in the bad old days, and it hasn’t been recalibrated since.

          Edit: I think danieldk agrees with me that reusing code got way easier but has a completely different opinion of the value of that. :) https://lobste.rs/s/1exdkb/debian_discusses_vendoring_again#c_05vgtp

        1. 5

          Debian has to figure out how to co-operate with other package managers. By painstakingly repackaging every npm dependency they’re not adding any value. From the perspective of an npm user it is very weird. The Debian versions are installed in places that npm doesn’t recognize, so I can’t use them even if I wanted to.

          1.  

            On the bright side, the places where they are packaged are out of your way so they won’t unpredictably break things for you.

            1. 0

              Debian has no problems co-operating with, say, pip. Python packages installed using the Debian package manager are visible to pip. Perhaps it’s npm that should figure out how to co-operate with system package managers, not Debian.

              1. 5

                Debian has no problems co-operating with, say, pip.

                Yes it does.

                I’ve seen tons of Python projects that basically start their setup instructions with “create a virtualenv and use pip to install the dependencies.” The whole point of virtualenv is to avoid the distribution-packaged Python libraries in favour of whatever the latest versions are. Projects that do get redistributed by Debian, like Mercurial, have to go out of their way to avoid pulling in pip dependencies.

                https://gregoryszorc.com/blog/2020/01/13/mercurial's-journey-to-and-reflections-on-python-3/

                In April 2016, the mercurial.pycompat module was introduced to export aliases or wrappers around standard library functionality to abstract the differences between Python versions. This file grew over time and eventually became Mercurial’s version of six. To be honest, I’m not sure if we should have used six from the beginning. six probably would have saved some work. But we had to eventually write a lot of shims for converting between str and bytes and would have needed to invent a pycompat layer in some form anyway. So I’m not sure six would have saved enough effort to justify the baggage of integrating a 3rd party package into Mercurial. (When Mercurial accepts a 3rd party package, downstream packagers like Debian get all hot and bothered and end up making questionable patches to our source code. So we prefer to minimize the surface area for problems by minimizing dependencies on 3rd party packages.)

                1. 1

                  I mean, yes. kornel doesn’t like Debian because Debian packages are not visible to npm. notriddle doesn’t like Debian because upstreams don’t want Debian packages to be visible to pip. The two are in contradiction. This is not a solvable problem.

                  1. 3

                    It’s not a contradiction. Debian is just failing both types of users by doing a half-assed thing. It’s neither leaving packages alone, nor providing a set large and compatible enough to be useful.

            2. 3

              Whenever “what package management ought to look like” comes up, I think my fundamental objection comes down to this:

              How does the system handle theming?

              It’s super common for there to be a theme manager that has a list of user-contributed themes on a website. It’s also possible for literal thousands of themes to exist for a single program.

              So, a few questions:

              • Should themes be handled by the system’s package manager, or by some sort of parallel package manager that’s specific to themes? After all, some programs’ themes are entirely declarative, trivially backward compatible and all-around extremely simple.
              • If you’re running a program that breaks the theming system regularly and requires the theme be updated often, then what should handle the updating system if not the system’s package manager itself?
              • If you’re using the system’s package manager, should there be some sort of “package subset” for extremely simple ‘packages’ that will never need to update, run code, etc?
              • If you want to reinstall your OS on a different computer, how will installing your theming be handled? (saying “you manually track down and re-pick each theme for each program” is not a desirable response)
              • Will the package manager be responsive with randomly trying a ton of potentially-less-than-one-kilobyte files?
              • How do you handle user submissions when they’re actually submitting official packages? I mean, there’s zero reason to wait 2 years for a theme to be available in your distro’s LTS.

              In practice, the answer to all of this is either 1) theming is a website you download from, which is annoyingly manual, not streamlined, and won’t gracefully handle automated installations, or 2) a theme is a normal package and there are perhaps 3 themes, instead of hundreds, because submitting an actual package is a lot more overhead than submitting a webform (or even a “share this theme online” button in the program’s theme settings menu).

              1. 3

                I think this is a case where the traditional packaging model falls apart: first, because the model makes it hard to programmatically generate package definitions; second, because installing multiple versions of a package typically conflicts (unless you make the version part of the package name).

                This is less of a problem for functional package managers, because their use of Turing-complete languages (Nix, Scheme) makes it easier to generate package definitions programmatically. Second, since they can store multiple versions of a package side by side, many versions can be installed in parallel without any conflicts.

                See e.g. crate2nix (which generates Nix from Cargo.toml/Cargo.lock files), bundix, yarn2nix, etc.

                1. 4

                  This is a solution. But a shallow one frankly.

                  In traditional distribution management, security issues are solved by fixing one part of the puzzle. With the hermetic ones you are patching each and every dependent package individually. And who knows which version they use? Rust publishes security advisories and provides guidance to project maintainers. But who else does this? And do maintainers follow them and actively update their project dependencies?

                  Would a project publish patch releases just to fix security issues in dependent packages?

                  Nix/Guix/Docker and other hermetic build systems push the patching down to the respective upstreams because the tools to bridge this are not available. You can’t programmatically inspect the vendored dependencies and figure this out. The tooling doesn’t exist.

                  I don’t think it solves anything.

                  1. 2

                    I don’t think it solves anything.

                    First, it solves the issue that you can manage these programs using your system package manager, rather than relying on curl | bash, Docker containers, etc. As a result, it provides better integration (e.g. management of services through systemd units, etc.).

                    Also, similar to how you can patch dependencies for security issues in a traditional package manager, you can apply patches to specific versions. Most functional package sets provide override mechanisms. These are already used, e.g., to make the Rust openssl crate derivation provide the native OpenSSL library as a dependency (since this information is not provided by the Cargo metadata). This mechanism could easily be extended to override certain packages or package versions to apply an additional patch.

                    1. 2

                      First, it solves the issue that you can manage these programs using your system package manager, rather than relying on curl | bash, Docker containers, etc. As a result, it provides better integration (e.g. management of services through systemd units, etc.).

                      That is the job for any package manager though. There is nothing inherently special here.

                      Also, similar to how you can patch dependencies in a traditional package manager, you can apply patches to specific versions. Most functional package sets provide override mechanisms. These are already used, e.g., to make the Rust openssl crate provide the native OpenSSL library as a dependency. This mechanism could easily be extended to override certain packages or package versions to apply an additional patch.

                      I think you are missing the point though. The problem is devendoring the project and/or keeping track of the packages, along with patching them. Remember, there are multiple copies of the library on your system. OpenSSL is a fairly simple case since it’s fairly trivial to provide multiple versions on a system level. What is not easy is to keep track of 3 different versions of a library inside node_modules and ensure they are patched.

                      Same applies for Go. Here is containerd depending on 4-5 different versions of runc(!)

                      https://github.com/containerd/containerd/blob/master/go.sum#L403-L408

                      1. 1

                        Same applies for Go. Here is containerd depending on 4-5 different versions of runc(!) https://github.com/containerd/containerd/blob/master/go.sum#L403-L408

                        containerd depends on a single version of runc. You can also see there’s a single copy in the vendor folder.

                        go.sum contains information other than the modules currently in use:

                        Is ‘go.sum’ a lock file? Why does ‘go.sum’ include information for module versions I am no longer using?

                        No, go.sum is not a lock file. The go.mod files in a build provide enough information for 100% reproducible builds.

                        For validation purposes, go.sum contains the expected cryptographic checksums of the content of specific module versions. See the FAQ below for more details on go.sum (including why you typically should check in go.sum) as well as the “Module downloading and verification” section in the tip documentation.

                        In part because go.sum is not a lock file, it retains cryptographic checksums for module versions even after you stop using a module or particular module version. This allows validation of the checksums if you later resume using something, which provides additional safety.

                        In addition, your module’s go.sum records checksums for all direct and indirect dependencies used in a build (and hence your go.sum will frequently have more modules listed than your go.mod).

                        (see the Go modules FAQ)

                  2. 4

                    This is just sweeping the problem under the rug IMHO. I personally believe that the root problem is poor backward compatibility and crappy semantic versioning of software libraries. To this, add application developers who don’t try to understand the library and use private features which the library developer didn’t mean to expose through the API.

                    You end up with software requiring lib.2.3.5-beta.so, which will break with anything else.

                    Some projects have an obsession with backward compatibility; you rarely hear about an application having a “maximum Linux kernel version”, for example.

                    I guess software engineering is not a craft, and never was…

                    1. 5

                      I personally believe that the root problem is poor backward compatibility and crappy semantic versioning of software libraries.

                      That is not the origin of the problem that the article states, which is that a lot of software nowadays relies on, or vendors, a large number (hundreds or thousands) of dependencies. I think this has little to do with semantic versioning, and more to do with the fact that language-specific package managers make the cost of adding dependencies so low for developers.

                      Since C and C++ did not provide any standard package managers, Linux distributions effectively became the de facto C and C++ package managers. When you develop a C/C++ application, you are constrained to a common set of libraries provided by the major Linux distributions, or you or your users end up managing dependencies manually. Consequently, Linux distributions were a fairly strong incentive to use a smaller set of widely-used dependencies, and to develop conservatively against older versions (for Debian Stable, Ubuntu LTS, and RHEL compatibility).

                      But now that there are language-specific package managers with package repositories where anyone can upload libraries (crates.io, PyPI, etc.), the traditional constraints on the number of dependencies and the maximum version of dependencies are lifted. As a consequence, people just add more dependencies and more recent dependencies (e.g. many Rust crates do not compile with a one-year old Rust compiler).

                      Distributions cannot keep up with packaging all these dependencies in many incompatible versions, so other mechanisms for distribution are taking over, such as a single static binary, a Docker image, Flatpak, AppImage, or Snap. I wouldn’t be surprised if the role of mainstream distributions is reduced in a few years to: 1.) a lean OS to build container images; 2.) a lean server OS to run containers; 3.) a lean desktop OS where all the applications are distributed as Flatpaks.

                      I am not convinced this is better, but it is a transition that has happened over the past decade or so, and I think that ship has sailed.

                      To this, add application developers who don’t try understand the library and uses private features which the library developer didn’t mean to expose through the API.

                      Unless the downstream developers are forking the library, this seems more like a library design issue. Do not expose APIs that you do not want downstream users to use. Even though such vendoring + patching (effectively creating a fork) happens in some projects, this is AFAIK not the norm for e.g. vendoring in Go.
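
                      As a rough sketch of that design principle (hypothetical names, not from any particular project): keep helpers crate-private so downstream users cannot reach them, and only the intentionally public surface becomes a compatibility promise.

                        /// The supported, public result type.
                        pub struct Config {
                            pub entries: Vec<(String, String)>,
                        }

                        /// Public entry point: the API surface the library promises to keep stable.
                        pub fn parse_config(input: &str) -> Config {
                            Config { entries: tokenize(input) }
                        }

                        // Crate-private helper: downstream crates cannot call it, so it can be
                        // changed or removed later without a semver-breaking release.
                        fn tokenize(input: &str) -> Vec<(String, String)> {
                            input
                                .lines()
                                .filter_map(|line| {
                                    let mut parts = line.splitn(2, '=');
                                    let key = parts.next()?.trim().to_string();
                                    let value = parts.next()?.trim().to_string();
                                    Some((key, value))
                                })
                                .collect()
                        }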

                      1. 1

                        many Rust crates do not compile with a one-year old Rust compiler

                        Ironically, crates may require newer Rust versions, because they want to use functionality recently added to the standard library. So it’s the opposite case — avoiding dependencies causes churn.
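
                        A small, made-up illustration of that trade-off: str::split_once has only been in the standard library since Rust 1.52, so using it avoids a helper dependency (or a hand-rolled version) at the cost of raising the crate’s minimum supported compiler version.

                          // Hypothetical helper: on Rust 1.52+ this is a one-liner via the standard
                          // library; on older compilers it would need a manual find()/slice dance or
                          // a small external crate.
                          fn parse_pair(line: &str) -> Option<(&str, &str)> {
                              line.split_once('=')
                          }

                          fn main() {
                              assert_eq!(parse_pair("retries=3"), Some(("retries", "3")));
                          }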