It is disingenuous to treat NixOS activation scripts as “hooks and triggers”; activation scripts belong to NixOS, not Nix. What this means is that, yes, while NixOS needs to update /etc, systemd, and other parts of the running system’s state, Nix packages themselves are just files on disk without any post-install steps.
I mention this because the author treats Nix as not fulfilling their criteria, even though it clearly does:
Distribution-scoped: Nix can manage an entire Linux distro, à la NixOS
Uses images: Nix packages can be flat tarballs
Omits hooks and triggers: Plain Nix packages don’t do anything when copied to the Nix store (see the quick check below)
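A quick check of that last point (hello is an arbitrary example package, and the output is abbreviated; in practice the store path is printed with its full hash):
% nix-build '<nixpkgs>' -A hello
/nix/store/…-hello
% ls result/bin
hello
The package is realized as a plain directory of files in the store; no maintainer script runs at any point.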
When we compare Nix to distri, it’s suddenly not obvious why distri might be preferable. The main limitation is that distri doesn’t have its own DSL for configuring packages, but simply stores textual representations of compressed build actions. Because of this, it’s difficult to actually write packages in distri, which leads to package-building logic being inlined into the low-level Go parts of distri instead. For example, distri hardcodes Docker, initrd, and other concerns into Go all at once, while these things are factored into separate nixpkgs modules for Docker images and initrd building. It’s about the same amount of code, but Nix doesn’t need to be rebuilt in order to use a new nixpkgs.
I might be able to take the author’s speed concerns more seriously if I weren’t typically on wireless connections. It sounds nice to saturate a 1 Gbps link, but I don’t have one of those. Nix folks are not blind to the idea that a package might simply be copied from one place to another, and using nix-copy-closure I can saturate my wireless connection; therefore there aren’t actually any speed gains available to me just by switching package managers.
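For concreteness, a sketch of that workflow (the host name is hypothetical; qemu is just an example package):
% nix-copy-closure --to user@other-host $(nix-build '<nixpkgs>' -A qemu)
This copies the package plus everything it references into the other machine’s store; nothing is executed on the receiving side.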
And finally, if the core of the problem is that Nix’s current reference interpreter is not concurrent enough, then let’s implement a new one, or patch the existing C++ interpreter to have better concurrency. However, this claim itself would need evidence, since Nix already has concurrent downloads, concurrent builds, concurrent garbage collection, and concurrent connections to the backend build daemon.
As per the current landscape, there is no distribution-scoped package manager which uses images and leaves out hooks and triggers, not even in smaller Linux distributions.
While that’s true as far as I know, I think Fedora Silverblue comes pretty close by using OSTree and Flatpak.
EDIT: And resinstack.io comes to mind; I saw it recently here on lobste.rs. I guess many other projects based on linuxkit could qualify as well, as they end up using OCI container images to ship (sub)trees of Linux filesystems in a fashion somewhat comparable to Flatpak.
While that’s true as far as I know, I think Fedora Silverblue comes pretty close by using OSTree and Flatpak.
I agree, but the OSTree part is fairly limited in that it is not really a package manager, just an OS image that consists of layered snapshots (as far as I understand). rpm-ostree adds some flexibility, but as far as I understand it, it basically performs RPM installs and then creates OSTree snapshots of the result. So it is an impure, imperative, traditional package manager shoehorned into the OSTree world. Of course, it does bring many benefits (atomic updates/rollbacks).
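For illustration, the layering workflow looks roughly like this (the package name is arbitrary):
$ rpm-ostree install htop     # layers the RPM on top of the immutable base image as a new deployment
$ rpm-ostree status           # shows the layered package in the pending deployment
$ rpm-ostree rollback         # makes the previous deployment the default again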
I think for Red Hat’s vision where every desktop application is a Flatpak and all development and deployments happen in containers, this is a great approach. But it is fairly limited for folks who want to have the flexibility of a traditional Linux distribution, but with a non-global namespace (ability to install several versions/configurations of a package side-by-side), atomic upgrades/rollbacks, immutable system, etc.
Here IMO, Nix, Guix, and Michael’s work are much more promising.
(Sorry @phaer for piggy-backing on your comment, what you say is absolutely correct.)
No worries, @danieldk. Thanks for your input; I do agree regarding the limitations of OSTree. It’s still interesting to me, as its backing by Red Hat makes it much more likely to be deployable in enterprise contexts. In smaller, more hacker-friendly environments, Nix, Guix, and distri might be more promising :)
It’s still interesting to me, as its backing by Red Hat makes it much more likely to be deployable in enterprise contexts.
I agree. Regardless of whether there are more revolutionary approaches, what Red Hat is doing is a big step forward, and it is nice to see that one of the big players is exploring this space.
From experience, Alpine’s apk is way faster than this. Usually, on a datacenter box with specs way worse than those listed in this article, it finishes installing before I can release the enter key. I can confirm other package managers being that slow, though. I imagine it has something to do with dependency management: Alpine can skip a lot of it thanks to statically linked binaries.
One of the results shown is taking 5s to install a 15 MB package. Does that include the download time? Or was the package already on the local drive? It doesn’t say, so I’m inclined to think the author is complaining more about a slow internet connection than anything else (and it makes me want to force the author to use a computer from 1992).
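For anyone who wants to check the apk claim in the same style as the article (the package name is assumed, and the mirror you hit will dominate the result):
% docker run -t -i alpine:3.12
/ # time apk add --no-cache qemu-system-x86_64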
Also, it is quite important to check how many dependencies are fetched. If the base OS already contains all the dependencies, then obviously it will fetch and install faster than another manager that needs to fetch them all. This partially explains the Alpine results, as Alpine builds packages statically in most cases.
If you look in Appendix B, the author lists the output from each command, several of which include how long it takes to download the packages. It doesn’t seem like internet speeds are an issue.
The author says “fetch and unpack” (emphasis mine). There is no place where download time is separated out.
A few of the package managers themselves report download speeds; for example, apt reports it took 2s to download metadata updates before installing qemu. dnf and pacman also report download speeds for their metadata updates.
Have you read the article? Because that information is not reported.
I have read the article. I copied that straight from it. Scroll down to Appendix B. Under the qemu header, click on the arrow next to “Debian’s apt takes 51 seconds to fetch and unpack 159 MB.” to expand that section and see the author’s run of the command:
% docker run -t -i debian:sid
root@b7cc25a927ab:/# time (apt update && apt install -y qemu-system-x86)
Get:1 http://cdn-fastly.deb.debian.org/debian sid InRelease [149 kB]
Get:2 http://cdn-fastly.deb.debian.org/debian sid/main amd64 Packages [8426 kB]
Fetched 8574 kB in 1s (6716 kB/s)
[…]
Fetched 151 MB in 2s (64.6 MB/s)
[…]
real 0m51.583s
user 0m15.671s
sys 0m3.732s
You are correct. I did not know you can expand the results.
I guess that’s because most of them were created when HDDs were a thing and CDs were a legitimate option as an installation source. In such setups it didn’t make sense to parallelize anything, since it’d only cause slowdowns from seeks and fragmentation.
I hope this ignites some competition to speed everything up.
There is a small reasoning mistake in the article: not all of the hooks are executed on a per-package basis. For example, the NixOS activation script is executed once per system change, regardless of how many new packages have been installed. That doesn’t mean Nix is fast, but it is able to benefit from some more concurrency.
dpkg also aggregates execution of hooks: see https://wiki.debian.org/DpkgTriggers.
Thanks for the correction!
Just a note: pacman now uses zstd.
That being said, pacman’s performance really degrades with a lot of packages. I have 2711 installed right now (yes, I know…). Doing a simple metadata dump of a locally installed package takes 15s with cold caches on my system:
$ time pacman -Qi gcc
Name : gcc
Version : 10.2.0-2
Description : The GNU Compiler Collection - C and C++ frontends
Architecture : x86_64
URL : https://gcc.gnu.org
Licenses : GPL LGPL FDL custom
Groups : base-devel
Provides : gcc-multilib
Depends On : gcc-libs=10.2.0-2 binutils>=2.28 libmpc
Optional Deps : lib32-gcc-libs: for generating code for 32-bit ABI [installed]
Required By : clang dmd gcc-ada gcc-fortran ghc ldc open-riichi-git vala
Optional For : afl dmd xorg-xrdb
Conflicts With : None
Replaces : gcc-multilib
Installed Size : 147.32 MiB
Packager : Bartłomiej Piotrowski <bpiotrowski@archlinux.org>
Build Date : Tue 01 Sep 2020 12:08:31 PM EDT
Install Date : Fri 11 Sep 2020 02:55:55 PM EDT
Install Reason : Explicitly installed
Install Script : No
Validated By : Signature
real 0m14.604s
user 0m0.032s
sys 0m0.068s
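To reproduce the cold-cache measurement (dropping the page cache requires root; the package name is arbitrary):
$ sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
$ time pacman -Qi gcc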
The amount of metadata seems excessive for the use case of installing a single package, which I consider the main use case of an interactive package manager.
In docker world, sure, installing something from scratch with no metadata cache is common. On long-running desktop or server systems, package repo metadata remains cached for days!
Maybe a package manager that doesn’t have a local database of the whole remote repo and always queries the server on demand would be good, in docker world especially. But running the smart server would be harder on the sysadmins vs. the build system just dumping a meta.txz onto a standard file server.
I’ve switched to FreeBSD from Debian, and I haven’t measured, but pkg feels a bit faster than apt.
Kind of a “meh” issue for me, though. If somebody makes it faster I won’t complain, but even on Debian it seemed fast enough. Growing up with a 14.4k modem helped, I guess.
I also find Gentoo’s portage super slow. Not talking about compile times here, but just calculating dependencies and stuff to update/install.
I found that it improves a lot with more powerful hardware, like an SSD vs. spinning platters. What I’d like to see from Portage is some use of caching, e.g. if I run an emerge [...] command and it takes N minutes before compilation begins, running the exact same emerge command again soon after should not take the full N minutes. I’m okay if the cache is occasionally not invalidated correctly; I’m happy to run something manually to invalidate it once in a while if that means speed gains most of the time.
It always takes multiple minutes for me, on an i7 with a 3 GB/s SSD. I find that to be long.
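For reference, a rough way to time just the dependency calculation (nothing is built with --pretend):
% time emerge --pretend --update --deep --newuse @world
# per the comments above, running the same command again soon after takes roughly as long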
dnf is really slow by default, for some reason. After looking around on various forums, I managed to identify a few options that speed it up. Currently, my /etc/dnf/dnf.conf looks like this:
[…]
and it’s not apk-fast, but totally fine.
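For reference, a purely hypothetical example of the kind of options usually recommended for this (not necessarily the commenter’s actual config):
[main]
max_parallel_downloads=10   # fetch several packages at once instead of the default 3
fastestmirror=True          # probe mirrors and prefer the fastest responder
install_weak_deps=False     # skip weak dependencies (Recommends)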
Another package manager I would have liked to see in this comparison is Void’s XBPS, which I have found to be speedy in the past.
Isn’t this just basically measuring network speed? Even if you have a good Gbit connection, surely you’re mostly measuring how good the mirrors/CDN are. In that sense it would make more sense to measure with a slow internet connection to level the playing field.
I haven’t tried them all, but apt seems to spend significant time “Reading package lists” from disk, even on an SSD.
When it gets index updates, it seems to read them one after another. These checks are mainly limited by latency, not bandwidth, so they’d benefit from being parallelized.
And the entire installation process is serial.
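A rough way to see just the local list-parsing cost, with no network involved (assuming a Debian/Ubuntu system):
% time apt-get check    # rebuilds the package cache from the lists on disk and checks for broken dependencies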
There are many other variables, like the number of not-yet-installed dependencies that need to be downloaded and installed, and the repository metadata size.
I don’t think getting rid of the mirror variable by hosting your own mirror for all the package managers will significantly change the results of the test.
The “fetch + install” test case largely favors apk because it’s the only package manager that fetches and unpacks the archive at the same time. All other package managers write packages into the cache and, after everything is downloaded, start reading the archives again to unpack them.
So you end up with apk being bottlenecked by the download speed and disk write speed, while all the other package managers (I don’t know enough about Nix, so I’m excluding it) are additionally bottlenecked by the disk read and write speed (ignoring the page cache, which could still hold the just-written package files in memory).
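As a rough shell analogy of the two strategies (the URL and paths are hypothetical, not how any of these tools are actually invoked):
# apk-style: unpack while the download is still streaming in
% curl -s https://example.org/pkg.tar.gz | tar -xz -C /tmp/root
# the others: write the archive to a cache first, then read it back and unpack it
% curl -s -o /var/cache/pkgs/pkg.tar.gz https://example.org/pkg.tar.gz
% tar -xzf /var/cache/pkgs/pkg.tar.gz -C /tmp/root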
Changing the test case to use the local cache instead of fetching the packages would probably even out apk, pacman, and NixOS, but apt and dnf are still going to be a bit slower because of their “less minimal” design and repository metadata size.