The traditional answer to this is that glibc does a better job maintaining backwards compatibility across the Linux syscall interface than the kernel does. My sense is that the traditional answer is no longer quite true, and just a bit of calcified anti-kernel hacker prejudice.
This was quite an enjoyable read, even though I don’t really use Nix! I especially liked the author’s over-the-top writing style; I wish more articles were written like this.
I generally agree… but maybe the large AI-generated-looking pictures we could do without.
(Yes, now we can easily have impressive-looking illustrations for our blog posts. But do we actually need them? What value do they bring? I wonder if people actually look at them instead of skipping over them.)
I liked them and looked at them. They were pretty and felt cohesive to me.
Was this ever true?
A great argument for periodically revisiting legacy design choices. Static linking should be the default in modern compilers.
The problem on Linux is that things like NSS won’t really work with static linking. We’d need a service for NSS instead of every program dynamically loading the modules. In practice many NSS modules talk to a daemon anyway, for caching (like sssd or nslcd).
AFAICT systemd can provide such a service, because there’s a Go issue about using it.
I like to say that “extensibility in the same address space” is usually more trouble than it’s worth. See for example Apple using their XPC for many interfaces, decoupling the address spaces and making the architecture transition much easier.
The systemd document linked from there is fascinating. The name resolution uses Varlink, which I also hadn’t previously heard of, but which looks like a very simple JSON RPC system with some UNIX integration. It lists a number of reasons why this can’t use D-Bus, but I found the startup one less than compelling: name resolution won’t work until you’ve started the daemon that handles the Varlink connection, so why can’t you guarantee that the D-Bus daemon starts this early?
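For anyone else who hadn't seen Varlink before, here is a rough sketch of what "very simple JSON RPC" means on the wire, assuming I have the framing right: each message is a single JSON object terminated by a NUL byte, sent over a UNIX stream socket. The helper names (`encode_call`, `decode_messages`, `call`) are mine, not part of any library, and `call` only reads the first reply.

```python
import json
import socket


def encode_call(method, parameters=None):
    """Serialize one Varlink method call: a JSON object followed by a NUL byte."""
    message = {"method": method, "parameters": parameters or {}}
    return json.dumps(message).encode("utf-8") + b"\0"


def decode_messages(buffer):
    """Split a receive buffer into complete messages plus the unparsed remainder."""
    *complete, rest = buffer.split(b"\0")
    return [json.loads(chunk) for chunk in complete], rest


def call(socket_path, method, parameters=None):
    """Send one call over a UNIX stream socket and return the first reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(encode_call(method, parameters))
        buffer = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("peer closed before sending a full reply")
            buffer += chunk
            messages, buffer = decode_messages(buffer)
            if messages:
                return messages[0]
```

If memory serves, a user lookup would be something like `call("/run/systemd/userdb/io.systemd.Multiplexer", "io.systemd.UserDatabase.GetUserRecord", {"userName": "root", "service": "io.systemd.Multiplexer"})`, but treat that socket path, method name, and parameter names as assumptions to check against the systemd document rather than as a verified API.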
I think I ran into this very error after setting up a new NixOS installation and copying over a “mostly statically” linked library compiled on another NixOS machine. I think I know how to fix this properly now, but it’s rather shitty how NixOS manages to break even the most basic stuff.
NixOS is really not meant to work when you just “copy over another binary”, whether statically linked or not. It’s less that it’s breaking basic stuff and more that it intentionally made choices to meet goals that the standard platforms essentially make impossible. And those goals are both highly valuable and non-trivial to achieve.
The tradeoff is that installing anything on a NixOS installation should be done via Nix and not anything else.
If you want the best of both worlds, though, you can use Nix on a more traditional OS. The Nix stuff will just work. The copied binaries will sometimes work and sometimes not, depending on where and how they were compiled and the current state of glibc on your OS. You may consider this an improvement. You may not.
It might be helpful to think of NixOS not as a typical Linux distribution but as a related yet different operating system, like a BSD or Android (another Linux distro unlike typical desktop distros).
I’ve encountered that mystifying “no such file or directory” exec error many times in my career, even long before Nix. Another frequent case is when you’re building a rootfs (e.g. a container image or embedded rootfs) and forgot to put ld.so in the right spot.
I think what makes it so much worse is the abysmal error reporting - why the hell do we just get ENOENT for both cases instead of a more specific “hey, the PT_INTERP is missing”? We have an errno for “no anode” but we can’t afford one for this, really?
I wonder if this is an artifact of the fact that ELF was not introduced into Linux until 2.0. I have occasionally noticed that earlier versions of the kernel had poor error handling for ill-formed ELF executables when it came to features that didn’t exist in a.out.
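Since the kernel won't tell the two ENOENT cases apart, a small script can. This is a minimal sketch that reads the PT_INTERP program header out of an ELF file and checks whether the interpreter it names actually exists. The function names `read_interp` and `diagnose` are made up for illustration, and it assumes a well-formed ELF32 or ELF64 file with no bounds checking beyond the magic number.

```python
import os
import struct

PT_INTERP = 3  # program header type that holds the interpreter path


def read_interp(path):
    """Return the interpreter path from PT_INTERP, or None if there is none."""
    with open(path, "rb") as f:
        ident = f.read(16)
        if len(ident) < 16 or ident[:4] != b"\x7fELF":
            raise ValueError("not an ELF file")
        is64 = ident[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
        order = "<" if ident[5] == 1 else ">"  # EI_DATA: 1 = little endian
        if is64:
            f.seek(0x20)
            (phoff,) = struct.unpack(order + "Q", f.read(8))
            f.seek(0x36)
        else:
            f.seek(0x1C)
            (phoff,) = struct.unpack(order + "I", f.read(4))
            f.seek(0x2A)
        phentsize, phnum = struct.unpack(order + "HH", f.read(4))
        for index in range(phnum):
            f.seek(phoff + index * phentsize)
            if is64:
                # p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
                fields = struct.unpack(order + "IIQQQQ", f.read(40))
                p_type, p_offset, p_filesz = fields[0], fields[2], fields[5]
            else:
                # p_type, p_offset, p_vaddr, p_paddr, p_filesz
                fields = struct.unpack(order + "IIIII", f.read(20))
                p_type, p_offset, p_filesz = fields[0], fields[1], fields[4]
            if p_type == PT_INTERP:
                f.seek(p_offset)
                return f.read(p_filesz).rstrip(b"\0").decode()
    return None


def diagnose(path):
    """Say which of the two 'no such file or directory' cases applies."""
    interp = read_interp(path)
    if interp is None:
        return "no PT_INTERP (e.g. statically linked); no loader is needed"
    if os.path.exists(interp):
        return f"interpreter {interp} exists"
    return f"interpreter {interp} is missing; exec would fail with ENOENT"
```

Calling `diagnose` on the binary that refuses to run, from inside the rootfs or container in question, should tell you whether the file itself or its loader is what's missing.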