OpenBSD requires syscalls to go through libc. I would assume if you statically link libc then you need to recompile after kernel updates, though I’m finding it difficult to get a definitive answer with a very quick search.
I don’t think it’s literally ABI changes (at least, almost never), but the backward compatibility guarantee is at the libc layer, not the syscall layer. In other words, it’s considered OK to change syscall behavior if you put backward compatibility logic in libc for it. In any case, it’s not just a “guideline”, which is why Go switched to using libc rather than direct syscalls on BSD/macOS.
The article describes linking against libc.a, a static library. That means if the syscall behavior changes, the backward compatibility logic in libc will not affect already-built executables. So those executables you have lying around will now have undefined behavior, unless the syscall ABI is stable.
Yes, AFAIK in the BSD world you shouldn’t statically link libc unless your binary is part of the main tree, because the syscalls aren’t stable across major versions. There are “version symbols” to make the shared library work across versions, but no way to do that with static linking. So, definitely not a “guideline”.
It depends on which BSD. They have different stability and compatibility guarantees and different technical mechanisms to implement those guarantees. FreeBSD, OpenBSD, and macOS are as different from each other as they are from Linux.
Oh, interesting — I haven’t paid attention to OpenBSD for a while. So yeah, it’s just FreeBSD and macOS where I’ve run into this. (And, I believe, Solaris, but that was long ago.)
So given all those BSDs and Windows, I had filed the “syscalls never break” rule as strictly a Linux thing.
“So yeah, it’s just FreeBSD and macOS where I’ve run into this”
FreeBSD has strong syscall compatibility guarantees. If a system call changes in an incompatible way, a new one is added and the old one renamed. The old one is gated behind a COMPAT_{version number} compile option in the kernel. The official kernel builds include everything from COMPAT_4 onwards (I think, possibly older) but if you’re building an appliance and know it will run only newer code then you can remove it.
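As a rough sketch, those compat knobs are ordinary kernel config options. The lines below are illustrative only (the option names follow FreeBSD’s COMPAT_FREEBSDn convention, but verify the exact set against sys/conf/NOTES for your release):

```
# Hypothetical appliance kernel config: keep only recent compat shims,
# drop older ones to shrink the kernel.
include   GENERIC
ident     APPLIANCE
nooptions COMPAT_FREEBSD4    # drop pre-5.x binary compatibility
options   COMPAT_FREEBSD13   # keep compatibility with 13.x binaries
```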
Similarly, FreeBSD’s libc uses symbol versioning so that things that used old versions of system calls whose ABIs have changed (e.g. new versions of stat with newer fields) will call the syscall wrapper that calls the compat version, not the new one.
On recent FreeBSD, the system call wrappers are in libsyscalls, which libc links, so you can provide your own implementation of the system call layer (for example, in a sandboxed environment) but use all of the libc machinery.
macOS is completely different. libSystem is the stable system-call interface. If you want to talk to the kernel, you go via libSystem or you expect breakage. Apple reserves the right to change the system-call layer between minor revisions. A few years ago, they changed the signature of gettimeofday, which broke all Go programs on macOS (because Go implemented its own system-call layer, rather than using the supported one).
Thank you for clearing that up! Of course FreeBSD has a sensible way to handle it, no surprise there. Though I’m now baffled as to why I thought this happened on FreeBSD. Maybe we were using a kernel that someone had “helpfully” changed the compat setting on to save space or something.
Non-syscall ABIs can change across major revisions. For example, device ioctls may change (though FreeBSD ioctl numbers include the size, so usually there’s a new one added rather than an old one changing). This used to happen more but now the project uses old jails on new systems for package builds. This means that a lot of things end up with compatibility interfaces.
I don’t know about OpenBSD, but macOS and Windows both also have an equivalent policy (libSystem.dylib and ntdll.dll, respectively) and both have syscall numbers that are fairly unstable.
My understanding is that, for OSes other than Linux, it’s a common decision to treat the kernel and libc as two pieces of the same codebase which just happen to live on opposite sides of a kernelspace/userspace split, but still share an enum and care more about keeping the enum members alphabetized than about keeping their raw integer representations stable.
I hate to be rude but… is this a surprise to people now? Do programmers not know what linkers do? Maybe we should explain it in terms of web frontend. “A linker is kind of webpack’s tree shaking…”
(The reality is static linking is not nearly as common as it used to be. Too bad, I have 25+ year old Linux binaries I can still run thanks to it.)
Yes it’s a surprise, because doing the pruning of unused stuff at the level of object files instead of exported symbols is pretty unexpected and hacky. Why would you do it that way instead of doing proper tree-shaking? Because it was easy 50 years ago and has never been improved, that’s why. And doing gcc -static hello.c on my Linux system gets me a 700 kB executable containing, according to objdump, over 2000 symbols.
I blame glibc, but it might be neat to find out for real.
Traditionally, static linking does do proper tree shaking … tho it requires the programmers to write minimal source files, as little as one function per file. More modern toolchains have -ffunction-sections which achieves the same granularity with less manualarity.
The problem is complex interdependencies. Calling printf() implies pulling in a shitton of code via transitive function calls. (Because the stupid format string is interpreted at runtime instead of being compiled and tractable to dead code elimination.)
“Calling printf() implies pulling in a shitton of code via transitive function calls. (Because the stupid format string is interpreted at runtime instead of being compiled and tractable to dead code elimination.)”
printf calls a lot of things. It doesn’t just format the string, it formats the string in the current locale. This means it calls localeconv, and all of the number-formatting things. These, in turn, call all of the locale-related machinery, which brings in a load more things. You probably need a good 20% of libc for printf.
In a modern (post POSIX2008) libc, printf is a wrapper around vfprintf_l, which takes an explicit locale. If you call the _l-suffixed version and pass the C locale, you can reduce the amount of stuff that’s passed in. Solaris libc also has some header-level things that let you opt into ASCII in the C locale statically, which helped a lot more.
But that’s exactly pruning at the level of object files instead of exported symbols. I’d say “proper” pruning should be able to prune symbols individually regardless of where they are, and still work the same even if you had all of libc built as one .c/.o file with one text section.
Read the very next sentence to the one that you quoted. If you compile with -ffunction-sections and -fdata-sections, the linker will throw things away on a per-function and per-global basis. It will start with _start for executables or public symbols for shared libraries and then do a graph walk to find all reachable symbols and include only those. This works even if you cat all of the .c files in libc together, produce a single .o file, and link it into your final binary.
Because it changes behaviour. A few things depend on unreferenced public symbols being present so that they can be found later by things like dlsym and so on. Changing the default would break existing things. Adding it as a flag makes it easy to opt in and doesn’t break anything.
There are enormous numbers of professional programmers that never ever touch a linker in their career. In fact, I would not be surprised if that was the majority, given how most programmers are working on very high-level things these days. This has nothing to do with the quality of the work they do. Their languages/environments may simply not require it.
As someone who has spent a career maintaining systems and development tools, you would be amazed at what some programmers do not know.
Many have decided that they should be masters of the language and their own business use case, but everything below that is just someone else’s problem. E.g. I had to explain to a Java programmer once that RAM is actually a finite resource. When Docker was a much newer and younger project, many of the early developers were clearly brand-new to Unix/Linux. Some gaps were simple things, like the convention that text files should always end in a newline because other Unix tools rely on it. But others were far bigger: the fact that PID 1 has special abilities/responsibilities seemed to really catch them by surprise, resulting in several different workarounds from the community until the project finally adopted --init.
I think the “all of libc” that a non-C language might want to avoid is things like errno being a global thread-local variable instead of a return value (whoops, brb changing the abi so extern int errno is actually #define errno (*errno_location()) or some other magic).
There are some interesting questions about how backwards compatibility is implemented, e.g. how to deal with the 32/64-bit transition for off_t and time_t: does the kernel maintain 32-bit compatibility or does libc? If compatibility is in libc, what are the tradeoffs for out-of-tree languages between reimplementing libc’s machinery or linking to libc? There are social considerations as well as technical ones, because technical compatibility guarantees are mostly a surface artefact of the social processes that aim to ensure changes are managed sustainably.
More generally, I don’t think people complain about the “all” aspect at all? As in, I don’t recall anyone objecting to the size of the thing?
My personal complaints are:
libc butchers the syscall ABI. It’s not a problem to fetch the error code from errno in practice, but this is aesthetically revolting :P
libc muddies up the interfaces. OS interface, POSIX interface, and C standard library interface are all different things, but it’s almost impossible to unscramble the libc egg.
To minimize output size even without ld --gc-sections (https://maskray.me/blog/2021-02-28-linker-garbage-collection), libc implementations often aggressively separate function variants into individual files, e.g. asprintf.c sprintf.c vasprintf.c vsprintf.c vsnprintf.c.
The archive processing semantics (https://maskray.me/blog/2021-06-20-symbol-processing#archive-processing) ensure that while libc.a:x.o is extracted, libc.a:y.o may remain unneeded.
The linker just doesn’t see the input sections from libc.a:y.o, so section-based garbage collection is not needed.
Perhaps a naive question, but why would you ever want to statically link your system’s libc? Surely that’s the one dynamic library you can guarantee is there.
On Linux, not really: different distros ship different, incompatible glibc versions. It’s particularly a problem when running software built on a normal distro on some LTS enterprise distro like Red Hat. I doubt this is a problem in OpenBSD land though.
There’s also a small performance difference with static linking: less indirection, and link-time optimization. The cost is not sharing memory pages with other processes. Maybe that’s why they’re doing it?
Heck, forget different versions of glibc, you might find yourself on a Linux that uses a different libc altogether, like musl (Alpine Linux) or bionic (Android). And then there are weirdos like NixOS where /usr/lib doesn’t even exist and any copies of glibc you might find are at weird random paths with hashes in them. Static linking means you just run fine on all of these.
On OpenBSD, I think the situation is the opposite: libc is stable, the syscall interface is not. By statically linking libc, unless you’re a program like ls shipped with OpenBSD itself, you’re just inviting breakage for no good reason.
The main reason I see (in the case of OpenBSD) is security.
Statically linking against libc means that all symbols are resolved at build time and baked into your binary, which means you cannot use e.g. LD_PRELOAD to make programs run malicious code.
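For illustration, an LD_PRELOAD interposer is just a shared object defining the same symbol; the dynamic loader resolves the program’s call to the preloaded copy first. A hypothetical sketch (the RTLD_NEXT forwarding is the standard trick). Against a statically linked binary the loader has nothing to interpose, so this does nothing:

```c
/* evil_getpid.c — hypothetical LD_PRELOAD interposer sketch.
 * Build: cc -shared -fPIC evil_getpid.c -o evil.so
 * Use:   LD_PRELOAD=./evil.so some_dynamically_linked_program
 * A statically linked binary never asks the dynamic loader to resolve
 * getpid, so the interposition simply does not happen. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/types.h>
#include <unistd.h>

pid_t getpid(void) {
    /* Resolve the next definition in search order (the real libc one)
     * and forward to it; a real attack would tamper here instead. */
    pid_t (*real)(void) = (pid_t (*)(void))dlsym(RTLD_NEXT, "getpid");
    if (real == NULL)
        return -1;
    return real();
}
```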
Another reason is to guarantee that your binaries are self-contained and don’t depend on any external files, for example in a system-recovery scenario where not all of your partitions are mounted.
Thanks to your comment I started looking up the reasons and context for all this static libc.a linking.
From what I could find, the complaint seems to come from Go, which used to make direct syscalls to the kernel rather than going through libc (up until 1.16). OpenBSD released a pinsyscalls(2) syscall, which somehow forced Go to comply, and as Go (usually) creates static binaries, it is then forced to statically link against libc.a.
On Ubuntu at least, if you do this with glibc it does not work as expected: your executable grows much more than you would expect, I believe because glibc isn’t compiled with -ffunction-sections.
sounds like the kernel offers syscall ABI stability then?
Doesn’t OpenBSD specifically investigate syscalls to make sure they come from libc and not anywhere else? I may be misremembering.
yes, see pinsyscalls(2)
Here’s an LWN item about OpenBSD reducing exploitable syscall gadgets which is along the lines you are thinking.
This article aged like milk
Since earlier today? Your milk should be lasting longer than that.
If you leave milk out, it can go sour. Put it in the refrigerator or failing that, a cool wet sack.
Why?
In that case my question is “why is this not the default?” :-P I know the answer, but the question is still valid.
I was partly reacting to the lengthy discussion here a few days ago https://lobste.rs/s/uoxout/no_libc_zig_now_outperforms_glibc_zig#c_efvgwh
Lol, wires crossed in my head, I thought it was your discussion, not mort’s!
In every scenario in which an attacker can set a malicious LD_PRELOAD, your system is already fully compromised.
Right, that’s a fair point.
“It rather involved being on the other side of this airtight hatchway”