1. 39
  1. 10

    The problem with libc-based preprocessor macros is that there is no portable way to find them. BSD systems have <sys/cdefs.h>, which is included everywhere, but often you need to make decisions about the libc you’re targeting before you include any files. For example, BSD-style libc implementations (including macOS / iOS) expose all definitions unless you explicitly define a macro asking to be restricted to some standard. In contrast, glibc hides almost everything unless you explicitly ask for certain things to be revealed. By the time you’ve included a header that defines (or doesn’t define) __GLIBC__, it’s too late: the internal macros that tell it whether to expose certain functions are gone.
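
    A minimal sketch of that ordering problem, assuming glibc (any standard libc header drags in <features.h>, which is where the feature-test macros get evaluated):

    ```c
    /* Too early to know which libc this is... */
    #include <limits.h>   /* on glibc this pulls in <features.h> */

    #ifdef __GLIBC__
    /* ...and now too late for this to matter: feature-test macros only take
     * effect if they are defined before the first include. */
    #define _GNU_SOURCE 1
    #endif

    #include <string.h>   /* GNU-only declarations stay hidden anyway */
    ```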

    Personally, I’d love for this to be part of the target triple and exposed by the compiler. If you target x86_64-linux-glibc or x86_64-linux-musl then you should get a predefined macro. That said, if you know that this is the thing that you need to set your target triple to, then your build system already has enough info to say -DUSE_MUSL or -DUSE_GLIBC.
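
    If the build system does pass such a macro along, the source side is trivial (USE_MUSL / USE_GLIBC are made-up names here, not anything standard):

    ```c
    #if defined(USE_MUSL)
      /* musl-specific workarounds */
    #elif defined(USE_GLIBC)
      #define _GNU_SOURCE 1   /* fine here: nothing has been included yet */
    #endif
    ```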

    1. 4

      The natural place to do this detection is in the build system. We do it in build2 for C and C++ stdlibs (by preprocessing a bunch of checks and then examining the result). Works well for everything except musl.
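
      A sketch of what such a probe can look like (not build2’s actual check): preprocess a file like this with cc -E and grep the output. It also shows why musl is the odd one out, since musl deliberately defines no identifying macro:

      ```c
      #include <stdio.h>   /* pulls in the libc's configuration headers */
      #if defined(__GLIBC__)
      libc_is glibc
      #elif defined(__BIONIC__)
      libc_is bionic
      #else
      libc_is unknown      /* musl intentionally has no __MUSL__-style macro */
      #endif
      ```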

    2. 9

      This is similar to the argument here: wiki: Feature Detection Is Better Than Version Detection

      Version detection inhibits evolution and doesn’t scale. Clang has to pretend to be GCC and every browser pretends to be Mozilla.

      However, Oil does have $OIL_VERSION, which lets you detect it … I guess it’s a concession to practicality in some cases. But truly portable scripts should use feature detection.

      I think the issue is that it’s generally more code/work to do feature detection right. I wrote a configure script by hand and it’s more work than looking at versions.

      1. 2

        How would you do feature detection in C? Compile a bunch of test programs and write a header file with the results? Isn’t that just perpetuating GNU autoconf?

        I ask because I have a Lua wrapper around libtls, and while I love libtls, I also hate it because it’s very inconsistent with its versioning, which makes it very difficult to compile against an arbitrary version.

        1. 5

          autoconf rightly uses feature detection, and the problems with autoconf are orthogonal (mentioned on the wiki page).

          The main problem with autoconf is that nobody understands it, largely because it is written in m4. And they copy and paste stuff they don’t need, to support stuff that nobody is using, e.g. old versions of HP or IBM Unix.

          The way to do feature detection in C is to write shell scripts that invoke the compiler on test programs (no m4 necessary). Here is Oil’s, which was inspired by the way QEMU does it (Fabrice Bellard would not use autoconf!).

          https://github.com/oilshell/oil/blob/master/configure

          OCaml also has a hand-written configure that doesn’t use autoconf. Writing it yourself takes longer, but you end up with cleaner code both at build time and in the program itself. IMO the program should be architected with portability in mind, not random #ifdefs everywhere.
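
          As a sketch (a hypothetical check, not one of Oil’s actual ones): the script tries to compile and link a tiny probe like this, and on success appends something like #define HAVE_READLINE 1 to a generated header.

          ```c
          #include <stdio.h>
          #include <stdlib.h>
          #include <readline/readline.h>

          int main(void) {
              char *line = readline("> ");   /* reference the symbol so linking is tested */
              free(line);
              return 0;
          }
          ```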

          1. 5

            The problem with compilation-based probing is that you base your decision on binary success/failure but things can fail for a number of reasons that are difficult to distinguish with this approach. Specifically, your compilation test can also fail because of mis-configured build/environment (a constant pain with autoconf) or a bug in the test you have written. The really insidious variant of the latter is a bug in an optional feature test that manifests itself only on certain rarely-used platforms. Now you have this feature silently disabled because of a bug, not because it’s unavailable.

            1. 1

              Sure, but that can just be shown in the output of ./configure, can’t it? If you expected it to succeed but it failed, then you can see what happened.

              One reason I write it by hand is so that there aren’t like 100 different useless feature flags creating a lot of spew. Oil will eventually have only 1 feature flag – GNU readline – and the rest is portable C++. (Right now there is some unportability we inherited from CPython around the size of integers. I didn’t realize CPython had this!)

          2. 1

            Usually autoconf would indeed compile some test programs and then define a macro like HAVE_FOO indicating that you have the FOO functionality. I don’t see why the libraries you’re using couldn’t define the same kinds of macros directly, cutting out the need for test-compilation. Unfortunately, it gets a bit more subtle when a library announces it has FOO but actually has a subtly broken version of it. Not sure how to deal with that - just treat it as a bug and say the library needs to fix it?
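
            A sketch with made-up names: the library ships its own capability macros, so consumers branch on those instead of running a compile test.

            ```c
            /* foolib.h, shipped by the library */
            #define FOOLIB_HAVE_NONBLOCKING_IO 1

            /* consumer code */
            #include <foolib.h>
            #if defined(FOOLIB_HAVE_NONBLOCKING_IO)
              /* use the non-blocking API */
            #else
              /* fall back to the blocking one */
            #endif
            ```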

            1. 1

              libtls does have a version macro, TLS_API. The problem is that it isn’t always updated when a new function (or a bug fix) is added.

          3. 2

            Feature detection has one other problem too, though I agree that it’s usually the better way to go: each feature flag doubles the number of possible valid compile-time configurations, and the number of permutations quickly explodes well beyond any possibility of comprehensive test coverage. If you’re testing for individual features, in principle you have to handle each of them being present or absent independently of all the others. With version detection, you get implicit grouping of features: glibc version X adds support for A, B, and C, and you will either get all three at once or none of them.

            In practice, what I’ve usually seen is that people end up only testing a limited number of specific permutations of flags that, coincidentally, happen to correspond to the features in supported OS/library versions. So it’s written as feature detection, but tested as version detection.

            Like I said, I think feature detection is still the better approach. But it can lead to a false sense of confidence in the robustness of the code.

            1. 1

              It does depend on what platforms you’re targeting and how much they vary, but in all the situations I’ve seen feature detection scales better (though as mentioned it is a little harder to write).

              It’s true if you have N flags then you have 2^N states, but that’s fundamental. It doesn’t tend to explode if you architect your code with this in mind and don’t just sprinkle random conditionals everywhere.

              With version detection you would end up with stuff like firefox > 1.1 || chrome > 2.2 || safari > 3.3, which is more fragile and has the same inherent test matrix.

              Feature detection would be var hasFoo = eval('somecode') and then if (hasFoo)

          4. 8

            > This is because fork+exec is still a slow path on POWER. It is similar on ARM, MIPS, and many other RISC architectures.

            I’m curious, why?

            1. 2

              It’s slow everywhere. fork+exec is a dumb UNIXism.

              1. 7

                Can you elaborate on why it’s dumb? It’s worked (for various values of worked) for many decades.

                1. 20

                  This paper from 2019 explains it well. (But there is some good critique in the comments.)

                  A fork() in the road

                  Abstract:

                  The received wisdom suggests that Unix’s unusual combination of fork() and exec() for process creation was an inspired design. In this paper, we argue that fork was a clever hack for machines and programs of the 1970s that has long outlived its usefulness and is now a liability. We catalog the ways in which fork is a terrible abstraction for the modern programmer to use, describe how it compromises OS implementations, and propose alternatives.

                  1. 17

                    Long story short, it works quite well in simple cases: no threads, no DLLs, no containers, no signal handlers, etc.

                    Then when you try to involve any of the above in the process it suddenly becomes very inadequate and you don’t have many other tools for sensibly dealing with the complexity involved. Processes in the 2020s involve a lot more state than they did in the 1970s, and managing that state with fork+exec takes a lot of fragile and flaky work.

                    1. 11

                      Yes, worked.

                      It is dumb because it turns process creation into duplicating a process and replacing it, which is way more complex (and thus slow).

                      The only reason we still do it is the classic “we’ve always done it this way”. I expect sanity will impose itself in the end, but it will obviously not be a smooth change.

                      One way or another, UNIX (including its clones) is going to be replaced by a multiserver, microkernel architecture anyway. Whatever the replacement is, it won’t rely on fork+exec for spawning executables.

                      1. 4

                        We don’t do it this way just because we’ve always done it this way. We do it this way because we’ve tried other ways and they are error prone. See vfork and posix_spawn for past unix approaches to move away from the status quo. Windows provides CreateProcessA, which is simpler but basically suffers from the same usability issues as posix_spawn.
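
                        For reference, a minimal posix_spawn sketch (error handling mostly omitted); this is the simpler-but-limited interface being compared to CreateProcess here.

                        ```c
                        #include <spawn.h>
                        #include <stdio.h>
                        #include <sys/wait.h>

                        extern char **environ;

                        int main(void) {
                            pid_t pid;
                            char *argv[] = { "echo", "hello", NULL };

                            /* No fork in user code: the libc/kernel does the spawn for us. */
                            int rc = posix_spawnp(&pid, "echo", NULL, NULL, argv, environ);
                            if (rc != 0) {
                                fprintf(stderr, "posix_spawnp: %d\n", rc);
                                return 1;
                            }
                            waitpid(pid, NULL, 0);
                            return 0;
                        }
                        ```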

                        I’d argue that we already have the replacement for the conventional approach. It is … fork + exec, just not in sequence. This can be seen in some programs that have large working sets and/or strong priv sep needs. The program essentially forks a process off very early that can serve as the main program and then make requests to the original process (which has a minuscule footprint) to launch other processes. The fork is cheaper and there are no system integration challenges, such as running on a new OS version that is subject to new IPC restrictions/trade-offs (as often happens with microkernel systems).

                        1. 6

                          fork and vfork have similar footguns. fork is pretty simple in a single-threaded program (you can do basically anything in the fork context) but that is really safe only if you control all of the code in your program, otherwise you have to assume that any library that you link may create threads. In a multithreaded program you may call only async-signal-safe functions between fork and execve and, in particular, may not call anything that might acquire a lock. If one thread enters malloc and acquires a slow-path lock then another thread calling fork can call malloc and have it work on the fast paths that don’t require a lock, right up until you hit a particular combination of heap state and timing that means that it hits the slow path and tries to acquire the lock that is owned by the other thread (which is not duplicated into the child) and deadlocks. This is insanely hard to debug.

                          In contrast, with vfork, the only footgun is that any memory that’s allocated in the child but not freed is leaked. You can malloc memory in a vfork context, you just must free it before execve. The simplest thing to do is use RAII containers to allocate memory before the vfork call and then not do any allocation in the vfork context. This has the benefit that it also works with fork, it’s just slower. It’s much easier to debug than the fork failure mode because the memory is leaked on all execution paths and will show up with valgrind or whatever you use for checking leaks.
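
                          A sketch of that discipline (my example, not the parent’s code): everything is allocated before vfork, and the child only calls execv or _exit.

                          ```c
                          #include <stdio.h>
                          #include <sys/wait.h>
                          #include <unistd.h>

                          int main(void) {
                              /* Build argv up front so the child never needs to allocate. */
                              char *argv[] = { "ls", "-l", NULL };

                              pid_t pid = vfork();
                              if (pid == 0) {
                                  /* Child: borrows the parent's address space until execve/_exit,
                                   * so do nothing here except exec (or _exit on failure). */
                                  execv("/bin/ls", argv);
                                  _exit(127);
                              } else if (pid < 0) {
                                  perror("vfork");
                                  return 1;
                              }
                              waitpid(pid, NULL, 0);
                              return 0;
                          }
                          ```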

                          The problem with CreateProcess as an explicit call is that you need to do some stuff to the created process. On Windows, all of the system calls that modify the process environment take a HANDLE to the process and modifying your own process is a special case. On *NIX, all of them assume the current process. For example, mmap modifies your own process; you’d need an mmap_other (or whatever) that took a process descriptor as an argument to allow you to modify the virtual address mapping of other processes. The same applies to setuid, setgid, and so on. This would be quite an invasive set of changes to POSIX. vfork is a simple hack that lets you use any system call that changes process state (except the memory mapping) on the new process, without needing to duplicate them all.

                          1. 2

                            > fork is pretty simple in a single-threaded program (you can do basically anything in the fork context) but that is really safe only if you control all of the code in your program, otherwise you have to assume that any library that you link may create threads

                            Oh look, someone else who experienced that pain.

                            1. 4

                              Honestly, I think you could do the world a favour by making fork return an error if it is called in a multithreaded program. The set of constraints on fork in a multithreaded program is more restrictive than vfork’s, and it’s slower. With vfork, you may allocate but you must deallocate everything before execve if you don’t want memory leaks. With fork you may not allocate or call any other function unless it is explicitly marked as async signal safe. Even having fork silently call vfork if used in a multithreaded program would probably be an improvement. You’ll leak memory in the parent rather than deadlocking in the child.

                            2. 1

                              I certainly didn’t mean to say fork is good. I just meant that based on real world bugs it seems to be the easiest for people to understand and avoid said bugs. As much as alternatives can help avoid known pitfalls they also introduce their own pitfalls. The result is that even people who know of newer approaches tend to choose fork because they’ve also been burned by its replacements. The early pre-fork approach avoids the threading and memory mapping issues as much as possible. You do need to be careful that dependencies aren’t initialized in a way that can launch threads, open files, etc., but in practice that has proven easier (especially for larger teams with heterogeneous knowledge sets) than tracking the rough edges of each process spawning approach and always choosing the most appropriate one.

                              1. 1

                                > I certainly didn’t mean to say fork is good. I just meant that based on real world bugs it seems to be the easiest for people to understand and avoid said bugs

                                Even with that caveat, I still disagree. With vfork, if I call malloc then I will deterministically leak memory if I forget to call free, so the easiest solution is to avoid calling malloc at all in the vfork block. If I get it wrong, then valgrind will tell me. With fork, if I call malloc then I will nondeterministically get deadlock, in some situations. I know not to call malloc in a fork context but if I get it wrong then it’s almost impossible to debug via any mechanism other than reading the code and realising that this is what happened.

                                Worse, if I wrote code that called fork in a single-threaded program, it could call malloc and then if someone else comes along later and adds a background thread then they have introduced a nondeterministic bug into some code that they’ve never looked at, let alone modified.

                                The only way that I can use fork safely is to apply a stricter set of restrictions to myself than I would for vfork. In practice, I usually want my code to work with vfork or fork, and so I end up with the union of the restrictions (which basically boils down to not doing anything other than system calls in the child context).

                            3. 2

                              I’m by no means an expert in this, but aren’t usability issues meant to be addressed at the language level more so than at the architecture level? As in, just because a bare architecture has usability issues, doesn’t mean it isn’t worth considering building on top of, right? The opposite seems like letting the perfect be the enemy of the good.

                              1. 2

                                > I’d argue that we already have the replacement for the conventional approach. It is … fork + exec, just not in sequence. This can be seen in some programs that have large working sets and/or strong priv sep needs. The program essentially forks a process off very early that can serve as the main program and then make requests to the original process (which has a minuscule footprint) to launch other processes. The fork is cheaper and there are no system integration challenges, such as running on a new OS version that is subject to new IPC restrictions/trade-offs (as often happens with microkernel systems).

                                Android has the zygote setup for preforking processes with an initialized runtime, last I checked.

                                1. 2

                                  Yeah that is the sort of approach I’m talking about. Chrome and openssh do something similar at the application level.

                                  1. 1

                                    Is that still true? I was under the impression that the zygote was removed a few years ago because it completely defeats ASLR. Every process was a forked copy of the same initial instance and so had a stable ASLR seed even across process restarts. This led to only about 8 bits of entropy in most pointers (from the allocator) and none for non-JIT’d code pointers, so an attacker who can try 256 times (made easier by the fact that Android helpfully restarts processes that crash) can deterministically compromise it.

                            4. 5

                              Sure, but the post implies it’s slower compared to x86, and I don’t know why that would be true.

                              Also shell scripts are notoriously slow on Windows compared to Linux because Windows doesn’t use a fork model. Perhaps that’s because UNIX shells are designed around forking. Fork certainly is more complicated and fragile as you outline in another comment, but slower? I’m not convinced. The specific case of fork+exec has been aggressively optimized. Is there a non-forking OS that trashes Linux in subprocess creation speed?

                              1. 3

                                > Also shell scripts are notoriously slow on Windows compared to Linux because Windows doesn’t use a fork model

                                The lack of fork isn’t the reason that process creation is slower on Windows. Windows processes are very large. A new process has a load of DLLs mapped by the kernel, has multiple threads, and so on. Creating a new picoprocess on Windows is very fast.

                                > The specific case of fork+exec has been aggressively optimized. Is there a non-forking OS that trashes Linux in subprocess creation speed?

                                Linux with vfork + execve is significantly faster than Linux with fork + execve.
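
                                A rough way to measure that yourself (my sketch; the numbers depend heavily on kernel, libc, and how large the parent’s working set is, which is exactly where fork pays the most):

                                ```c
                                #include <stdio.h>
                                #include <sys/wait.h>
                                #include <time.h>
                                #include <unistd.h>

                                /* Spawn /bin/true `iters` times via fork or vfork; return elapsed seconds. */
                                static double spawn_loop(int use_vfork, int iters) {
                                    char *argv[] = { "true", NULL };
                                    struct timespec t0, t1;
                                    clock_gettime(CLOCK_MONOTONIC, &t0);
                                    for (int i = 0; i < iters; i++) {
                                        pid_t pid = use_vfork ? vfork() : fork();
                                        if (pid < 0) { perror("spawn"); break; }
                                        if (pid == 0) {              /* child: exec or bail out */
                                            execv("/bin/true", argv);
                                            _exit(127);
                                        }
                                        waitpid(pid, NULL, 0);       /* parent */
                                    }
                                    clock_gettime(CLOCK_MONOTONIC, &t1);
                                    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
                                }

                                int main(void) {
                                    printf("fork +execve: %.3fs\n", spawn_loop(0, 2000));
                                    printf("vfork+execve: %.3fs\n", spawn_loop(1, 2000));
                                    return 0;
                                }
                                ```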

                                1. 2

                                  For POWER specifically, on Linux 5.4 at least, there is a slow path involving copy_from_user that is part of the syscall which is optimised away on x86. I believe part of this has to do with the fact that the MMU algorithm is swappable (HPT vs Radix). IIRC, the same copy_from_user happens on ARM, but I’ve long forgotten the arcana there.

                                  For a good time, benchmark FreeBSD/sparc64’s fork against FreeBSD/amd64’s. That’s far worse because of how they handled process switching on the SPARC. (Linux comparison can be found in include/switch_to_64.h.)

                                2. 3

                                  This is a thing that Windows does better. Process creation is its own call: you specify what the new process should be and get a kernel object handle that refers to it.

                                  It does make porting unix programs that expect to fork workers instead of spawning threads something of a nightmare, mind you.

                                  1. 1

                                    What advantages do you see CreateProcess providing over posix_spawn?

                                    1. 3

                                      I never used posix_spawn so I don’t know its semantics. It wasn’t an option for most of the time when I was writing stuff that needed fork, and I haven’t updated my knowledge since.

                                      1. 3

                                        By itself, none, but posix_spawn was intentionally designed to be limited. It is a simple (hah!) API that fits the vast majority of cases, not a general-purpose process creation tool.

                                        The NT APIs, in contrast, let you do absolutely anything to a process that you have a HANDLE for that you can do to your own process. You can create mappings, start threads, inject new handles, and so on, all without the target process needing to actively participate (in *NIX you could do this if you had some code running in the target that received FDs over a UNIX domain socket). Cygwin is able to implement fork in userspace on Windows by creating a new process and then creating mappings and copying memory across. It’s very slow and not something anyone should ever do, but it’s an interesting demonstration that it’s possible.
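
                                        A small Win32 sketch of that (error handling omitted, purely illustrative): the parent allocates and writes memory in a child it just created, entirely through the process HANDLE.

                                        ```c
                                        #include <windows.h>

                                        int main(void) {
                                            char cmd[] = "notepad.exe";
                                            STARTUPINFOA si = { sizeof(si) };
                                            PROCESS_INFORMATION pi;

                                            /* Create the child suspended, before it runs a single instruction. */
                                            CreateProcessA(NULL, cmd, NULL, NULL, FALSE,
                                                           CREATE_SUSPENDED, NULL, NULL, &si, &pi);

                                            /* Manipulate the child's address space via its HANDLE. */
                                            const char msg[] = "hello from the parent";
                                            LPVOID remote = VirtualAllocEx(pi.hProcess, NULL, sizeof(msg),
                                                                           MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
                                            WriteProcessMemory(pi.hProcess, remote, msg, sizeof(msg), NULL);

                                            ResumeThread(pi.hThread);
                                            CloseHandle(pi.hThread);
                                            CloseHandle(pi.hProcess);
                                            return 0;
                                        }
                                        ```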

                                    2. 3

                                      So, why call it out for RISC?