From what I remember from one RFC I saw, part of the issue was that the glibc devs were adopting a “We documented it in the manpage. You’re just holding it wrong.” stance for a while and that delayed both the Rust-side and glibc-side changes mentioned at the end of the post since the Rust devs try to avoid unilaterally hacking around other people’s designs until they’ve run out of other, more cooperative options.
Can’t the Rust side just… avoid getenv(3) and setenv(3)? Shouldn’t one be able to implement thread-safe versions on top of system calls? No need to be unfriendly with the glibc devs if we can avoid depending on them in the first place.
getenv(3) and setenv(3) aren’t getting the environment from a kernel service, the environment is just a global variable extern char **environ that gets initialized by execve(2) when starting a process. There’s no “other place” to get the environment from where you will be safe from libc’s predations. Part of the sturm und drang in the historical issues there seems to have been that Rust’s set_env was “safe” and had a locking regime that made it thread-safe if you only modified environ from Rust code; linked-in dependencies not written in Rust would have no idea about what Rust was doing.
Crap. I see. One thing bothers me though: why does setenv(3) even exist? The program’s environment is an input, and if it’s global it should be a global constant. We’ve known global variables are bad for, like, 40 years now? I sense legacy from old-school procedural programming, the same kind that decided a program only needed one parser (old lex & yacc worked over global variables too).
Oh well, I guess I have yet another reason to reduce my dependencies to the absolute minimum.
POSIX/Unix and its descendants are a product of legacy. setenv(3) comes from 4.3BSD, which was neither SMP-aware, nor did it have the concept of multiple threads of execution sharing an address space. As mentioned above, the environment is just memory that’s written into the process’s address space during process creation.
Since processes didn’t even share memory at that point, there was no harm in allowing writes to what was already mutable memory - and presumably it made it easier for programs configured through env vars to manipulate those env vars.
Edit: getenv(3) comes from V7 Unix. setenv(3) comes from 4.3BSD.
Also worth noting that early Unix machines had memory best measured in kilobytes. Even if they had multiple threads of execution, there would have been design pressure to avoid taking multiple copies of the environment.
Early UNIX also had a lot of features in libc (and the kernel) that were really just for shells. The setenv function is in this category. In a shell, you have an exported environment, you want commands to modify it, and you want child processes to inherit it. Having a single data structure for this is useful: when you invoke execve in a child process, you just pass environ to it.
For any process that is not a shell, setenv should never be called. The two cases are entirely separable:
If you want to pass state around within a process (including to and from shared libraries), you use functions or globals.
If you want to pass state to a child process, create an environment array before exec.
setenv is useful between fork and exec to configure the environment of the child process. Yes, you could use the environment-setting variants of exec, but oftentimes it is easier to just set or remove a few variables.
setenv and getenv are not async-signal-safe, so you cannot use them safely in the child of a fork. If a signal handler had modified environ at the same time, you might be reading half-valid state.
The proper way to do it, valid under all the restrictions in the glibc docs, is to copy environ yourself into thread-local state (the copying is still not async-signal-safe). Then you modify the copy as needed, call fork, and immediately execvpe/execle.
There isn’t a good reason to do processing after a fork; it only leads to hard-to-diagnose bugs when you inevitably end up touching non-async-signal-safe state.
Looks like we’re missing a function that wraps execle(3) or execvpe(3) where, instead of specifying the entire environment, we specify only the environment variables that changed. That way we wouldn’t have to use setenv(3) the way you suggest.
The whole UNIX process-spawning API is based around fork, call a bunch of things to configure the spawned process, then exec. This is pretty clever because it means you don’t need a super-complex spawning API that allows configuring everything; you can do it with real code rather than some plain-old-data structure that will inevitably grow some sort of rudimentary logic for common cases. So in this environment it makes sense that we don’t really need a way to set the environment with exec at all: the “UNIX way” is fork, setenv, then exec, with no collection of exec variants for managing the environment in different ways.
However, while the fork, …, exec solution is clever, a better solution would probably be to make all of these functions take a target-process argument rather than implicitly targeting the current process, so that spawning would look something like spawn, configure the child via the returned process ID, then start. In a scenario like this it makes more sense to pass the environment into the spawn call and have it be immutable. But this also brings up other questions: are some functions only allowed between spawn and the first start? For opening and closing file descriptors, do we need a cross-process close?
From what I’ve seen from looking at their sources, higher-level languages tend to use posix_spawn(3)/posix_spawnp(3) for their subprocess-launching APIs and switch to fork/exec if you ask for something not expressible in it.
From glancing at the source: no. Because the thing you actually want to interact with is the pointer to the environment, which is managed by the unsafe glibc functions and used by the C code you’re (eventually, always) interacting with. Even if you could get hold of it, all the other C libraries around your program can (and will) still change it, or read it without caring about your additional lock. A famous reason for Rust code to touch the environment is, for example, time handling.
You could also have C libraries initializing before your Rust code has a chance to do anything, and now they hold getenv pointers while you try to do things.
Your code might also not even know that it is calling set/getenv indirectly.
Ah, setenv and getenv. Steam also struggled with this recently. It is safe to say that nobody can use this API properly. In other words, it is broken (even if it behaves as documented, that is not good enough when ‘as documented’ means everyone has crashes because it’s impossible to use properly).
Unfortunately, this is yet another case where the issue originates from POSIX (macOS and the BSDs could be affected as well). The standard shows its age; many constructs have issues or even break down when multi-threading is involved. Remember, before C11/C++11, these languages did not even define a proper multi-threaded memory model (see also the legendary essay Threads Cannot Be Implemented as a Library, published in 2004). Everything from start to finish was just a kludge. Of course, many amendments and mitigations were made to alleviate this problem, but POSIX is holding back innovation, both in operating systems and in user space. Back then, most programs consisted of one or two simple C files implementing a single-threaded program. POSIX would be radically different if it were designed today.
The worst thing about setenv crashing getenv is that getenv is used everywhere. Date/time functions will read TZ. Network-related functions will read proxy and DNS resolver options. Leaf functions in random libraries will have hidden config overrides.
The problem is that everything else is calling ?etenv() too, not just your own code. It’s hard to avoid the footgun when everything seems to have a trigger.
TLDR (in this particular case): setenv and getenv are not thread safe, I assume char** environ suffers similar misery.
The problem is that there are a lot of core POSIX APIs/variables, etc that are inherently not thread safe, and cannot reasonably be made thread safe. Some cases can be mitigated by using TLS (as was done with errno), but cases like this especially seem unfixable.
Now you could make safe rust wrappers for a bunch of these APIs, but there’s nothing that those wrappers can do if there’s also non-rust code interacting with them.
I guess in principle libc could put locks around manipulating the environment, but I’m not sure that protects you from people using environ directly. From a design PoV, having environ not be accessible would seem to be the best course of action, though that ship has sailed. But I’ve also used it for terrible terrible things in the past so I don’t get to complain :D
getenv is thread-safe. setenv isn’t; it’s just that the crash/UB will manifest in a following getenv call. So you have unavoidable problems if your process has calls to setenv in code you don’t control, because there’s nothing your getenv calls can do about it.
Not sure how much of this is specific to glibc … Like, do Apple’s or MS’s C libraries have this problem or do they use locks?
glibc docs say getenv is MT-Safe env, which means thread-safe unless somebody concurrently modifies environment variables, eg. via setenv.
POSIX 2024 has the same idea:
Some earlier versions of this standard did not require getenv() to be thread-safe because it was allowed to return a value pointing to an internal buffer. However, this behavior allowed by the ISO C standard is no longer allowed by POSIX.1. POSIX.1 requires the environment data to be available through environ[], so there is no reason why getenv() can’t return a pointer to the actual data instead of a copy. Therefore getenv() is now required to be thread-safe (except when another thread modifies the environment).
There it is! Thanks. (Google really has gotten bad, huh?)
BTW, I was interested to note on cppreference that C++ stdlib has threadsafe std::getenv since C++11 — as long as nothing calls setenv — and has no std::setenv at all. I guess that’s one way to deal with it.
Those locks don’t help though. They mean getenv is guaranteed the environment block isn’t changing while it’s being navigated to find an entry, but it returns a pointer to that entry and drops the locks, meaning the lifetime of the entry is only valid until the next setenv. If a program could truly promise that it would not race getenv and setenv, which it needs to for correctness, the locks could just be removed, since they could never contend. They look to be there just to ensure the resulting crash isn’t in the C library, and the Windows C library has equivalent locks.
For this interface to work, getenv either needs to copy the variable into a buffer whose lifetime the caller chooses, or implement some kind of freeenv so that the function performs an allocate/copy that can be freed when it’s no longer needed. The alternative just becomes “nothing can use setenv on any thread until all the getenv business logic is finished with its return values”, which has been shown not to work.
As long as setenv() isn’t used to modify the variable you retrieved, your pointer is fine. However, check this out…
From the macOS getenv(3) man page:
Successive calls to setenv() that assign a larger-sized value than any previous value to the same name will result in a memory leak. The FreeBSD semantics for this function (namely, that the contents of value are copied and that old values remain accessible indefinitely) make this bug unavoidable. Future versions may eliminate one or both of these semantic guarantees in order to fix the bug.
But if the buffer was allocated by the current process in a previous call to setenv(), it doesn’t behave this way at all (see here):
> let shell = getenv('SHELL')
> shell
"/opt/homebrew/bin/fish"
> setenv('SHELL', 'foo')
nil
> shell
"/opt/homebrew/bin/fish"
> getenv('SHELL')
"foo"
> setenv('BAR', 'baz')
nil
> let bar = getenv('BAR')
> bar
"baz"
> setenv('BAR', 'x'.repeat(128))
nil
> getenv('BAR')
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
> bar
=================================================================
==40619==ERROR: AddressSanitizer: heap-use-after-free on address 0x000105a0f114 at pc 0x0001034a3520 bp 0x00016eefd110 sp 0x00016eefc8a8
READ of size 1 at 0x000105a0f114 thread T0
This is interesting. It appears that the realloc behavior was added in Libc-825.24 (compare to Libc-763.11). This corresponds to macOS 10.8. So in 10.7 and earlier, it did indeed leak everything, but starting in 10.8 it is happy to call realloc() on any string it originally allocated through a call to setenv(). The code also references UNIX03 here as though that’s the reason for the change. Interestingly, this change predates the addition of locking to these functions.
It seems to me that either the manpage needs to be updated (it says that future versions may change the behavior to fix the leak, but this isn’t future behavior now, it’s past/present behavior), or the getenv() function should be updated to mark the returned string as unowned, thus allowing it to leak. I’m inclined to file a radar on this, though such longstanding behavior is not very likely to change.
In Windows the C library is layered on top of the OS (it’s not the native interface.) The Win32 API exposes GetEnvironmentVariable and SetEnvironmentVariable, which use locks and copy the data into a caller’s buffer, so they’re threadsafe. Unfortunately it looks like implementing the C getenv function interface is inherently unsafe, because it returns a pointer without a corresponding free. The C library ends up with its own (different!) copy of the environment block, and returns pointers to it, not unlike POSIX. In the end though, they defined getenv_s which mirrors the Win32 semantics.
My advice to anyone writing for Windows though is to avoid the C library as much as possible. It’s a cross platform abstraction layer that hides system behavior and introduces its own quirks (in this case, a duplicate/divergent environment block.)
The environment on Windows is quite different to UNIX. On UNIX, environment variables are just another set of program arguments (the other two on modern systems are the explicit arguments and the ELF auxiliary argument vector). You get whatever the parent process gave you. Traditionally, the parent process was a shell, which would mutate the environment that it managed and give you a copy when it started you.
On Windows, the environment is some global state (per login session, if I remember correctly) and the get and set operations read and write this state. If a user modifies the environment, the next call to get that environment variable will return the new version. I can’t remember if there’s a way of getting a notification on change.
On OpenStep / macOS, most of these use cases were superseded by user defaults, which provide a set of layered dictionaries and a way of getting notifications when things change.
In Windows each process has its own environment block, and specifies the environment block a child process should have in CreateProcess, not unlike UNIX. There is a key difference in data structure (single block vs. array of strings) which is important for buffer lifetime, because it controls which modifications can invalidate pointers. Programs like Explorer attempt to recreate a vanilla “login session” environment block from the registry when launching child processes, using things like CreateEnvironmentBlock, but it’s not a centrally maintained piece of data. A regular call to SetEnvironmentVariable only affects the current process, not the session.
I was curious, so I checked Go, and it puts a mutex around its Getenv/Setenv standard library functions. You could still screw it up by calling setenv from a C library, but it would be tricky, since Go caches the environment between calls to Setenv.
I mean getenv is thread safe in the sense it doesn’t mutate state, but it’s not thread safe if setenv happens at the same time - yet setenv is the thing going wrong, but afaict getenv doesn’t have the ability to make an atomic copy of the environment list at the start :D
I have no idea how the other OS’s deal with it, in principle you could (should?) make get/setenv acquire a lock - especially given I cannot believe acquiring a lock is significantly expensive compared to the piles of string parsing they’re doing - but I think the environ global still hoses you :-/
And that’s how we ended up with a crash, caused by unsafe-free Rust code badly interacting with the use of libc elsewhere in the program.
env::set_var(ENV_CERT_FILE, path);
I don’t understand why Rust has an API like this. It’s obvious that set_var mutates global state. And when you have mutable global state and threads, you’ll have races.
It should be my_handle_to_mutable_data.set_var().
I was going to say that this is a “static models vs. dynamic reality” problem, but it seems even simpler than that … the model is just done wrong here.
Static types are models; data from network/disk is reality.
(Likewise, whatever glibc does is reality, but in this case it’s obvious what it does. It mutates a global variable, which may be shared between threads.)
it’s unclear to me how this would fix things? The core problem is that, even if the Rust code decides to take a lock, any C code running concurrently may not, and thus it’s fundamentally impossible for any Rust-side changes to actually make this fully safe.
The missing coordination between the Rust layer and the C layer is not the most important issue. The main problem is that even if you fix it with locks, it is still global state shared by all threads. So although it will not crash as described in the article, it will not work correctly, because threads will compete and overwrite the global state out from under each other; i.e., the last configuration set by a random thread will apply to all the others.
The env variables make sense as a mostly immutable data passed from parent process to child. It can be modified early at the program start (single thread, before anyone reads the env) or after fork() before exec(). But not during program runtime, especially if the program is multi-threaded.
Any C code that calls setenv() concurrently should also be deemed unsafe, according to Rust.
It is definitely a limitation of static types that they can only reason about a single language inside a process, while you may have 2 or more languages within a process.
But this is an obvious bug even in pure Rust code without unsafe – i.e. just 1 language.
The Rust bug is marking the set_var and remove_var functions as safe. The Rust fix is to mark them as unsafe. For stability reasons this requires a new edition, which only happens once every three or so years. So there has only been a warning on the set_var page until that happens.
However, Rust obviously can’t fix other languages. Python, C, etc. are on their own there. Arguably it’s libc who are ultimately responsible, as they’re the ones managing the shared resource, albeit constrained by the C and POSIX specifications.
Not just according to Rust: POSIX says “The setenv() function need not be thread-safe.” edited to add: There are also several scary caveats in POSIX about environ
POSIX doesn’t say it’s always unsafe because Solaris / Illumos have a thread-safe setenv() — at the cost of leaking memory because old versions of the environment cannot be freed safely. But on every other OS setenv() is unusable in multithreaded code.
C has getenv() but not setenv() so it has nothing to say on this matter.
Rust could maybe have a safe setenv() but it would have to manipulate its own environment, not the libc environ, so getenv() (or direct access to environ) in non-Rust libraries linked into the same program would not be accessing the same environment.
This isn’t a limitation of static types, it’s a limitation of the anaemic C FFI that can’t model any worthwhile static properties of an API.
Reasoning across languages is a limitation of static types (you can’t)
You can, up to the common subset of the languages’ type systems. Many of the existing examples involve C so the common subset is very poor.
Maybe your point is that you can’t use type system features that are outside the common subset to reason across the language barrier. But it seems weird to me to imply that you can’t do any reasoning at all, or that this is a limitation of static typing in general rather than the particular type systems in play.
In this case there was reasoning across languages: the FFI call was type safe up to the limitations of C’s type system, which is what allowed Rust to call C directly. When the compiler is unable to make static guarantees then a foreign call looks more like an RPC interface with runtime checks and format translations. You see this kind of faff in the C APIs of dynamic languages.
There are a couple of things that make it hard to have FFI with richer type systems:
Languages with sophisticated type systems tend to have sophisticated runtime systems. You can’t have multiple runtimes in the same process because a runtime generally assumes it is responsible for basically all the process’s memory and threads.
So languages on the JVM and the CLR have more sophisticated cross-language object-oriented static type systems, but there are fewer examples in the functional programming tradition where the runtime isn’t typically exposed as an independent layer.
There aren’t (yet) very many languages with minimal runtimes and rich type systems designed for programming in the large. A lot of the ones that aren’t Rust are designed for stronger verification of a component of a larger C or C++ program so they don’t try to expose fancy types.
Maybe in the future there will be more variety in this area, and a greater desire for these languages to call each other across more richly typed interfaces.
You can, up to the common subset of the languages’ type systems.
I don’t follow … what I’m saying is that the common subset of any 2 type systems is the empty set.
That’s why you need runtime checks in the glue between them.
e.g. an i32 in Rust is not the same thing as an int32_t in C++ – it doesn’t have the same valid operations
And that follows for the rest of the type system too:
Rust functions and C++ functions are fundamentally different (different ownership for params and return values)
Rust traits have basically no relation to C++ classes, etc. You can make some kind of mapping between them, but it’s arbitrary, and must be obeyed / enforced at runtime
And you can replace Rust with OCaml, and C++ with Swift or Zig, and that’s all still true
Maybe you are saying, well i32 is really the same as int32_t. I am not sure that is true – but maybe if you reference the ABI, which is separate from the language. And even C and C++ don’t have compatible ABIs – hence the existence of extern "C"
But let’s suppose for a minute that it’s true for primitives … As soon as you get to even “struct”, the languages don’t have any obvious mapping between them (most languages do have special cases for C, including Rust, but not for say Zig).
The binding tools all have extra-linguistic conventions, and there are very often holes in those conventions (including say locking).
C++ and C have a large common subset, aided by extern “C”
Rust has something like extern “C” as a special case, so it has a common subset with C. But it doesn’t have extern Zig or extern Swift, etc.
So those are two special cases, where one language explicitly has support for another.
But, in general, the common subset of any other 2 type systems is zero / the empty set.
The way to get a common subset is to purposely implement the static semantics of C++ with Rust syntax, and conversely the static semantics of Rust with C++ syntax
You can clearly see this difference in how the fish shell migrated from C++ to Rust - https://news.ycombinator.com/item?id=42535217 - i.e. read about their experiences with the binding generators, and also note that fish was never released as a hybrid C++ / Rust program (although there is more than one reason for that)
And I realize this is far afield from the original discussion, but what I’m saying is that the type system helps you within one language, not across languages. The two languages interact dynamically, not statically, e.g. by calling setenv() concurrently.
But this is an obvious bug even in pure Rust code without unsafe – i.e. just 1 language.
Yes but this is safe in pure Rust, because it does already take a lock. All the surprise cases have been from mixing in C, because even the libc calls that Rust does can end up doing getenv.
(In retrospect I was a bit unclear in my previous comment: “even if the Rust decides to take a lock” was meant to be like “Rust taking a lock is irrelevant”, but it actually sounds like “Rust doesn’t currently take a lock” which is just wrong.)
It is not about Rust, it is not about Assembly, nor ARM64 vs. RISC-V…
Just:
$ man setenv
...
ATTRIBUTES
For an explanation of the terms used in this section, see attributes(7).
┌─────────────────────┬───────────────┬─────────────────────┐
│Interface │ Attribute │ Value │
├─────────────────────┼───────────────┼─────────────────────┤
│setenv(), unsetenv() │ Thread safety │ MT-Unsafe const:env │
└─────────────────────┴───────────────┴─────────────────────┘
n.b. MT-Unsafe
Environment variables are a simple interface between the parent process (the environment) and the program being executed. Not a dynamic global thread-safe map.
Note that this is only true on the platform where you obtained that manual page. The people who wrote the implementation have made a choice not to make the interface safe.
If you look at https://illumos.org/man/3C/setenv ours is MT-Safe, and there’s really no reason that every other libc couldn’t make the same decision.
BTW: Does Illumos have independent env for threads or shared by whole process (and just thread-safe)? It might be useful or not… However at current state (env is usually not only shared but even MT-Unsafe), it is just a bad design if someone recklessly modifies env in a multi-threaded program or if a library can be parametrized only using env and does not provide any other per-thread configuration (like functions or thread-local variables).
The environment is definitely per process, not per thread. FWIW, I don’t think setenv(3C) is a good interface, and I don’t think people should use it (there are vastly preferable alternatives for every legitimate use case in 2025), I just don’t think it should be unnecessarily thread/memory unsafe!
What’s the preferable alternative for “i’m using a library which changes behavior based on environment variables and i need to control that behavior”? Note that libc itself is a library that does this (e.g. the TZ env var).
I think in general the environment variables are intended to allow control from outside the process; this applies to the ambient timezone and locale stuff especially. If you override them within the process you’re preventing those mechanisms from working as designed. In general, any library interface that can only be configured through environment is not a good interface.
In the case of locales, the newlocale(3C) routine was added to allow programs to look up a specific locale and get a handle to it, rather than use the ambient locale from LANG and LC_ALL in the environment. Probably we should be looking to add a newtimezone(3C) routine to allow a similar thing with timezones!
I wish LD_PRELOAD facilitated shim libraries without the shim essentially having to become very linker-aware just to call the implementation that would otherwise be linked by default. E.g. in CI environments we could load dynamic linting libraries that complain loudly if a process calls getenv after setenv. This isn’t the only case where ancient interfaces are inherently dangerous and unlikely to be fixed soon.
*nod* This problem has been dogging Rust for about as long as Rust has been stable.
From what I remember from one RFC I saw, part of the issue was that the glibc devs were adopting a “We documented it in the manpage. You’re just holding it wrong.” stance for a while and that delayed both the Rust-side and glibc-side changes mentioned at the end of the post since the Rust devs try to avoid unilaterally hacking around other people’s designs until they’ve run out of other, more cooperative options.
Can’t the Rust side, just… avoid getenv(3) and setenv(3)? One should be able to implement thread safe versions from system calls, can’t we? No need to be unfriendly with the glibc devs if we can avoid depending on them in the first place.
getenv(3)andsetenv(3)aren’t getting the environment from a kernel service, the environment is just a global variableextern char **environthat gets initialized byexecve(2)when starting a process. There’s no “other place” to get the environment from where you will be safe from libc’s predations. Part of the sturm und drang in the historical issues there seems to have been that Rust’sset_envwas “safe” and had a locking regime that made it thread-safe if you only modifiedenvironfrom Rust code; linked-in dependencies not written in Rust would have no idea about what Rust was doing.Crap. I see. One thing bothers me though: why does
setenv(3)even exists? The program’s environment is an input, and if it’s global it should be a global constant. We know global variables are bad since, like, 40 years now? I sense legacy from old school procedural programming, the same kind that decided that a program only needed one parser (old lex & yacc worked over global variables too).Oh well, I guess I have yet another reason to reduce my dependencies to the absolute minimum.
POSIX/Unix and its descendants are a product of legacy.
setenv(3)comes fromV7 Unix4.3BSD, which was neither SMP aware, nor did it have the concept of multiple threads of execution sharing an address space. As mentioned above the environment is just memory that’s written into the process’s address space during process creation.Since processes didn’t even share memory at that point there was no harm in allowing writing to what was already mutable memory - and assumedly it made it easier for programs configured through env vars to manipulate those env vars.
Edit:
getenv(3)comes from V7 Unix.setenv(3)comes from 4.3BSD.Also worth noting that early Unix machines had memory best measured in kilobytes. Even if they had multiple threads of execution, there would have been design pressure to avoid taking multiple copies of the environment.
Early UNIX also had a lot of features in libc (and the kernel) that were really just for shells. The
setenvfunction is in this category. In a shell, you have an exported environment and you want commands to modify it and for child processes to inherit it. Having a single data structure for this is useful, when you invokeexecvein a child process, you just passenvironto it.For any process that is not a shell,
setenvshould never be called. The two cases are entirely separable:I’m going to quote that (heh) Verbatim (oh no) in any future discussions on this topic.
setenv is useful between fork and exec to configure the environment of the child process. Yes, you could use the environment setting variants of exec but often times it is easier to just set it remove a few variables.
Setenv and getenv are not signal async safe, so you cannot use them safely in a fork of a process. If a signal had at the same time modified the environ variable, you might be reading half-valid state.
The proper way, that is valid considering all restrictions in glibc docs, to do it is to copy the environ yourself and modify the new copy in thread local state (the copy is still not async-signal safe). Then you modify it as needed, call fork and then immediately execvpe/execle.
There isn't a good reason to do processing after a fork; it only leads to hard-to-diagnose bugs when you inevitably end up messing with non-async-signal-safe state.
Looks like we're missing a function that wraps execle(3) or execvpe(3) where, instead of specifying the entire environment, we only specify the environment variables that changed. That way we wouldn't have to use setenv(3) the way you suggest.
The whole UNIX process-spawning API is based around fork, call a bunch of things to change the spawned process, then exec. This is pretty clever because it means that you don't need a super-complex spawning API that allows configuring everything, and you can do the configuration using real code rather than some plain-old-data structure which will inevitably grow some sort of rudimentary logic for common cases. So in this environment it makes sense that we don't really need a way to set the environment at all with exec; the "UNIX way" is fork, setenv, then exec, with no collection of exec variants for managing the environment in different ways.
However, while the fork, …, exec solution is clever, a better solution would probably be to make all of these functions take a target-process argument rather than implicitly targeting the current process, so that spawning would look something like spawn, do whatever to configure the child with the returned process ID, then start. In a scenario like this it makes more sense to pass the environment in to the spawn call and have it be immutable. But this also brings up other questions, like: are some functions only allowed between spawn and the first start? For opening and closing file descriptors, do we need a cross-process close?
From what I've seen of their sources, higher-level languages tend to use posix_spawn(3)/posix_spawnp(3) for their subprocess-launching APIs and fall back to fork/exec if you ask for something not expressible in it.
From glancing at the source: no. The thing you actually want to interact with is the pointer to the environment, which is managed by the unsafe glibc functions and used by the C programs you're (eventually, always) interacting with. Even if you could get hold of that, all the other C libraries around your program can (and will) still change it, or read it without caring about your additional lock. A famous reason in Rust to touch the environment is, for example, time management.
You could also have C libraries initializing before your Rust code has a chance to do anything, and now they hold getenv pointers while you try to do anything.
Your code might also not even know that it is calling set/getenv indirectly.
[Comment removed by author]
Ah, setenv and getenv. Steam also struggled with this recently. It is safe to say that nobody can use this API properly. In other words, it is broken (even if it behaves as documented; that is not good enough when "as documented" means everyone has crashes because it's impossible to use properly).
Interesting, so it is a pure POSIX/Linux problem and totally fine on macOS and Windows.
macOS solves the problem by copying the environment on change and leaking the old copy.
This is not what I would call totally fine, more like a barely acceptable kludge.
This also only solves the problem if you use setenv() or putenv(); if you touch environ directly you still bypass the protections.
But in either case it is still in a better state than Linux.
Unfortunately, this is yet another case where the issue originates from POSIX (macOS and the BSDs could be affected as well). The standard shows its age; many constructs have issues or even break down when multi-threading is involved. Remember, before C11/C++11, these languages did not even describe a proper multi-threaded memory model (see also the legendary essay Threads Cannot Be Implemented as a Library, published in 2004). Everything from start to finish was just a kludge. Of course, many amendments and mitigations were made to alleviate this problem, but POSIX is holding back innovation, both in operating systems and in user space. Back then, most programs consisted of one or two simple C files implementing a single-threaded program. POSIX would be radically different if it were designed today.
The worst thing about setenv crashing getenv is that getenv is used everywhere. Date/time functions will read TZ. Network-related functions will read proxy and DNS resolver options. Leaf functions in random libraries will have hidden config overrides.
previously: Setenv is not Thread Safe and C Doesn't Want to Fix It (26 comments)
The problem is that everything else is calling ?etenv() too, not just your own code. It's hard to avoid the footgun when everything seems to have a trigger.
TL;DR (in this particular case): setenv and getenv are not thread safe; I assume char **environ suffers similar misery.
The problem is that there are a lot of core POSIX APIs/variables that are inherently not thread safe and cannot reasonably be made thread safe. Some cases can be mitigated by using TLS (as was done with errno), but cases like this especially seem unfixable.
Now you could make safe rust wrappers for a bunch of these APIs, but there’s nothing that those wrappers can do if there’s also non-rust code interacting with them.
I guess in principle libc could put locks around manipulating the environment, but I'm not sure that protects you from people using environ directly. From a design PoV, having environ not be accessible would seem the best course of action, though that ship has sailed. But I've also used it for terrible, terrible things in the past, so I don't get to complain :D
getenv is thread-safe; setenv isn't, it's just that the crash/UB will manifest in a following getenv call. So you have unavoidable problems if your process has calls to setenv in code you don't control, because there's nothing your getenv calls can do about it.
Not sure how much of this is specific to glibc … Like, do Apple’s or MS’s C libraries have this problem or do they use locks?
I wouldn’t put it that way.
glibc docs say getenv is MT-Safe env, which means thread-safe unless somebody concurrently modifies environment variables, e.g. via setenv.
POSIX 2024 has the same idea:
Skimming Darwin libSystem code (possibly out of date), there is a lot of locking going on in getenv/setenv. Their code has lots of references to FreeBSD, but current FreeBSD code has no locking that I can see.
Here’s that same code from macOS 15.1 libc, still has locks.
There it is! Thanks. (Google really has gotten bad, huh?)
BTW, I was interested to note on cppreference that the C++ stdlib has thread-safe std::getenv since C++11 (as long as nothing calls setenv), and has no std::setenv at all. I guess that's one way to deal with it.
Those locks don't help, though. They guarantee that the environment block isn't changing while getenv navigates it to find an entry, but getenv returns a pointer to that entry and drops the locks, meaning that the entry is only valid until the next setenv. If a program could truly promise that it would never race getenv and setenv, which it needs to for correctness, the locks could just be removed, since they could never contend. They look to be there just to ensure the resulting crash isn't in the C library, and the Windows C library has equivalent locks.
For this interface to work, getenv either needs to copy the variable into a buffer with a lifetime of the caller's choosing, or implement some kind of freeenv so that getenv performs an allocate/copy that can be freed when it's no longer needed. The alternative amounts to "nothing can use setenv on any thread until all the getenv business logic is finished with its return values", which has been shown not to work.
Excellent point: the locks are really just a mechanism for avoiding libc bug reports by ensuring the crash happens in application code!
As long as setenv() isn't used to modify the variable you retrieved, your pointer is fine. However, check this out…
From the macOS getenv(3) man page:
But if the buffer was allocated by the current process in a previous call to setenv(), it doesn't behave this way at all (see here):
This is interesting. It appears that the realloc behavior was added in Libc-825.24 (compare to Libc-763.11). This corresponds to macOS 10.8. So in 10.7 and earlier it did indeed leak everything, but starting in 10.8 it is happy to call realloc() on any string it originally allocated through a call to setenv(). The code also references UNIX03 here, as though that's the reason for the change. Interestingly, this change predates the addition of locking to these functions.
It seems to me that either the man page needs to be updated (it says that future versions may change the behavior to fix the leak, but this isn't future behavior now; it's past/present behavior), or the getenv() function should be updated to mark the returned string as unowned, thus allowing it to leak. I'm inclined to file a radar on this, though such longstanding behavior is not very likely to change.
In Windows, the C library is layered on top of the OS (it's not the native interface). The Win32 API exposes GetEnvironmentVariable and SetEnvironmentVariable, which use locks and copy the data into a caller's buffer, so they're thread-safe. Unfortunately, implementing the C getenv interface is inherently unsafe, because it returns a pointer without a corresponding free. The C library ends up with its own (different!) copy of the environment block and returns pointers into it, not unlike POSIX. In the end, though, they defined getenv_s, which mirrors the Win32 semantics.
My advice to anyone writing for Windows though is to avoid the C library as much as possible. It’s a cross platform abstraction layer that hides system behavior and introduces its own quirks (in this case, a duplicate/divergent environment block.)
The environment on Windows is quite different to UNIX. On UNIX, environment variables are just another set of program arguments (the other two on modern systems are the explicit arguments and the ELF auxiliary argument vector). You get whatever the parent process gave you. Traditionally, the parent process was a shell, which would mutate the environment that it managed and give you a copy when it started you.
On Windows, the environment is some global state (per login session, if I remember correctly) and the get and set operations read and write this state. If a user modifies the environment, the next call to get that environment variable will return the new version. I can’t remember if there’s a way of getting a notification on change.
On OpenStep / macOS, most of these use cases were superseded by user defaults, which provide a set of layered dictionaries and a way of getting notifications when things change.
In Windows each process has its own environment block, and specifies the environment block a child process should have in CreateProcess, not unlike UNIX. There is a key difference in data structure (single block vs. array of strings) which is important for buffer lifetime because it controls which modifications can invalidate pointers. Programs like Explorer attempt to recreate a vanilla “login session” environment block from the registry when launching child processes with things like CreateEnvironmentBlock, but it’s not a centrally maintained piece of data. A regular call to SetEnvironmentVariable is only affecting the current process, not the session.
The WM_SETTINGCHANGE event, apparently.
I was curious, so I checked Go, and it puts a mutex around its Getenv/Setenv standard library functions. You could still screw it up by doing it in a C library, but it would be tricky, since Go caches the environment between calls to Setenv.
I mean getenv is thread safe in the sense it doesn’t mutate state, but it’s not thread safe if setenv happens at the same time - yet setenv is the thing going wrong, but afaict getenv doesn’t have the ability to make an atomic copy of the environment list at the start :D
I have no idea how the other OS’s deal with it, in principle you could (should?) make get/setenv acquire a lock - especially given I cannot believe acquiring a lock is significantly expensive compared to the piles of string parsing they’re doing - but I think the environ global still hoses you :-/
And that’s how we ended up with a crash, caused by unsafe-free Rust code badly interacting with the use of libc elsewhere in the program.
I don't understand why Rust has an API like this. It's obvious that set_var mutates global state. And when you have mutable global state and threads, you'll have races. It should be my_handle_to_mutable_data.set_var().
I was going to say that this is a "static models vs. dynamic reality" problem, but it seems even simpler than that … the model is just done wrong here.
http://www.oilshell.org/blog/2022/03/backlog-arch.html#csv-json-html-tables-records-documents
(Likewise, whatever glibc does is reality, but in this case it’s obvious what it does. It mutates a global variable, which may be shared between threads.)
It's unclear to me how this would fix things. The core problem is that, even if the Rust code decides to take a lock, any C code running concurrently may not, and thus it's fundamentally impossible for any Rust-side changes to make this fully safe.
Missing coordination between the Rust layer and the C layer is not the most important issue. The main problem is that even if you fix it with locks, it is still global state shared by all threads. So although it will not crash as described in the article, it will not work correctly, because individual threads will compete and overwrite the global state for each other; i.e., the last configuration set by a random thread will apply to all other threads.
Env variables make sense as mostly immutable data passed from parent process to child. They can be modified early at program start (single-threaded, before anyone reads the env) or after fork() before exec(), but not during program runtime, especially if the program is multi-threaded.
Any C code that calls setenv() concurrently should also be deemed unsafe, according to Rust.
It is definitely a limitation of static types that they can only reason about a single language inside a process, while you may have two or more languages within a process.
But this is an obvious bug even in pure Rust code without unsafe – i.e. just 1 language.
The Rust bug is marking the set_var and remove_var functions as safe. The Rust fix is to mark them as unsafe. For stability reasons this requires a new edition, which only happens once every three or so years, so until then there has only been a warning on the set_var page.
However, Rust obviously can't fix other languages; Python, C, etc. are on their own there. Arguably it's libc that is ultimately responsible, as it's the one managing the shared resource (albeit constrained by the C and POSIX specifications).
Not just according to Rust: POSIX says "The setenv() function need not be thread-safe." (Edited to add: there are also several scary caveats in POSIX about environ.)
POSIX doesn't say it's always unsafe because Solaris/illumos have a thread-safe setenv(), at the cost of leaking memory, because old versions of the environment cannot be freed safely. But on every other OS setenv() is unusable in multithreaded code.
C has getenv() but not setenv() so it has nothing to say on this matter.
Rust could maybe have a safe setenv(), but it would have to manipulate its own environment, not the libc environ, so getenv() (or direct access to environ) in non-Rust libraries linked into the same program would not be accessing the same environment.
This isn't a limitation of static types; it's a limitation of the anaemic C FFI, which can't model any worthwhile static properties of an API.
Sure, but I’m making a distinction between “thread safe” and “unsafe in Rust” and “unsafe” in the sense of causing undefined behavior
Those are all slightly different things
Reasoning across languages is a limitation of static types (you can’t), but as mentioned, I don’t think that’s the main issue here
You can, up to the common subset of the languages’ type systems. Many of the existing examples involve C so the common subset is very poor.
Maybe your point is that you can’t use type system features that are outside the common subset to reason across the language barrier. But it seems weird to me to imply that you can’t do any reasoning at all, or that this is a limitation of static typing in general rather than the particular type systems in play.
In this case there was reasoning across languages: the FFI call was type safe up to the limitations of C’s type system, which is what allowed Rust to call C directly. When the compiler is unable to make static guarantees then a foreign call looks more like an RPC interface with runtime checks and format translations. You see this kind of faff in the C APIs of dynamic languages.
There are a couple of things that make it hard to have FFI with richer type systems:
Languages with sophisticated type systems tend to have sophisticated runtime systems. You can’t have multiple runtimes in the same process because a runtime generally assumes it is responsible for basically all the process’s memory and threads.
So languages on the JVM and the CLR have more sophisticated cross-language object-oriented static type systems, but there are fewer examples in the functional programming tradition where the runtime isn’t typically exposed as an independent layer.
There aren’t (yet) very many languages with minimal runtimes and rich type systems designed for programming in the large. A lot of the ones that aren’t Rust are designed for stronger verification of a component of a larger C or C++ program so they don’t try to expose fancy types.
Maybe in the future there will be more variety in this area, and a greater desire for these languages to call each other across more richly typed interfaces.
I don’t follow … what I’m saying is that the common subset of any 2 type systems is the empty set.
That’s why you need runtime checks in the glue between them.
e.g. an i32 in Rust is not the same thing as an int32_t in C++; it doesn't have the same valid operations. And that follows for the rest of the type system too:
And you can replace Rust with OCaml, and C++ with Swift or Zig, and that’s all still true
Maybe you are saying, well, i32 is really the same as int32_t. I am not sure that is true – but maybe if you reference the ABI, which is separate from the language. And even C and C++ don't have compatible ABIs – hence the existence of extern "C".
But let's suppose for a minute that it's true for primitives … As soon as you get to even "struct", the languages don't have any obvious mapping between them (most languages do have special cases for C, including Rust, but not for, say, Zig).
The binding tools all have extra-linguistic conventions, and there are very often holes in those conventions (including say locking).
To make this a little more pithy, I’d say
So those are two special cases, where one language explicitly has support for another.
But, in general, the common subset of any other 2 type systems is zero / the empty set.
The way to get a common subset is to purposely implement the static semantics of C++ with Rust syntax, and conversely the static semantics of Rust with C++ syntax
Which is what Safe C++ is about - https://safecpp.org/draft.html
You can clearly see this difference in how the fish shell migrated from C++ to Rust - https://news.ycombinator.com/item?id=42535217 - i.e. read about their experiences with the binding generators, and also note that fish was never released as a hybrid C++ / Rust program (although there is more than one reason for that)
And I realize this is far afield from the original discussion, but what I’m saying that is that the type system helps you within one language, not across languages. The two languages interact dynamically, not statically, e.g. by calling setenv() concurrently.
Yes but this is safe in pure Rust, because it does already take a lock. All the surprise cases have been from mixing in C, because even the libc calls that Rust does can end up doing getenv.
(In retrospect I was a bit unclear in my previous comment: “even if the Rust decides to take a lock” was meant to be like “Rust taking a lock is irrelevant”, but it actually sounds like “Rust doesn’t currently take a lock” which is just wrong.)
[Comment removed by author]
It is not about Rust, it is not about Assembly, nor ARM64 vs. RISC-V…
Just:
n.b.
MT-Unsafe
Environment variables are a simple interface between the parent process (the environment) and the program being executed, not a dynamic global thread-safe map.
Note that this is only true on the platform where you obtained that manual page. The people who wrote the implementation have made a choice not to make the interface safe.
If you look at https://illumos.org/man/3C/setenv ours is MT-Safe, and there’s really no reason that every other libc couldn’t make the same decision.
Agree, thanks.
BTW: does illumos have an independent env per thread, or one shared by the whole process (and just thread-safe)? It might be useful or not… In any case, in the current state (env is usually not only shared but even MT-Unsafe), it is just bad design if someone recklessly modifies env in a multi-threaded program, or if a library can be parametrized only through env and does not provide any other per-thread configuration (like functions or thread-local variables).
The environment is definitely per process, not per thread. FWIW, I don’t think setenv(3C) is a good interface, and I don’t think people should use it (there are vastly preferable alternatives for every legitimate use case in 2025), I just don’t think it should be unnecessarily thread/memory unsafe!
Here’s the source https://github.com/illumos/illumos-gate/blob/master/usr/src/lib/libc/port/gen/getenv.c
What's the preferable alternative for "I'm using a library which changes behavior based on environment variables and I need to control that behavior"? Note that libc itself is a library that does this (e.g. the TZ env var).
I think environment variables are in general intended to allow control from outside the process; this applies to the ambient timezone and locale stuff especially. If you override them within the process, you're preventing those mechanisms from working as designed. In general, any library interface that can only be configured through the environment is not a good interface.
In the case of locales, the newlocale(3C) routine was added to allow programs to look up a specific locale and get a handle to it, rather than use the ambient locale from LANG and LC_ALL in the environment. Probably we should be looking to add a newtimezone(3C) routine to allow a similar thing with timezones!
I’m guessing MUSL’s versions aren’t thread-safe either https://git.musl-libc.org/cgit/musl/tree/src/env/setenv.c
I wish LD_PRELOAD facilitated shim libraries without the library essentially having to become very linker-aware to call the otherwise-linked-by-default implementation. E.g., in CI environments we could load dynamic linting libraries that complain loudly if a process calls getenv after setenv. This isn't the only case where ancient interfaces are inherently dangerous and unlikely to be fixed soon.