This starts getting a bit rough when you’re using libraries that are relying on stuff like locale.
I don’t know if the solution is to edit the environment, but the fact that so many C libraries change behavior based on env bubbles up in so many places. Bit of a rough foundation.
I treat the environment as immutable, but often call sub-processes with altered/augmented environments, e.g.
#!/usr/bin/env bash
doSomethingWith "$FOO" "$BAR"
FOO="A different foo" someOtherThing
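To make the scoping explicit, here is a minimal sketch (the values are made up, and `sh -c` stands in for any child command that reads `FOO`). The prefix assignment is visible only to the one command it precedes, and the script's own `FOO` is left alone:

```bash
#!/usr/bin/env bash
export FOO="original foo"

# The prefix assignment applies to this one child process only.
overridden=$(FOO="A different foo" sh -c 'printf "%s\n" "$FOO"')

# A later child sees the script's own, untouched value.
unchanged=$(sh -c 'printf "%s\n" "$FOO"')

printf 'overridden child saw: %s\n' "$overridden"
printf 'later child saw:      %s\n' "$unchanged"
```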
For LOCALE in particular, Haskell (compiled/interpreted with GHC, at least) will crash on non-ASCII data if LOCALE isn’t set to an appropriate value, so I often invoke Haskell programs with LOCALE="en_US.UTF-8" myHaskellProgram.
I run into this a lot in build scripts (e.g. using Nix), since their environments are given explicitly, to aid reproducibility (rather than, say, inheriting such things from some ambient user or system config).
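A rough shell analogue of an explicitly specified environment is `env -i`, which starts the child from an empty environment plus whatever is listed on the command line. This is only a sketch: the variable names and the `PATH` value are placeholders, and it is not a description of how Nix itself sets up builds.

```bash
#!/usr/bin/env bash
# Something ambient that an isolated step should not be able to see.
export AMBIENT_SETTING="leaked from the user's shell"

# env -i discards the inherited environment; only the variables listed
# here reach the child. PATH and GREETING are illustrative values.
result=$(env -i PATH=/usr/bin:/bin GREETING=hello \
  /bin/sh -c 'printf "%s|%s\n" "${AMBIENT_SETTING-unset}" "${GREETING-unset}"')

printf '%s\n' "$result"   # prints: unset|hello
```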
I imagine this would be extra painful if using libraries which need conflicting env vars.
Racket handles this quite nicely using dynamic binding (AKA “parameters”), which we can override for the evaluation of particular expressions. It feels very much like providing an overridden env to a sub-process. For example, I wrote this as part of a recent project (my first Racket project, actually):
;; Run BODY with VARS present in the environment variables
(define (parameterize-env vars body)
  (let* ([old-env (environment-variables-copy (current-environment-variables))]
         [new-env (foldl (lambda (nv env)
                           (environment-variables-set! env (first nv)
                                                       (second nv))
                           env)
                         old-env
                         vars)])
    (parameterize ([current-environment-variables new-env])
      (body))))
Looking at it now, it might have been nicer as a macro, like (with-env (("FOO" "new foo value")) body1 body2 ...).
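For comparison, the closest plain-shell analogue to wrapping several body expressions is a subshell: exports made inside it apply to every command in the group and vanish when it exits. A minimal sketch, where `with_new_foo` is a hypothetical name and the values are made up:

```bash
#!/usr/bin/env bash
export FOO="outer foo"

# A function whose body is ( ... ) rather than { ... } always runs in a
# subshell, so the export below is scoped to the function body.
with_new_foo() (
  export FOO="new foo value"
  sh -c 'printf "body1 sees: %s\n" "$FOO"'
  sh -c 'printf "body2 sees: %s\n" "$FOO"'
)

with_new_foo
printf 'afterwards: %s\n' "$FOO"   # still "outer foo"
```

Unlike the Racket version, this only affects child processes started inside the group; there is no in-process dynamic binding to override.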
Locale-defined behaviour is indeed an ugly little duckling that will never become a swan. One of the weird bugs I’ve been called in on over the years was a C+Lua VM + script that was outputting CSV with “,” as element separator in a longer processing chain (system was quite big, simplified for the sake of story).
The dev had been observant and actually checked the radix point and locale before relying on its behaviour in printf-related functions. Someone had linked in another library though, that indirectly called some X+Dbus nonsense, received language settings that way, and changed the locale mid-stream, asynchronously. Sometimes the workload was finished, sometimes a few gigabytes had gone by and corruption was a fact as floats turned into a,b rather than a.b after a while…
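One defensive habit for pipeline stages that emit machine-readable numbers is to pin the locale for just that stage. The sketch below assumes the formatting happens in a child process (`awk` is used purely for illustration, and `csv_row` is a hypothetical helper). It would not have saved the program in the story, where a library changed the locale inside the already-running process.

```bash
#!/usr/bin/env bash
# Format one CSV row with the C locale forced for this command only, so
# the radix character is "." regardless of the caller's language settings.
csv_row() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f,%.2f\n", a, b }'
}

csv_row 1.5 2.25   # prints: 1.50,2.25
```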
NetBSD has a getenv_r function that uses a lock. The same lock __readlockenv is also acquired and released in getenv, system, and popen. The system and popen calls do not check for failure to acquire the lock. I’m not sure whether this is correct.
This is sound advice. Consider the environment immutable and your sanity will be preserved.
From the linked article:
Isn’t this just a BSD vs. SysV thing?