Since this is an application, and the problem being solved either comes from outside (somebody killing the main process) or is a bug (a crash, or otherwise exiting without cleaning up its children), you might consider letting systemd (or something similar) do the heavy lifting. It will put all the processes into a cgroup (not a pid namespace), and when the main process exits for whatever reason, systemd will take out everything in the cgroup, which is possible because you can enumerate all the processes in a cgroup.
You can use systemd-run --user to do this in an ad hoc way.
@MaskRay has a good overview of a lot of the mechanics of TLS which I found very handy for implementing a custom debugger. Opting into specific models can get you out of the business of __tls_get_addr even for SOs but there are tradeoffs.
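For example, a minimal sketch of opting in with GCC/Clang (the variable name is just illustrative; the tradeoff is that initial-exec TLS draws from the static TLS reserve, so a dlopen'd SO built this way may fail to load):

// Force the initial-exec TLS model for one variable in a shared object, so
// accesses become a fixed offset from the thread pointer instead of a call
// through __tls_get_addr. The whole TU can also be built with
// -ftls-model=initial-exec.
static thread_local int per_thread_counter __attribute__((tls_model("initial-exec"))) = 0;

int bump_counter() {
    return ++per_thread_counter;
}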
Musl makes different choices than glibc for how TLS is allocated, which gets you closer to the author's “shimming pthread_create” behavior out of the box.
You misunderstood. The first project started in like 2005, the second in 2014. They both have over a decade of use and both continue to be used. Neither is gonna be renamed.
No. I didn’t misunderstand. Yes, both have continued to exist largely without problems, but the fact that we have two, independent, package managers named “asdf” is just stupid. In 2014, a quick google search for “asdf” would have easily found asdf for Common Lisp. Why’d we get here in the first place?
If it wasn’t another package manager, I’d be more inclined to not care. But, come on.
And yes, the ship has sailed for asdf — but my Lua will be a general - general scripting language! It’s not just a general embeddable scripting language… No. It takes scripting languages to a whole new level of general-ness!
edit to say: 2005 actually seemed too late. Turns out asdf is 24 years old.
It’s true that asdf for Common Lisp is not a package manager — it’s a build tool that… creates packages.
The other asdf swaps out “runtimes” … a so called “version manager” allowing you to ensure that the right set of tools are being used when you go to develop, build, etc. It can install runtimes, and it can remove runtimes. I mean, I don’t want to go against the wishes of the author… but if it looks like a duck…
But, I’ll choose to concede to your point that neither of these are de facto “package managers,” but it’s hard to argue against the fact that they are in the same general purpose space.
edit to say: 2005 actually seemed too late. Turns out asdf is 24 years old.
Yes, I remember circa 2008 or 2010 when Faré revamped it as asdf2. But what’s the point? So there’s a namespace collision across two systems. No Lisper is going to mistake it.
I will not claim to be a Lisper, but having used neither in anger I was for a time under the vague misapprehension that there was only one asdf and that the Lisp one could be (ab)used to install non-Lisp things, kind of like pip sometimes is.
I fully admit this was entirely a me problem, and if I ever tried to use either my confusion would have been resolved, but there is my sample of 1 on the confusion being possible.
The point is that if you’re starting a new project, maybe do a quick search to see if something else by the same name is already established? This isn’t hard. It’s common courtesy. I don’t see why it’s controversial?!
For kicks I checked Wikipedia: asdf has even had a Wikipedia page since 2007.
Sure, I agree and would do the same, but someone didn’t and they made free software and put it out there for other people to use and continued to maintain it. I can’t get my nerd rage on about the accidental reuse of a project name.
Wait until you find out that Java was named after a coffee which was named after an island. Or that Python was named after a comedy skit which was named after a snake. And don’t get me started on Go or Rust or Ruby! It’s a miracle that we can get anything done in the software world with all of this semantic ambiguity! May we discuss nothing else except this endlessly fascinating topic until the end of time! /s
Wait until you find out that Java was named after a coffee which was named after an island. Or that Python was named after a comedy skit which was named after a snake. And don’t get me started on Go or Rust or Ruby! It’s a miracle that we can get anything done in the software world with all of this semantic ambiguity! May we discuss nothing else except this endlessly fascinating topic until the end of time! /s
You really got me… I’ll give you another, the function car in Lisps always confuses me with the car that you drive on the road. Surely we must move to unique identifiers every time we need to name something! /s
Or, you know, you can just stop naming the two things in the same category of things the same name. That might work. The lack of common decency is crazy.
Or, you know, you can just stop naming the two things in the same category of things the same name. That might work. The lack of common decency is crazy.
Obviously taxonomies are arbitrary and subjective, and a runtime version manager is not obviously in the same category as Lisp build tooling. You could argue that they are both in the category of “software”, just like you could argue that Rust (the programming language) and Rust (the game) are, yet people on this forum don’t routinely drag that up in every post with the rust tag, because there’s obviously no such moral prohibition against using pedestrian names across multiple software projects.
A classic, but I’ve always felt that it misunderstands the benefits of Java’s very restricted semantics: discoverability.
If you’re looking for some functionality in a Java library you’re unfamiliar with, you’re looking for… a class. Always a class, because, in Java, everything belongs to a class. And, because all method calls are written in SVO order (subject.verb(object)), any method you’re looking for can be autocompleted by the IDE. Every call involves typing a dot, which gives your IDE a place to autocomplete. This often makes libraries understandable simply by reading the Javadoc or even reading through the menus in your IDE.
I agree that in principle, subject/verb/object seems like a nice model.
Where I struggle to accept this is in realizing that
sometimes you want to be generic over the subject, sometimes over the verb, and sometimes over the object
sometimes you have multiple subjects for one verb and/or object
Trying to fit everything into the subject/verb/object model makes some of these other valuable things harder to do. (To be concrete, first-class functions are extraordinarily useful, and older versions of Java made that pattern miserable to achieve.)
Ironically this is the argument people typically make for Go: yes, there’s a lot of tedious and repetitive boilerplate, and a lot of stuff the language can’t do by design or refuses to do by fiat, but it’s all in the name of keeping it “simple” for programmers and making all Go code everywhere look and feel similar.
(I am not sure I buy that argument, but it is the same argument)
I agree, it does help with discoverability, but uh, that’s because every single class has a unique and special API to it that you must figure out in order to interact with it. All that data is “encapsulated”, and now it’s not possible to get at it.
Every single line of code you write is gonna be special.
So the great big claim of OO / Java reusability? Gone. We know it’s gone, because we realized inheritance is mostly a trap, so that “reuse” went out the door long ago now…
So about discoverability, and the need for it… well, that’s a solution to a problem you shouldn’t have, really. If you use a “Data-Oriented Programming” language or approach and have a bunch of powerful functions that operate on a few simple common data structures (your “interfaces”), then you learn those powerful functions and compose with them, and there’s nothing left to discover. You’ve got your toolbox, ready to use on everything.
It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.
There are tradeoffs in all languages and data representations. The downside to the “everything is one of a few basic data structures” approach you see in a lot of dynamic languages is that now each library’s data representation is some unique, incompletely-documented, implementation-specific arrangement of these data structures. If you have a lot of examples to learn from, this is convenient, but what if you don’t?
The nice thing about strongly-typed OO languages is that the types and method signatures are often enough to tell you how to work with a given library.
(This isn’t to say that OO is strictly better, every paradigm has its pros and cons. But I find that people often discount the pros that do exist in unfashionable languages and paradigms.)
There are always trade-offs, sure, but in practice, most dynamic languages do have classes of some sort, they just tend to be more open.
In the case of JavaScript and Lua, you have constructors. You usually start with data emitted from a constructor, and then work with more functions or methods. For documentation, you have JSDoc, LuaDoc, and similar to provide you with explanations of inputs. Those still exist in Java and are necessary, as the type system cannot encode everything. In simpler terms, modern JS function signatures include destructuring, and that destructuring indicates what input is expected, i.e. updatePassword({password}, newPassword) is enough to tell you that this function depends on an object that has a password property, and can work on many objects. It’s basically an interface declaration.
In the case of lisps like Clojure, it’s even better, because you get the same as above, but also gain a REPL to interrogate your functions right inside your editor, in your codebase, and thus you can play with them immediately and safely because 99% of functions are pure. Since the functions do operate on those simple common data structures, you’re not going to have to mock/instantiate 3+ different classes, configure them to the state you need, pass them to the constructor of the method you’re testing, call a bunch more methods to set up the internal state and extra fields, etc, just to be able to test how a single method works. You just go and test a function directly.
So the great big claim of OO / Java reusability? Gone. We know it’s gone, because we realized inheritance is mostly a trap, so that “reuse” went out the door long ago now
OO (at least Java) composition doesn’t really work when the model is fundamentally about inaccessible mutable state via “encapsulation”. You need to get at that state, and to do that, you’re going to need to override or wrap a class, or write something that extracts that data for this specific class, and that specific class.
“Encapsulation” is antithetical to composition.
(This is why prototype-based OO ultimately wins at composition (at least in OO land), especially if it’s not encapsulated, like in JavaScript and Lua.)
Creating processes is very fast on Linux and most other Unix-like systems.
Oh wow, I always thought that process creation is relatively fast, but still, say, an order of magnitude slower than running your typical sans-io test. This is a big update for me, thanks! I am curious about two things here:
is this fork+exec or just fork? As in, do we try to cache most of process address-space and fork just to select a specific test to run, or does it exec the entire thing with dynamic loader, ASLR, and stuff?
what’s the ballpark CPU overhead if we restrict ourselves to fast tests only? It is clear that, for realistic test suites, you can amortize a lot of overhead, as there are going to be a couple of tests that run for a whole minute, and the speed is in running those two tests in parallel first thing. Still, if I have a large corpus of uniform small tests, where each test allocates a little, computes a little, but doesn’t talk to external processes and doesn’t do combinatorial explorations, would I observe a perceptible speed difference due to the process-per-test model? My current gut feeling is that it’s going to be, say, a 5x difference in CPU time, but it sounds like I should update that downwards?
is this fork+exec or just fork? As in, do we try to cache most of process address-space and fork just to select a specific test to run, or does it exec the entire thing with dynamic loader, ASLR, and stuff?
Great question! We actually use posix_spawn on almost every Unix platform, which tends to be faster than fork/exec. Just using fork is definitely out of the question here – it’s not really compatible with most Rust code.
what’s the ballpark CPU overhead if we restrict ourselves to fast tests only?
The clap repo is a great one for the limiting case of tons of extremely small tests. On my Linux desktop, against fc55ad08ca438e1d44b48b9a575e9665bc0362de:
$ cargo nextest run --test builder
Summary [ 0.148s] 857 tests run: 857 passed, 0 skipped
$ cargo test --test builder
test result: ok. 857 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s
So, to be clear, process creation is still quite a bit slower than doing stuff purely in-process. But it is very fast compared to what the vast majority of tests do when you weight by the time each test takes (which is what matters for end-to-end performance), and it has all of the other advantages outlined in the post. Practically speaking, the ability to run tests across multiple binaries simultaneously has maybe 100x the impact to performance that this does.
(edit: I’m aware of your very helpful advice to coalesce integration test binaries, and follow it regularly! In large workspaces though you’ll often end up with at least one integration test binary per crate.)
I’m really curious about the double-spawn. How exactly does a child process end up “half born” such that the parent never returns? Either the new process was created or it wasn’t, so what is the parent waiting for? And what makes this a flaw in the specification rather than a bug in glibc’s implementation or Linux’s syscalls? Does this happen on darwin too?
Also, if the fix is to mask out SIGTSTP up front and then restore it in the child, what happens if you just go ahead and do that by using POSIX_SPAWN_SETSIGDEF to reset SIGTSTP?
I think I am confused. If I hit Ctrl-Z, can’t I expect the parent / CLI tool to be stopped anyway? Isn’t this intrinsically racy just as a mechanism? I can’t time my Ctrl-Z to hit the “right” process even without this situation, no?
If I understand correctly the situation is basically equivalent to
if (vfork() == 0) { raise(SIGTSTP); exec(…); }
which, yeah, the parent is suspended until the child execs, but that’s kind of just how vfork works, and can be an issue for any blocking or long-running operation in the child.
If posix_spawn were a syscall and not emulated with vfork, this would presumably be fine for some definition of fine, but it still is just a question of
// signal delivered here
posix_spawn // signal delivered here
// signal delivered here
EDIT: Ok I think I can imagine what the bad experience is:
Signals sent to the parent arrive after the child releases the parent’s memory (i.e., after the child terminates or calls execve(2)).
The point at which the child process gets stuck is before a new pid is assigned to it. I’m not kidding when I say half-born.
Because the parent is in uninterruptible sleep, the terminal appears completely hung. Ctrl-C etc. don’t work. The only option is to go to a different terminal and run kill -CONT <pid>, which is non-obvious and not reasonable to ask users to figure out.
Again, this deserves a long and detailed explanation which I need to write up at some point.
Thanks for the additional details, I can reproduce with something simple like:
#include <unistd.h>
int main() {
    if (vfork() == 0) sleep(1);
}
This suggests an alternative workaround: rather than masking SIGTSTP and using a helper to unmask it, install a noop handler for it in the parent. Exec will remove the handler in the child, and you can remove the handler in the parent when posix_spawn returns.
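A minimal sketch of that idea, assuming a single-threaded parent (the helper name and arguments are just illustrative, not nextest's actual code):

#include <signal.h>
#include <spawn.h>

extern char **environ;

static void noop_handler(int) {}  // exists only so a SIGTSTP during the window doesn't stop anything

// Illustrative helper: a handled SIGTSTP won't stop the half-spawned child,
// exec resets the disposition to SIG_DFL in the child, and the parent restores
// its own disposition once posix_spawn has returned.
pid_t spawn_with_tstp_guard(const char *path, char *const argv[]) {
    struct sigaction guard = {}, previous = {};
    guard.sa_handler = noop_handler;
    sigemptyset(&guard.sa_mask);
    sigaction(SIGTSTP, &guard, &previous);

    pid_t pid = -1;
    posix_spawn(&pid, path, nullptr, nullptr, argv, environ);

    sigaction(SIGTSTP, &previous, nullptr);
    return pid;
}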
Thanks. I think relying on exec resetting the signal handler would definitely work in single-threaded contexts, where it’s okay to poke at the global signal handler like that. (But nextest uses Tokio’s multithreaded runtime.) One advantage of signal masks here is that they’re per-thread.
From this discussion, I assume the solution of resetting the signal mask using the posix_spawn attrs would also be vulnerable, as the child would reset the mask prior to calling exec (and therefore immediately receive the signal)?
This definitely sounds like a bug, but it doesn’t sound like a fundamental flaw of the posix_spawn specification, just of the way Linux has chosen to implement this. For example, if posix_spawn were a syscall on its own that would work, or if Linux added a mechanism to reset the signal mask upon exec, then it could just mask out SIGTSTP itself across the clone(2) call.
I’m still confused at the “before a new pid is assigned to it” part though. How can a process exist without a pid? It’s getting stuck in clone, but that doesn’t mean the child doesn’t have a pid, just that the parent doesn’t know the pid yet.
From this discussion, I assume the solution of resetting the signal mask using the posix_spawn attrs would also be vulnerable, as the child would reset the mask prior to calling exec (and therefore immediately receive the signal)?
No, this doesn’t happen because the child inherits the parent’s signal mask (I changed Rust to do this in late 2022). The important thing with double-spawn is that the reset happens in the intermediate process, after the posix_spawn returns in the parent.
This definitely sounds like a bug, but it doesn’t sound like a fundamental flaw of the posix_spawn specification, just of the way Linux has chosen to implement this.
Ah, with a second read I think the issue is that the spec is too flexible – it allows for implementations that are definitely incorrect. The spec requires that the implementation both:
set the signal mask before the exec (“If the POSIX_SPAWN_SETSIGMASK flag is set in the spawn-flags attribute of the object referenced by attrp, the child process shall initially have the signal mask specified in the spawn-sigmask attribute of the object referenced by attrp.”)
return an error if the exec failed
This means that for a library implementation of posix_spawn based on a vfork-like thing:
at the time the vfork happened, the process has all signals blocked (good)
then, the signal mask is set to that of the parent (this is the point at which SIGTSTP is delivered)
then, exec is called
then, posix_spawn returns
There’s no place anywhere to set the signal mask after the exec but before the posix_spawn returns.
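In rough pseudocode, the child side of such an implementation looks something like this (emphatically not glibc's actual code, just the shape of the sequence above; the names are illustrative):

#include <signal.h>
#include <unistd.h>

// Child side of a hypothetical vfork-based posix_spawn. The parent stays parked
// in the vfork until one of the last two lines runs.
static void child_after_vfork(const char *path, char *const argv[], char *const envp[],
                              const sigset_t *requested_mask) {
    sigprocmask(SIG_SETMASK, requested_mask, nullptr);  // SIGTSTP can be delivered right here,
                                                        // stopping the child while the parent is
                                                        // still stuck in the vfork
    execve(path, argv, envp);                           // only exec (or _exit) ends the window
    _exit(127);                                         // exec failed; surfaced asynchronously
}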
Originally, I thought that the requirement for error handling meant that exec must be called before posix_spawn returns, but on a second read the documentation says that error 127 may be returned for async handling instead.
Solutions that would resolve this are something like:
there is an atomic “exec-and-set-signal-mask-to-something” operation
there are two levels of signal masks: one that sigprocmask and pthread_sigmask interact with, and a coarser mask that gets reset on exec.
posix_spawn returns before the signal mask is unset, and something is made to do the exec asynchronously
Based on this, as well as the glibc mailing list conversation, I think that the posix_spawn spec is underspecified and allows for some clearly incorrect behavior. For example, I think it should clearly say that signals MUST only be delivered to the child process after posix_spawn returns.
I’m still confused at the “before a new pid is assigned to it” part though. How can a process exist without a pid? It’s getting stuck in clone, but that doesn’t mean the child doesn’t have a pid, just that the parent doesn’t know the pid yet.
Sorry, a new pid does get assigned to it. I was thinking of the command line which hasn’t changed yet since this is really early in the process.
I just repro’d it in this gist, with instructions included. You can see the new process stuck with a new pid but the same command line and virtual mem size (i.e. it’s a vfork of the parent process before exec is called).
Ultimately the problem with stopping the child is the parent is technically not stopped, just in uninterruptible sleep, and so the shell does not resume control of the terminal, right? If the shell were to resume terminal control anyway and allow you to type fg then things would be ok?
This suggests another solution, which is if a process is in uninterruptible sleep due to vfork, and the child process is stopped, the parent should report its status as though it’s stopped too (and its parent should get a SIGCHLD). Which is to say, the notion of “this process is stopped” should be part of the state shared by the vfork (and sending a SIGCONT to the parent should work to wake up the child).
That said, I’m inclined to prefer solutions that simply make posix_spawn appear atomic, since I don’t know what other uses there are for vfork besides implementing behavior like posix_spawn. So a solution of the form “defer stops until after the child execs” is preferable.
Ultimately the problem with stopping the child is the parent is technically not stopped, just in uninterruptible sleep, and so the shell does not resume control of the terminal, right? If the shell were to resume terminal control anyway and allow you to type fg then things would be ok?
For normal process spawns yeah that would work. Nextest puts each test in a process group, but also ensures that stop and continue signals are forwarded — so it would also work.
I agree that stops (and maybe most/all other signals) should just be deferred until the child execs.
This seems to be a real bug in glibc or Linux. Please report it to the glibc bug tracker, with proper reproduction steps, i.e. with some small C program which reproduces the issue, and post the link here (because the bug is interesting to me). Even if the bug turns out to be unfixable, having proper reproduction steps is still valuable. If you have difficulties with creating a glibc bugtracker account, post the reproduction steps here and I will forward them to the bugtracker.
But the fact that I’d have to keep the double-spawn workaround on Linux for at least a decade or so, not to mention any other platforms which may have the same issue, made me kind of move on haha :)
I want to reproduce the issue and to understand what is going on! The issue seems interesting and I want to understand it!
“just run nextest on clap” will definitely help me reproduce the issue, but will not help me understand it at all. I want to understand what you’re trying to do, and what exactly causes the process to get into an uninterruptible state.
“just run nextest on clap” means that in order to understand the bug I need to remove code from the nextest codebase until it becomes 20-30 lines, which is the core of the issue. This will probably take me many hours or even days.
I suspect that this is the reason why glibc devs don’t want to dig further
But this doesn’t actually use posix_spawn, just vfork, and is more of an indication of the problem rather than providing deep insight into the problem itself. A more complex reproduction would involve posix_spawning tons of short-lived processes rapidly (you don’t need to do anything special around signal masks) and hitting ctrl-z — it may take a few tries to repro, and also using many threads might make it repro faster.
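To make that concrete, the shape of the loop I mean is something like this (just a sketch: the child binary and the iteration count are arbitrary, and you press Ctrl-Z while it runs, possibly with several of these loops going at once on different threads):

#include <spawn.h>
#include <sys/wait.h>
#include <cstdio>

extern char **environ;

int main() {
    char path[] = "/bin/true";           // any do-nothing child works
    char *argv[] = {path, nullptr};
    for (int i = 0; i < 100000; ++i) {   // spawn short-lived processes back to back
        pid_t pid = -1;
        if (posix_spawn(&pid, path, nullptr, nullptr, argv, environ) != 0) {
            std::perror("posix_spawn");
            return 1;
        }
        waitpid(pid, nullptr, 0);
    }
}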
With nextest, the race window is slightly smaller because it creates a process group for each test (so the window is between signal unblocking and setting the process group, not signal unblocking and exec). But this simpler example may help guide you.
The stack trace in my email to the list may also help. I obtained it via gdb on the stuck process, then manually traced through the glibc code to surmise what was happening.
I’ll be honest, it would be quite low priority for me to work on a full reproduction — I have limited time and many other commitments at the moment. But you’re welcome to take it from here!
Okay, using that link I was able to reproduce something. I compiled this program:
#include <unistd.h>
int main() {
    if (vfork() == 0) sleep(1);
}
Then I started the program and pressed Ctrl-Z. Nothing happened. And the program became totally unresponsive. I was unable to kill it in any way (Ctrl-Z, Ctrl-C, etc). The only thing that worked was killing it using kill -9 ... from another terminal window.
Is this the bug you are talking about?
Even if the answer is “yes”, these reproduction steps are still not perfect. The “vfork” manpage disallows calling “sleep” in a child created by “vfork”. The only allowed functions are “_exit” and “exec*”. So (I think) glibc and Linux devs will simply answer “you are not allowed to call sleep after vfork, so you are on your own”.
Are there some better reproduction steps?
Then I tried to reproduce using posix_spawn. I wrote this code:
I run that program, tried to stop it using Ctrl-Z several times, and… I saw no bugs. Every time I stopped the process group, I was able to successfully resume it using “fg”. Behavior was totally expected.
So, yes, this time I was unable to reproduce anything.
My program features 3 magical constants: the count of iterations (1000), the time of execution of a given process (0.1 second), and the sleep between runs (10 000 usecs). Maybe I need to tweak them somehow?
I say all this, because I’m genuinely curious. And I think nothing will be done until we submit proper bug report with proper reproduction steps to glibc or Linux bug tracker. Then hopefully glibc/Linux devs will start work on this.
Of course, if you don’t have time, this is totally understandable. Feel free to ignore this comment if you want.
P. S. I’m writing all this from my new Dell Precision 7780 with ECC memory. I bought it for various reasons, but one of the reasons was this: I have read your article on ECC memory ( https://sunshowers.io/posts/am5-ryzen-7000-ecc-ram/ ). (Another reason was Torvalds’ opinion on ECC.) After buying this laptop, one of the first things I did was running the commands from your article to verify that this laptop actually has ECC memory. And it seems it does. Thanks a lot!
Yes, vfork by itself gives a sense of what the problem feels like, but doesn’t indicate the problem itself.
I’d shorten the child process down to absolutely nothing, e.g. something like “true”, and put in no delay or a sub-1ms delay — just let process creation rip. If that doesn’t work, I’d try doing it across several threads.
If I run this program and press Ctrl-C, the program terminates cleanly. But if I run it and press Ctrl-Z, nothing happens. I.e. the program doesn’t stop. And then I’m unable to kill it using Ctrl-C or even by killall b. The only thing that works is killall -9 b. Is this the bug you are talking about? Or have I found some different bug?
I just did some more experiments and it seems that the bug is reproducible with Linux 6.1, but not reproducible with Linux 6.12. I will do more experiments and will report my findings in this thread
Hmm, strange – I just managed to repro it locally with Linux sunshowers-linux 6.12.8-201.bazzite.fc41.x86_64, inside an Ubuntu 24.04 distrobox and with NEXTEST_DOUBLE_SPAWN=0 cargo nextest run on https://github.com/clap-rs/clap.
Just out of curiosity, could you use clone() + exec() directly instead of the double fork to work around this? If I’m reading this correctly, it’s mostly the fault of __spawni using CLONE_VFORK, which explicitly puts the parent in an uninterruptible sleep, so I’d imagine that DIY-ing to not use that flag would get around the issue without needing the double exec.
(On the other hand, this is quite a bit of extra system-dependent code if you’re using clone() directly, and I don’t exactly recall if a vanilla fork() is slower due to passing extra flags internally…)
It’s been a while, but I think I had imagined that clone-without-vfork would have the same performance profile as fork/exec, which is much worse than double-spawn. But it’s not something I verified.
Double-spawn meant I could use posix_spawn via the Rust standard library, relying on std::process::Command.
I’m aware of your very helpful advice to coalesce integration test binaries, and follow it regularly! In large workspaces though you’ll often end up with at least one integration test binary per crate
Yup! The galaxy-brain version of that advice is what the Pernosco folks are doing — use one integration test crate for the entire workspace, so that there’s just a single binary for all the tests! This also avoids each crate being compiled with and without the test cfg.
There are some real issues with that in larger workspaces with many output artifacts — you have to wait too long for the build to happen in unrelated parts of the codebase.
Would Mypy have been of any use with some type stubs? E.g., that qb was meant to be of Type[SomeQueryBuilderInterface], which shouldn’t have an engine field?
I was thinking the same: Any statically typed language would disallow modifying types at runtime. By definition. But does Mypy care? Let’s try:
class QueryBuilder:
    pass

class QueryBuilderMysqlEngine:
    pass

def main():
    qb = QueryBuilder()
    qb.engine = QueryBuilderMysqlEngine()
[ ] mypy: Success: no issues found in 1 source file
[ ] ruff:
[x] pylint | grep -Ev 'docstring|snake_case|Too few public methods': Attribute ‘engine’ defined outside __init__ (attribute-defined-outside-init)
Thankfully, pylint cares, but it also cares about things I find rather normal.
My takeaway from the article: As someone not used to thinking in dynamic types and concurrency in the same world: Oh boy, this is also a thing to check for.
This is because mypy doesn’t check functions that don’t have type annotations on them by default. You can change that via various flags, which I think should be included in the --strict mentioned in a sibling comment.
I was about to ask the same. Maybe some would count casting and polymorphism, but I wouldn’t.
That’s a question of definition, but my answer is that you can’t change types at runtime, because types don’t exist at runtime. Wikipedia is foggy as usual on what static typing precisely is, but I think the meaningful distinction is that the type of each variable is static. That’s what it must mean if C++ is statically typed. In other words, it can’t be relevant that you can violate typing as much as you want at runtime by pointing at or interposing objects of different types, and that the language even helps you do that in safe forms (OOP).
I don’t think this is an accurate representation of the original code. This is constructing a new object of type QueryBuilder. If a new object was being constructed each time then there would not have been an issue. But the code was doing the equivalent of qb = QueryBuilder, not qb = QueryBuilder(). So instead of creating a new object and then mutating that new object, we’re instead mutating the QueryBuilder class itself.
error: "type[QueryBuilder]" has no attribute "engine" [attr-defined]
which is what we want to happen.
You need either the -> None annotation or the --strict command line option to get mypy to kick in here. Otherwise it will treat the entire main function as something that’s not yet meant to have type checking applied.
Probably, but whoever wrote it probably wrote it this way intentionally. I don’t know why, but I am guessing it was because qb.engine was a better developer-facing API than qb_engine separately attached to the local namespace. Alas, better design was to not have this “global state” at all, which we eventually removed.
Yeah, I dived a little deeper / dusted off the cobwebs: stroll uses fsatrace, which is what Shake uses, and it has overlapping authors with Shake. Shake’s capabilities like this were apparently inspired by fabricate.
Now this is some pretty good proof that you can do systems programming with a high level garbage collected language. A pure Scheme PID 1, how cool is that!
You can do it. The problem from my perspective is that it’s hard to make sure it’s allocation safe. In some languages/implementations (and guile probably fits this description) it’s pretty hard to tell when a memory allocation is even going to happen, so guarding against it failing is particularly difficult, if it’s even possible at all.
Now the argument may be made that memory allocation failure can be handled by simply terminating the process, and that’s perhaps fine in some circumstances (I personally think it’s still open to debate, and for systems software applies even less than it does for application software, but anyway) but if the process in question is init, terminating it brings the whole system down (forcibly: on Linux at least, it causes a kernel panic).
(So basically if you’re happy with an out-of-memory situation causing the system to crash, then it’s ok. Maybe you never see those anyway. There are also ways to limit its likelihood: you can use cgroups to prevent other processes from consuming all the memory, but it’s tricky to get right and it’s hard to know how much memory you have to reserve for your init. For an important server, you’d probably rather use an init that won’t cause a panic if it can’t complete a memory allocation. It’s still a pretty bad situation, but it might not be as bad as bringing the machine down).
Can you explain more how this handling of out of memory should be done?
My understanding is that in most Linux distros malloc will never fail, since the OS just happily overcommits virtual memory. Can you switch that off for PID 1?
With overcommit on, there are still some ways that malloc can fail, though they are usually unlikely to be encountered, it’s true. There’s no actual switch, but on a critical system you really don’t want overcommit enabled, because memory can still be exhausted and something’s got to give - in practice what happens is that the kernel (or a userspace helper) has to decide on a process to kill in order to free up memory, so you’re back to square one (i.e. it’s basically the same as the process terminating when malloc fails, although hopefully the OOM-killer doesn’t target PID 1).
although hopefully the OOM-killer doesn’t target PID 1
Let’s find out. Against my better half’s wishes (who Googled it and found it immediately themselves), I’m going to try asking a local LLM (ht @simonw; asking Llama 3.3). It possibly semi-hallucinated part of the code, but it might just be older code. It did point me to the exact file and give me the right names for things:
>>> Does the Linux OOM killer stop itself from killing the init process (pid 1)? If so, where in the source can I find this logic?
A question that gets to the heart of Linux kernel internals!
Yes, the Out-of-Memory (OOM) killer in Linux does indeed prevent itself from killing the init process (pid 1). This is a deliberate design choice to ensure system stability.
The reason for this exemption is that the init process is responsible for managing the system’s initialization and shutdown sequences. Killing init would likely lead to a system crash or become unresponsive, making it difficult or impossible to recover from an OOM situation.
Now, let’s dive into the Linux kernel source code to find where this logic is implemented.
The relevant code can be found in the oom_kill_process function in mm/oom_kill.c. Specifically, you’re looking for the check that excludes pid 1 from being killed:
if (pid == 1) {
    /* Don't kill init */
    return 0;
}
This check is located around line 350-360 in the Linux kernel source code (depending on the version). You can find it by searching for pid == 1 in mm/oom_kill.c.
In newer Linux kernel versions (from 4.12 onwards), this logic has been refactored into a separate function called oom_unkillable_task, which is defined in include/linux/oom.h. This function takes a task structure as an argument and returns a boolean indicating whether the task should be exempt from OOM killing.
The oom_unkillable_task function checks for several conditions, including whether the task is pid 1:
bool oom_unkillable_task(struct task_struct *p)
{
    if (p->pid == 1) /* init */
        return true;
    // ... other conditions ...
}
The oom_kill_process function then calls oom_unkillable_task to determine whether a process should be spared from OOM killing:
if (oom_unkillable_task(p))
    return 0; /* don't kill */
To find these changes in the Linux kernel source code, you can use git grep or other code search tools. For example:
Keep in mind that the Linux kernel source code is constantly evolving, so the exact line numbers and function names might change over time. However, the underlying logic and intent remain the same: to prevent the OOM killer from killing the init process (pid 1) to ensure system stability.
/* return true if the task is not adequate as candidate victim task. */
static bool oom_unkillable_task(struct task_struct *p)
{
	if (is_global_init(p))
		return true;
	if (p->flags & PF_KTHREAD)
		return true;
	return false;
}
The logic as it is today (and may or may not have always been) is that oom_badness returns LONG_MIN for “unkillable” tasks, and likewise oom_evaluate_task returns early on unkillables.
/**
 * is_global_init - check if a task structure is init. Since init
 * is free to have sub-threads we need to check tgid.
 * @tsk: Task structure to be checked.
 *
 * Check if a task structure is the first user space task the kernel created.
 *
 * Return: 1 if the task structure is init. 0 otherwise.
 */
static inline int is_global_init(struct task_struct *tsk)
{
	return task_tgid_nr(tsk) == 1;
}
malloc will never fail, since the OS just happily overcommits virtual memory
Reasons malloc can fail when overcommit is enabled include hitting rlimit_data or rlimit_as.
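For example (a quick sketch; the numbers are arbitrary):

#include <sys/resource.h>
#include <cstdio>
#include <cstdlib>

int main() {
    struct rlimit lim = {64UL << 20, 64UL << 20};   // cap the address space at 64 MiB
    if (setrlimit(RLIMIT_AS, &lim) != 0) {
        std::perror("setrlimit");
        return 1;
    }
    void *p = std::malloc(256UL << 20);             // ask for 256 MiB
    std::printf("malloc returned %p\n", p);         // expected to print (nil) even with overcommit on
    std::free(p);
}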
Also address space exhaustion/fragmentation, which is much less likely in our 64-bit world, but you can still see it happen with a bug that leads to calling malloc with a negative number.
Yeah I’m not talking about init specifically, this is a tangent, just the specific conflation (the myth? can I call it a myth?) of malloc never failing with overcommit enabled.
As a human I felt pretty good about the descriptions in 1-3, then 4 is somewhat unclear and 5 seemed to jump off a cliff of understanding for me.
I don’t know what “scuffed” is supposed to mean or why they are parallel universes. I think it’s that there’s an additional timeline (the global timeline) in addition to the threads’ individual timelines, but if that’s correct I think we’re already in parallel “universes” with multiple threads? Maybe it’s because “sync up” is a little vague in the previous sections?
“Scuffed” is gamer-originated slang for a gradient of qualities. Something that is “scuffed” is something that:
Indicates the creator/actor of the thing needs more practice
Fell just short of its intended goal
Is an obvious hack, kludge, jerry-rig, etc
Is of low quality or craftsmanship
Requires cleaning, refinement, or repair (c.f. the typical non-internet usage “scuffed shoes”)
Tone generally ranges across self-deprecation, questioning skepticism, and friendly derision. True haters are more likely to call something “trash” than “scuffed”.
“Scuffed” is slang for off-brand/“poor man’s X” (EDIT: better described by KOMON). “Parallel universes” is a reference to SM64 speedrun slang that became a meme in some programming communities. I think you correctly understood the “additional global timeline” concept; Just wanted to have a bit of fun here.
The fence section (5) was tough, as I feel it didn’t sufficiently escape the “formal reasoning” terms from the cppref, especially for the SeqCst fence. It’s mostly due to fences being so loosely defined (they “sync up” with (potentially many) atomic reads/writes that happen before or after them… like what?). If you’re interested, I helped co-author the proposal to remove atomic fences from Zig (ignore the downvotes, it’s an inside joke) — it covers the practical scenarios where they’re used and how to replace them.
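For reference, the concrete pairing that wording is gesturing at, as a minimal C++ sketch (a release fence before a relaxed store synchronizes with an acquire fence after a relaxed load that observes that store):

#include <atomic>

int payload = 0;                  // plain, non-atomic data
std::atomic<int> ready{0};

void producer() {
    payload = 42;
    std::atomic_thread_fence(std::memory_order_release);   // "covers" the relaxed store below
    ready.store(1, std::memory_order_relaxed);
}

void consumer() {
    while (ready.load(std::memory_order_relaxed) == 0) {}   // spin on the relaxed load
    std::atomic_thread_fence(std::memory_order_acquire);    // pairs with the release fence above
    // payload is now guaranteed to be 42
}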
More generally, RAII is a feature that exists in tension with the approach of operating on items in batches, which is an essential technique when writing performance-oriented software.
This is a falsehood some people in the intersection of the data-oriented design and C crowds love to sell. RAII works fine with batches; it’s just that the RAII object is the batch instead of the elements inside. Even if the individual elements have destructors, if you have an alternative implementation for batches, C++ has all the tools to avoid the automatic destructor calls: just placement-new into a char buffer and then you can run whatever batch logic instead. Don’t try bringing this up with any of them though or you’ll get an instant block.
I do highly performance sensitive latency work with tighter deadlines than gamedev and still use destructors for all the long lived object cleanup. For objects that are churned aggressively avoiding destructor calls isn’t any different than avoiding any other method call.
Agreed, this post is making some wild claims that don’t hold up in my experience. I’m writing a high-performance compiler in Rust, and most state exists as plain-old datatypes in re-usable memory arenas that are freed at the end of execution. RAII is not involved in the hot phase of the compiler. Neither are any smart pointers or linked lists.
I simply find the argument unconvincing. Visual Studio has performance problems related to destructors => RAII causes slow software?
“Exists in tension” seems accurate to me. Yes, you can do batches with RAII, but in practice RAII languages lead to ecosystems and conventions that make it difficult. The majority of Rust crates use standard library containers and provide no fine grained control over their allocation. You could imagine a Rust where allocators were always passed around, but RAII would still constrain things because batching to change deallocation patterns would require changing types. I think the flexibility (and pitfalls) of Zig’s comptime duck typing vs. Rust traits is sort of analogous to the situation with no RAII vs. RAII.
I think it’s the case that library interfaces tend not to hand control of allocations to the caller but I think that’s because there’s almost never pressure to do so. When I’ve wanted this I’ve just forked or submitted patches to allow me to do so and it’s been pretty trivial.
Similarly, most libraries that use a HashMap do not expose a way to pick the hash algorithm. This is a bummer because I expect the use of siphash to cause way more performance problems than deallocations. And so I just submit PRs.
Yes. I write Zig every day, and yet it feels like a big miss, and, idk, populist? “But don’t just take my word for it.” Feels like too much trying to do ‘convincing’ as opposed to elucidating something neat. (But I guess this is kind of the entire sphere it’s written in; what does the “Rust/Linux Drama” need? Clearly, another contender!)
It doesn’t, but without it I don’t really see the post offering anything other than contention for the sake of marketing.
I spend somewhere between 2 to 8 hours a day working on my own projects. (“2” on days I also do paid work, but that’s only two days a week.) Zig has been my language of choice for four or five years now; you can see a list on my GitHub profile. A lot of my recent work with it is private.
Thank you! I really like it, and I’m a little sad that Rust — which I still use often, maintain FOSS software in, and advocate for happily! — has narrowed the conversation around lower-level general-purpose programming languages in a direction where many now reject out of hand anything without language-enforced memory safety. It’s a really nice thing to have, and Rust is often a great choice, but I don’t love how dogmatic the discourse can be at the expense of new ideas and ways of thinking.
I very much agree. A Zig program written in a data-oriented programming style, where most objects are referenced using indices into large arrays (potentially associated to a generation number) should be mostly memory safe. But I haven’t written enough Zig to confirm this intuition.
I don’t remember the arguments against RAII much (has been a few years since) but that Zig doesn’t have RAII feels like an odd omission given the rest of the language design. It’s somewhat puzzling to me.
Hm, it’s pretty clear RAII goes against the design of Zig. It could be argued that it’d be a good tradeoff still, but it definitely goes against the grain.
Zig requires a keyword for control flow. RAII would be the single instance where control jumps to a user-defined function without this being spelled out explicitly.
Zig doesn’t have operator overloading, and, more generally, it doesn’t have any sort of customization points for type behavior. “Compiler automatically calls the __deinit__ function if available” would be the sole place where that sort of thing would be happening.
Idiomatic Zig doesn’t use a global allocator, nor does it store per-collection allocators. Instead, allocators are passed down to the specific methods that need them as an argument. So most deinits in Zig take at least one argument, and that doesn’t work with RAII.
I was unaware that Zig discourages holding on to the allocator. I did not spend enough time with Zig but for instance if you have an ArrayList you can defer .deinit() and it will work just fine. So I was assuming that this pattern:
var list = ArrayList(i32).init(heap_allocator);
defer list.deinit();
Could be turned into something more implicit like
var list = @scoped(ArrayList(i32).init(heap_allocator));
I understand that “hidden control flow” is something that zig advertises itself against, but at the end of the day defer is already something that makes this slightly harder to understand. I do understand that this is something that the language opted against but it still feels odd to me that no real attempt was made (seemingly?) to avoid defer.
But it very much sounds like that this pattern is on the way out anyways.
Zig’s std.HashMap family stores a per-collection allocator inside the struct that is passed in exactly once through the init method. Idk how that can be considered non-idiomatic if it’s part of the standard library.
Zig is a pre-1.0 language. Code in the stdlib is not necessarily idiomatic, both because there’s still idiom churn and because it was not uniformly audited for code quality.
As someone who doesn’t use Zig or follow it closely, both the fact that that change is being made and the reason behind it are really interesting. Thanks for sharing it here
Even if the individual elements have destructors, if you have an alternative implementation for batches C++ has all the tools to avoid the automatic destructor calls, just placement new into a char buffer and then you can run whatever batch logic instead.
I’ve never used placement new, so I don’t know about that, so my question is, how do you do that? Take for instance a simple case where I need a destructor:
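Something along these lines is the shape I have in mind (a sketch; Foo, Bar, and Baz stand in for members that each own their own allocation):

#include <memory>

struct Foo {}; struct Bar {}; struct Baz {};   // stand-in types

class Element {
    std::unique_ptr<Foo> foo = std::make_unique<Foo>();
    std::unique_ptr<Bar> bar = std::make_unique<Bar>();
    std::unique_ptr<Baz> baz = std::make_unique<Baz>();
    // the implicit destructor frees three separate heap allocations, one per member
};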
If I have a bunch of elements that are both constructed at the same time, then later destroyed at the same time, I can imagine having a dedicated Element_list class for this, but never having used placement new, I don’t know right now how I would batch the allocations and deallocations.
And what if my elements are constructed at different times, but then later destroyed at the same time? How could we make that work?
Don’t try bringing this up with any of them though or you’ll get an instant block.
I think I have an idea about their perspective. I’ve never done Rust, but I do have about 15 years of C++ experience. Not once in my career have I seen a placement new. Not in my own code, not in my colleagues’ code, not in any code I have ever looked at. I know it’s a thing when someone mentions it, but that’s about it. As far as I am concerned it’s just one of the many obscure corners of C++. Now imagine you go to someone like me, and tell them to “just placement new” like it’s a beginner technique everyone ought to have learned in their first year of C++.
I don’t expect this to go down very well, especially if you start calling out skill issues explicitly.
I’ve never done Rust, but I do have about 15 years of C++ experience. Not once in my career have I seen a placement new. Not in my own code, not in my colleagues’ code, not in any code I have ever looked at.
I’m a little bit surprised, because I’ve had the opposite experience. Systems programming in C++ uses placement new all of the time, because it’s the way that you integrate with custom allocators.
In C++, there are four steps to creating and destroying an object:
Allocate some memory for it.
Construct the object.
Destruct the object.
Deallocate the memory.
When you use the default new and delete operators, you’re doing two of these at a time: new calls the global operator new, which returns a pointer to some memory (or throws an exception if allocation fails), and then calls the constructor; delete calls the destructor and then deallocates the memory. Both new and delete are simply operators that can be overloaded, so you can provide your own, either globally, globally for some overload, or per class.
Placement new has weird syntax, but is conceptually simple. When you do new SomeClass(...), you’re actually writing new ({arguments to new}) SomeClass({arguments to SomeClass's constructor}). You can overload new based on the types of the arguments passed to it. Placement new is a special variant that takes a void* and doesn’t do anything (it’s the identity function). When you do new (somePointer) SomeClass(Args...), where somePointer is an existing allocation, the placement new simply returns somePointer. It’s up to you to ensure that you have space here.
If you want to allocate memory with malloc in C++ and construct an object in it, you’d write something like this (not exactly like this, because this will leak memory if the constructor throws):
#include <cstdlib>   // malloc
#include <new>       // placement new
#include <utility>   // std::forward

template<typename T, typename... Args>
T *create(Args&&... args)
{
    void *memory = malloc(sizeof(T));
    return new (memory) T(std::forward<Args>(args)...);
}
This separates the allocation and construction: you’re calling malloc to allocate the object and then calling placement new to call the constructor and change the type of the underlying memory to T.
Similarly, you can separate the destruction and deallocation like this (same exception-safety warning applies):
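Roughly, a sketch of the counterpart to the create helper above:

template<typename T>
void destroy(T *object)
{
    object->~T();     // run the destructor without releasing any memory
    free(object);     // then give back the storage that create() got from malloc
}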
In your example, std::unique_ptr has a destructor that calls delete. This may be the global delete, or it may be some delete provided by Foo, Bar, or Baz.
If you’re doing placement new, you can still use std::unique_ptr, but you must pass a custom deleter. This can call the destructor but not reclaim the memory. For example, you could allocate space for all three of the objects in your ‘object’ with a single allocation and use a custom deleter that didn’t free the memory in std::unique_ptr.
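A sketch of what such a deleter can look like (Foo here is just a stand-in for one of those member types):

#include <memory>
#include <new>

struct Foo { /* ... */ };                      // stand-in

struct destroy_only {
    template<typename T>
    void operator()(T *p) const noexcept { p->~T(); }   // end the lifetime, keep the storage
};

void demo() {
    alignas(Foo) unsigned char storage[sizeof(Foo)];    // storage owned elsewhere (arena, batch, ...)
    std::unique_ptr<Foo, destroy_only> foo{new (storage) Foo{}};
    // ~unique_ptr runs ~Foo() via destroy_only, but never frees `storage`
}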
Most of the standard collection types take an allocator as a template argument, which makes it possible to abstract over these things, in theory (in practice, the allocator APIs are not well designed).
LLVM does arena allocation by making some classes’ constructors private and exposing them with factory methods on the object that owns the memory. This does bump allocation and then does placement new. You just ‘leak’ the objects created this way; they’re collected when the parent object is destroyed.
I’ve done very little systems programming in C++. Almost all the C++ code I have worked with was application code, and even the “system” portion hardly did any system call. Also, most C++ programs I’ve worked with would have been better of using a garbage collected language, but that wasn’t my choice.
This may explain the differences in our experiences.
Yup, that’s a very different experience. Most C++ application code I’ve seen would be better in Java, Objective-C, C#, or one of a dozen other languages. It’s a good systems language, it’s a mediocre application language.
For use in a kernel, or writing a memory allocator, GC, or language runtime, C++ is pretty nice. It’s far better than C and I think the tradeoff relative to Rust is complicated. For writing applications, it’s just about usable but very rarely the best choice. Most of the time I use C++ in userspace, I use it because Sol3 lets me easily expose things to Lua.
I think it very much also depends on the subset of C++ you’re working with. At a former job I worked on a server application that might have worked in Java with some pains (it interfaced with C libs quite a bit), and in (2020?) or later it should probably have been done in Rust, but it just slightly predated Rust gaining… traction, or a 1.0 release. It was (or still is, probably) written in the most high-level, Java-like C++ I’ve ever seen, due to extensive use of Qt and smart pointers. I’m not saying we never had segfaults or memory problems, but not nearly as many as I would have expected.
But yeah, I think I’ve never even heard about this placement new thing (reading up now), but I’m also not calling myself a C++ programmer.
Placement new is half the story, you also need to be aware that you can invoke destructors explicitly.
A trivial example looks like
alignas(foo) char foo_storage[sizeof(foo)];  // storage must be suitably aligned for foo
foo *obj = new (&foo_storage[0]) foo();
obj->do_stuff();
obj->~foo(); // explicitly invoke the destructor
If you want to defer the construction of multiple foos but have a single allocation, you can imagine char foos_storage[sizeof(foo)*10] and looping to call placement new and, later, the destructors. Of course you can heap-allocate the storage too.
However, you mostly don’t do this because if you looking for something that keeps a list of elements and uses placement new to batch allocation/deallocation that’s just std::vector<element>.
Likewise if I wanted to batch the allocation of Foo Bar and Baz in Element I probably would just make them normal members.
class Element
{
    Foo foo;
    Bar bar;
    Baz baz;
};
Each element and its members is now a single allocation and you can stick a bunch of them in a vector for more batching.
If you want to defer the initialization of the members but not the allocation you can use std::optional to not need to deal with the nitty gritty of placement new and explicitly calling the destructor.
IME placement new comes up implementing containers and basically not much otherwise.
Note that since C++20 you should rather use std::construct_at and std::destroy_at, since these don’t require spelling the type and can be used inside constexpr contexts.
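A minimal sketch (Widget is a stand-in type):

#include <memory>

struct Widget { int n = 0; };   // stand-in type

void demo() {
    alignas(Widget) unsigned char buf[sizeof(Widget)];
    Widget *w = std::construct_at(reinterpret_cast<Widget *>(buf));  // no spelled-out placement new
    w->n = 42;
    std::destroy_at(w);          // explicit destructor call; the storage is untouched
}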
You likely use placement new every day indirectly without realizing it, it’s used by std::vector and other container implementations.
When you write new T(arg) two things happen: the memory is allocated and the constructor runs. All placement new does is let you skip the memory allocation and instead run the constructor on memory you provide. The syntax is a little weird, new(pointer) T(arg), but that’s it! That will create a T at the address stored in pointer, and it will return a T* pointing to the same address (but it will be a T*, whereas pointer was probably void* or char*).
Without this technique, you can’t implement std::vector, because you need to be able to allocate room for an array of T without constructing the T right away, since there’s a difference between size and capacity. Later, to destroy the item, you do the reverse: you call the destructor manually, foo->~T(), then deallocate the memory. When you clear a vector it runs the destructors one by one but then gives the memory back all at once with a single free/delete.
If you had a type that you wanted to be able to do a sort of batch destruction on (maybe the destructor does some work that you can SIMD’ify), you’d need to make your own function and call it with the array instead of the individual destructors, then free the memory as normal.
I’m not trying to call anybody out for having a skill issue, but I am calling out people who are saying it’s necessary to abandon the language to deal with one pattern without actually knowing what facilities the language provides.
There are different ways you could do it but one way would be to have a template that you specialize for arrays of T, where the default implementation does one by one destruction and the specialization does the batch version. You could also override regular operator delete to not have an implementation to force people to remember to use a special function.
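Sketching that shape (Element and bulk_release are placeholders here, not anything from a real codebase):

#include <cstddef>

struct Element { ~Element() { /* per-element cleanup */ } };

// Hypothetical bulk cleanup that replaces N individual destructor calls,
// e.g. something SIMD-friendly.
void bulk_release(Element *, std::size_t) {}

// Default: destroy an array one element at a time.
template<typename T>
struct batch_destroy {
    static void run(T *items, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            items[i].~T();
    }
};

// Specialization: one bulk pass over the whole array, then the caller frees
// the storage as usual.
template<>
struct batch_destroy<Element> {
    static void run(Element *items, std::size_t n) { bulk_release(items, n); }
};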
This is intended to be comfortable to follow strictly, so more likely to be reliable in practice:
I find it less comfortable. Current SemVer is strictly objective - if breaking, then major. Else: if affects externally-visible behaviour, minor; else patch. BreakVer requires me to make a subjective judgement about what constitutes a major breakage - which requires knowing about all my customers - which is, practically, impossible. Plus, depending on my confidence/arrogance/move-fast-and-break-things-iness, I’ll make classifications that do not match up with customers’ expectations.
There’s only 2 types of version bumps: Those that are definitely safe (non breaking) / Those that require checking the CHANGELOG
That is currently true. You’ve just shifted the boundary of specificity - instead of subdividing non-breaking changes, you’re sub-dividing breaking changes. In practice, either:
folks will be appropriately cautious in checking every major or minor change for impact - in which case, they have no benefit from this method
they will be lazy and not check minor changes - in which case some breakages will still slip by, as they do currently, but the software publisher will not even be in the wrong, and so has no reason to correct their behaviour. A situation in which every actor can make rational justified choices and still end up with a bad outcome is not one we should actively try to design.
People strongly resist bumping major version numbers for every breaking change.
My descriptivist take is that SemVer isn’t what the spec says, but what the tooling does with it, and how people are affected by that.
If you bump the first number, people won’t update quickly, and this may be a good thing or a bad thing, depending on what your goal is.
If you bump the other numbers you may or may not upset a lot of people and get complaints.
So you weigh the risk of getting an avalanche of complaints vs how inconvenient it would be for you if users didn’t update your software quickly, and maybe also complained it doesn’t update automatically often enough.
Yes, it is. It doesn’t matter whether the change is breaking “for someone” - it matters that the change is breaking to a public contract. If you take a dependency on behaviour that isn’t an explicit part of an established contract, you have no right to expect warning when it changes.
Breaking in semver is objective. It’s defined by the spec as a change in the behavior of your public interface. If you haven’t documented your public interface, you aren’t following semver.
While this may be strictly true, it also implies that almost no one is actually following semver, which doesn’t seem like a very productive framing to me.
Huh? When the semver spec says “public API”, I imagine some docs you can read that list all of the functions in a package along with their type signatures and a description of their behavior. Most of the packages you use have this, no?
That’s the package’s public interface. If the package changes one of those type signatures, that’s a breaking change. If it introduces a new function, that’s not breaking. If it makes a change that violates the documentation for one of its functions, that’s a breaking change. If it makes a change to the behavior of a function that’s consistent with that function’s docs… well either that’s not a breaking change, or as is common the function was inadequately documented.
This all seems fairly unambiguous to me, excepting changes to the behavior of poorly documented functions. Am I missing something?
The example I’ve gone round and round a bunch of times with people on is: Go 1.13 introduced the ability to use underscores as grouping separators in integer literals, like 1_000_000 instead of 1000000.
This also changed the behavior of Go’s integer-parsing functions. For example, strconv.ParseInt() suddenly started accepting and parsing inputs with underscore characters rather than returning an error. And the Go team seem to have been aware that there were people whose code was broken by this change, which would be a problem for Go’s claim that there will never be breaking changes ever, for any reason.
Generally people have argued with me that although ParseInt() was a public function, it was somehow underspecified or ambiguously specified prior to Go 1.13 and therefore it was acceptable to clarify its behavior in Go 1.13 by suddenly changing the inputs it accepted. But this just points to the real purpose of SemVer: it’s about protecting the developer of the code from the user of the code, by giving the developer endless subjective loopholes and ways to say “sure, that change broke your code, but it’s still not technically a breaking change”. For example, any function which does not specify up-front the entire set of potential inputs it will accept and the results it will return for them is subject to the ParseInt() style of “oops, we underspecified it” loophole.
Ah, I get it. There are three things the Go docs for ParseInt() could say:
They could positively state what sorts of integers ParseInt() accepts, so that a reader could confirm it would not parse underscores. E.g. giving a regex of accepted inputs.
They could carve out negative space to allow a variety of behaviors. E.g. saying “parses integers in the same way that the Go language does, which is subject to change”.
They could be silent in the matter.
Reading the actual docs, I’d frankly put them in the first case: they state what the accepted integer syntax is, and give examples, and all of this makes it rather clear that underscores aren’t part of the integer syntax, any more than “thirty seven” would be.
But even if the docs weren’t clear, you don’t get to say “LOL no change is breaking change because I forgot to write docs”. That just means you’ve entered a gray area, and you should be cautious about what counts as a breaking change and your users should be cautious about not relying on too much. It should be a “meet in the middle” sort of a thing, not a “how C++ compiler writers interpret undefined behavior” sort of a thing.
tldr; I’m sorry that people are treating “incompletely documented” the same as “explicitly unspecified”, those are very different things.
Isn’t accepting “1_000” in Go source also a breaking change by the same reasoning as it would be for ParseInt? Input that used to result in an error no longer does.
Maybe in some technical sense, but people rely on both positive&negative behavior of ParseInt() (e.g. rejecting invalid user input), but generally only rely on positive Golang behavior. If “this program started to compile when it used to be an error” was considered a breaking change, every change in language behavior would be breaking.
What I meant was that in the real world, it’s very common for an API to be underdocumented, with the result that it’s not well-defined whether a given change breaks the API or not. Like, you can look at certain changes and say “this really seems like it breaks the API,” but the API was defined vaguely enough that it’s impossible to make any judgment like that.
You say “…excepting changes to the behavior of poorly documented functions,” but I think that’s actually a very large category, in practice :-) Obviously there are some libraries and frameworks that take defining their APIs very seriously, but I would guess that the set of libraries that use SemVer is an order of magnitude larger than the set of ones that are strict about it in this way.
Yeah, that all makes sense. I’d argue that if it’s hard to make that judgement call, the library should be conservative and bump the major version number.
Is there a versioning system you think does better in the presence of a poorly defined API?
I don’t know of one, and I suspect that carefully defining the API is a prerequisite for any versioning system to be able to give you the kind of guarantees we want.
Hyrum’s Law describes a common pathological condition of dependency relationships between software modules, it doesn’t define a de facto baseline/standard guarantee or expectation of compatibility.
That person is welcome to be upset and to ask for consideration, but they are by no means entitled to it. SemVer is about breaking the explicit, published contract of software, not about breaking any hypothetical consumer. If you take a dependency on undefined behaviour, you have no rights to complain when it changes, nor any justification to expect warning when it does.
I found that improving typing speed improved my code, not because I got the code down faster, but because I was willing to type more. When I started programming, aged 7, I typed with two fingers and had to search the keyboard to find keys. I used single-letter variable names everywhere because typing longer ones was hard. I rarely wrote comments, because writing full English sentences was hard. By the time I was an adult and I typed far faster than I ever wrote with a pen, writing a meaningful variable name cost nothing but the value when I came to read the code was immense. Cost-benefit tradeoffs are very different when the cost drops to nearly zero.
Absolutely, and I’m noticing this now that I’m basically relearning how to type because I assembled a split keyboard and configured it with the colemak layout. I can type about 35wpm in monkeytype with only lowercase letters and spaces at the moment, but I haven’t committed all the symbols to memory yet so typing code is very slow.
I tried doing a full day of work with the new keyboard yesterday but it didn’t go well. I found that having to actively think about where all the symbols are and making more errors while typing makes trying out different approaches much less desirable, and it constrains my ability to reason about code simply because I’m holding one more thought process in my head that wasn’t there before.
Definitely! I’m practicing every day and I started using the new keyboard almost exclusively when I’m off work. I think the split keyboard did a lot for my motivation. I really want to switch to it because it feels very comfortable to use (especially so with colemak) and, well, it’s a shiny new toy :) And since it uses QMK firmware I can even identify uncomfortable key combos or symbol positions and iterate to improve them, which triggers the tinkerer part of my brain.
There are adults who still do this. Anyone who wants their code to be accepted and adopted by other people should absolutely learn to touch type. It’s never too late.
I’d argue that my typing speed mostly helps me in communication and research rather than coding, since I spend a lot more time thinking than I spend typing, in most of my projects. As @eterps pointed out in a different thread, that’s also an essential part of developing software, but I want to make a slightly different point.
However, touch typing is still one of the most useful computer skills I ever acquired. It certainly allows me to communicate in messaging apps effectively because the cost of typing is very low for me. After all, IRC and flamewars in FOSS channels is where I practiced touch typing most. ;)
It also helps me use different devices. My current laptop has a keyboard with a British layout, which includes some design decisions beyond my comprehension, such as the pipe character (|) only being accessible via AltGr-Shift-L and a key for #/~ above the Enter key. But I don’t actually care what is written on the keys because I don’t look at those keys anyway — my layout is set to US English with | where it, arguably, should be; and #, ~, etc., where I expect them to be.
I agree with you about the importance of written communication in talking about software design. And I, too, have a non-English (Spanish in my case) laptop and changed the layout to US English, which took a little getting used to, but was fine. …And I also stand by my assertion that there are adults who I’ve seen with my own eyes use single-letter variable names everywhere because they don’t know how to touch type.
One of the nice things about Visual Studio is that it uses the same parser and semantic analyzer for syntax highlighting and error reporting as it does for code generation, allowing for much better feedback than you get from editors that completely separate the two
To go further, I remove /usr/games since I run GUI programs (like games) through their .desktop entries. These entries are located at ${XDG_DATA_DIRS}/applications.
Games that end up in /usr/games such as those packaged in bsdgames (file listing) are frequently not GUI programs.
Not that I am especially opposed to you removing that from your path regardless.
I think this idea is on to something, but I wonder, when do I larry something like an Authorization header (or possibly more ambiguous, a cookie)?
I can imagine at least two or three options:
the HTTP lib marks the headers as inproc and then larries them just as it is sending the request on the wire. Nobody is writing some middleware that (casually) logs all your headers.
Headers are xproc by their nature. You have to larry when you set the header.
Some headers are inproc and some are xproc, the http lib larries the inproc ones as it sends the request on the wire.
You use tracing instrumentation as an example, I’ve definitely written some general code to slurp data out of the request object.
The problem with 1 is there is a difference between sending Authorization to api.website vs to log.website.
With 2 once they are xproc they are xproc. Log(Request.Authorization) and SendRequest(Request) are not distinguished but you want to prevent the former and allow the latter.
For internal consistency, the values of HTTP headers would have to be xproc — they by definition leave the process (outgoing) or have come from outside the process (incoming). Your listed problem with option (2) is spot-on: the lack of distinction between logging and sending requests makes this whole thing break down! I’m tempted to suggest a split between logging, network, and files for inproc, but the real world simply isn’t that clean.
Since the current approach is compile-time only (and I haven’t yet solved the Reflection issue), option (3) isn’t very easy. It’d require either separate http.Request.Headers and http.Request.HeadersInproc fields that get merged during serialization or some stdlib-blessed escape hatches. The presence of those escape hatches feels like this wouldn’t be an effective solution.
By default, you’re one fmt.(Sp|P)rintf("%v\n", session) away from including a user’s password hash in your logs, writing it to a file, or returning it in a server response.
Type systems as they exist today are unable to prevent this kind of bug. A string containing a user’s name is indistinguishable from the string containing their salted+hashed password at compile-time. While some tools and techniques exist to mitigate these issues, they’re at-best difficult to set up and at-worst (and most commonly) not attempted at all.
Hard disagree. It is trivial to prevent this in Rust via implementing Debug manually and hashing out pwHash.
Yep, I totally recognize that it’s possible (+ @square_usual’s comment – sibling to this). I mention these approaches in “Custom Serialization”. The main drawback is that it’s still possible to access the pw_hash struct field directly and log it, write that to a file, concatenate it with some other strings, etc.
It’s very possible that I’m overlooking something, but I just don’t think that’s true. At least not in Rust (or similar ML-like languages): You wrap the UserSession in a module and only export functionality you need outside. Example. Notice that if you uncomment the last line of main, you get a compile-time error, because pw_hash is not accessible outside of the user_session module.
With the proposed system you couldn’t log pw_hash even within the user_session module. It would also protect you from accidentally implementing a version of fmt::Debug that printed the actual value.
(@lcapaldo’s peer comment is correct. This was already written though 😅)
Yep! That’s pretty standard domain-driven design. Expanding that one step requires reading the hash from a database via a database singleton or a repository pattern or whatever you prefer. But the fact remains that something has that value and can do something ill-advised with it :)
Honest question here: what’s so bad about printing the password hash? It’s not the password, it’s the hash (a one-way function that is hard to reverse). The whole reason is to store a password safely (rather than the actual plain text password). If you are worried about rainbow attacks, then use a better hash.
Probably nothing, to be honest! I’m channeling my inner security engineer with that, and it’s likely a bit overly-worried.
s/password hash/session token/ and the general idea still applies though. We need to decrypt and validate the session token, but logging it means anyone with log access can impersonate the requesting user.
It splits apart new to allocate memory and initialize to initialize it like Objective C does, which is always nice to see. Usually unnecessary, but nice to see.
This is also done by ruby which is where I expect Crystal got it from; there is a common ancestor in Smalltalk for ruby and Objective-C which also does the allocate/initialize split, at least in some versions. It’s possible ruby lifted it from Objective-C; alloc is the name Obj-C uses.
Splat assignment probably evolved out of Python’s * function argument swizzling operators
Again probably by way of ruby not python, ruby has called * the splat operator for a long time (forever?).
The “using” does involve judgement, and not based on weighting X and Y to create a new combined metric. It has much more actionable guidance than just “use your judgement” though.
Since this is an application, and the problem being solved is coming from outside (somebody killing the main process) or is a bug (crash, otherwise exiting without cleaning up its children), you might consider letting systemd (or something similar) do the heavy lifting. It will put all the processes into a cgroup (not a pid namespace), and when the main process exits for whatever reason systemd will take everything out in the cgroup, which is possible because you can enumerate all the processes in a cgroup.
You can use systemd-run --user to do this in an ad hoc way.
@MaskRay has a good overview of a lot of the mechanics of TLS which I found very handy for implementing a custom debugger. Opting into specific models can get you out of the business of
__tls_get_addr even for SOs but there are tradeoffs.
Musl makes different choices than glibc for how the TLS is allocated that is closer to the author’s “shimming pthread_create” out of the box
I found that article indispensable in implementing thread local storage myself.
just updated the post with a link to that overview and the initial-exec TLS model which can speed things up if you can use it
Asdf seems a bit overloaded: https://asdf.common-lisp.dev/ vs https://asdf-vm.com/
My guy the more recent of the two is 11ish years old, you are simply too late.
I shall declare the name of my new scripting language, Lua! The last version of the other Lua was months ago! Surely there will be no confusion!
eyeroll
You misunderstood. The first project started in like 2005, the second in 2014. They both have over a decade of use and both continue to be used. Neither is gonna be renamed.
Hope that helped.
No. I didn’t misunderstand. Yes, both have continued to exist largely without problems, but the fact that we have two, independent, package managers named “asdf” is just stupid. In 2014, a quick google search for “asdf” would have easily found asdf for Common Lisp. Why’d we get here in the first place?
If it wasn’t another package manager, I’d be more inclined to not care. But, come on.
And yes, the ship has sailed for asdf — but my Lua will be a general - general scripting language! It’s not just a general embeddable scripting language… No. It takes scripting languages to a whole new level of general-ness!
edit to say: 2005 actually seemed too late. Turns out asdf is 24 years old.
If you didn’t misunderstand my comment then you misrepresented it. That’s worse. Please stop trying to score points on people.
Neither project advertises itself as a package manager.
It’s true that asdf for Common Lisp is not a package manager — it’s a build tool that… creates packages.
The other asdf swaps out “runtimes” … a so called “version manager” allowing you to ensure that the right set of tools are being used when you go to develop, build, etc. It can install runtimes, and it can remove runtimes. I mean, I don’t want to go against the wishes of the author… but if it looks like a duck…
But, I’ll choose to concede to your point that neither of these are de facto “package managers,” but it’s hard to argue against the fact that they are in the same general purpose space.
Yes, I remember circa 2008 or 2010 when Faré revamped it as asdf2. But what’s the point? So there’s a namespace collision across two systems. No Lisper is going to mistake it.
I will not claim to be a Lisper, but having used neither in anger I was for a time under the vague misapprehension that there was only one asdf and that the Lisp one could be (ab)used to install non-Lisp things, kind of like pip sometimes is.
I fully admit this was an entirely a me problem and if I ever tried to use either my confusion would have been resolved, but there is my sample of 1 on the confusion being possible.
Oh, good, it wasn’t just me!
The point is that if you’re starting a new project, maybe do a quick search to see if something else by the same name is already established? This isn’t hard. It’s common courtesy. I don’t see why it’s controversial?!
For kicks I checked Wikipedia: asdf has even had a Wikipedia page since 2007.
Sure, I agree and would do the same, but someone didn’t and they made free software and put it out there for other people to use and continued to maintain it. I can’t get my nerd rage on about the accidental reuse of a project name.
Wait until you find out that Java was named after a coffee which was named after an island. Or that Python was named after a comedy skit which was named after a snake. And don’t get me started on Go or Rust or Ruby! It’s a miracle that we can get anything done in the software world with all of this semantic ambiguity! May we discuss nothing else except this endlessly fascinating topic until the end of time! /s
You really got me… I’ll give you another, the function
car in Lisps always confuses me with the car that you drive on the road. Surely we must move to unique identifiers every time we need to name something! /s
Or, you know, you can just stop naming the two things in the same category of things the same name. That might work. The lack of common decency is crazy.
Obviously taxonomies are arbitrary and subjective, and a runtime version manager is not obviously in the same category as lisp build tooling. You could argue that they are both in the category of “software”, just like you could argue that Rust (programming language) and Rust (game) are, yet people on this forum don’t routinely drag that up in every post with the rust tag because there’s obviously no such moral prohibition against using pedestrian names across multiple software projects.
The melodrama and bike-shedding are crazy.
Yeah, I was confused for a bit. This asdf-vm, from my reading, it’s like SDKMAN combined with <your lang’s hacked up env tool here> built in.
A classic, but I’ve always felt that it misunderstands the benefits of Java’s very restricted semantics: discoverability.
If you’re looking for some functionality in a Java library you’re unfamiliar with, you’re looking for… a class. Always a class, because, in Java, everything belongs to a class. And, because all method calls are written in SVO order (
subject.verb(object)), any method you’re looking for can be autocompleted by the IDE. Every call involves typing a dot, which gives your IDE a place to autocomplete. This often makes libraries understandable simply by reading the Javadoc or even reading through the menus in your IDE.
I agree that in principle, subject/verb/object seems like a nice model.
Where I struggle to accept this is in realizing that trying to fit everything into the subject/verb/object model makes some other valuable things harder to do. (To be concrete, first-class functions are extraordinarily useful, and older versions of Java made that pattern miserable to achieve.)
Ironically this is the argument people typically make for Go: yes, there’s a lot of tedious and repetitive boilerplate, and a lot of stuff the language can’t do by design or refuses to do by fiat, but it’s all in the name of keeping it “simple” for programmers and making all Go code everywhere look and feel similar.
(I am not sure I buy that argument, but it is the same argument)
Go and Java have very similar use cases. Seems reasonable that they’d make some of the same tradeoffs.
Why ironically?
I agree, it does help with discoverability, but uh, that’s because every single class has a unique and special API to it that you must figure out in order to interact with it. All that data is “encapsulated”, and now it’s not possible to get at it.
What did Rich Hickey call it, uhh…. “Death by Specificity” ?
So the great big claim of OO / Java reusability? Gone. We know it’s gone, because we realized inheritance is mostly a trap, so that “reuse” went out the door long ago now…
So about discoverability, and the need for it… well, that’s a solution to a problem you shouldn’t have, really. If you use a “Data-Oriented Programming” language or approach and have a bunch of powerful functions to operate on a few simple common data structures (your “interfaces”), then you learn those powerful functions and compose with them, and there’s nothing left to discover. You’ve got your toolbox, ready to use on everything.
“It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.” —Alan Perlis
There are tradeoffs in all languages and data representations. The downside to the “everything is one of a few basic data structures” approach you see in a lot of dynamic languages is that now each library’s data representation is some unique, incompletely-documented, implementation-specific arrangement of these data structures. If you have a lot of examples to learn from, this is convenient, but what if you don’t?
The nice thing about strongly-typed OO languages is that the types and method signatures are often enough to tell you how to work with a given library.
(This isn’t to say that OO is strictly better, every paradigm has its pros and cons. But I find that people often discount the pros that do exist in unfashionable languages and paradigms.)
There are always trade-offs, sure, but in practice, most dynamic languages do have classes of some sort, they just tend to be more open.
In the case of JavaScript and Lua, you have constructors. You usually start with data emitted from a constructor, and then work with more functions or methods. For documentation, you have JSDoc, LuaDoc, and similar to provide you with explanations of inputs. Those still exist in Java and are necessary, as the type system cannot encode everything. In simpler terms, modern JS function signatures include destructuring, and that destructuring indicates what input is expected, I.e.
updatePassword({password}, newPassword) is enough to tell you that this function depends on an object that has a password property, and can work on many objects. It’s basically an interface declaration.
In the case of lisps like Clojure, it’s even better, because you get the same as above, but also gain a REPL to interrogate your functions right inside your editor, in your codebase, and thus you can play with them immediately and safely because 99% of functions are pure. Since the functions do operate on those simple common data structures, you’re not going to have to mock/instantiate 3+ different classes, configure them to the state you need, pass them to the constructor of the method you’re testing, call a bunch more methods to set up the internal state and extra fields, etc, just to be able to test how a single method works. You just go and test a function directly.
Composition exists also.
OO (at least Java) composition doesn’t really work when the model is fundamentally about inaccessible mutable state via “encapsulation”. You need to get at that state, and to do that, you’re going to need to override or wrap a class, or write something that extracts that data for this specific class, and that specific class.
“Encapsulation” is antithetical to composition.
(This is why prototype-based OO ultimately wins at composition (at least in OO land), especially if it’s not encapsulated, like in JavaScript and Lua.)
The thread about posix_spawn and job control made me wonder if anything had been happening with io_uring_spawn. Apparently yes.
Oh wow, I always thought that process creation is relatively fast, but still, say, an order of magnitude slower than running your typical sans-io test. This is a big update for me, thanks! I am curious about two things here:
Great question! We actually use
posix_spawn on almost every Unix platform, which tends to be faster than fork/exec. Just using fork is definitely out of the question here – it’s not really compatible with most Rust code.
There is a little wrinkle here – by default on Unix platforms, we actually posix_spawn a copy of nextest (i.e. /proc/self/exe, etc), which then execs the test process. For more information about how and why, see https://nexte.st/docs/design/architecture/signal-handling/#double-spawning-processes.
The clap repo is a great one for the limiting case of tons of extremely small tests. On my Linux desktop, against fc55ad08ca438e1d44b48b9a575e9665bc0362de:
So, to be clear, process creation is still quite a bit slower than doing stuff purely in-process. But it is very fast compared to what the vast majority of tests do when you weight by the time each test takes (which is what matters for end-to-end performance), and it has all of the other advantages outlined in the post. Practically speaking, the ability to run tests across multiple binaries simultaneously has maybe 100x the impact to performance that this does.
(edit: I’m aware of your very helpful advice to coalesce integration test binaries, and follow it regularly! In large workspaces though you’ll often end up with at least one integration test binary per crate.)
I’m really curious about the double-spawn. How exactly does a child process end up “half born” such that the parent never returns? Either the new process was created or it wasn’t, so what is the parent waiting for? And what makes this a flaw in the specification rather than a bug in glibc’s implementation or Linux’s syscalls? Does this happen on darwin too?
Also, if the fix is to mask out
SIGTSTP up front and then restore it in the child, what happens if you just go ahead and do that by using POSIX_SPAWN_SETSIGDEF to reset SIGTSTP?
It’s a really subtle issue. I need to do a longer writeup on this, but https://sourceware.org/pipermail/libc-help/2022-August/006263.html has some more details.
I think I am confused. If I hit Ctrl-Z, can’t I expect the parent / cli-tool to be stopped anyway? Isn’t this intrinsically racy just as a mechanism? I can’t time my ctrl-z to get the “right” process even without this situation, no?
If I understand correctly the situation is basically equivalent to
which yeah, the parent is suspended until the child execs but that’s kind of just how vfork works, and can be an issue for any blocking or long-running operation in the child.
If posix_spawn was a syscall and not emulated with vfork this would presumably be fine for some definition of fine, but it still is just a question of
EDIT: Ok I think I can imagine what the bad experience is:
— vfork(2)
So you ctrl-z, do whatever, and then hit fg. Child continues but then the tool gets the deferred stop so you have to hit fg again or something?
The point at which the child process gets stuck is before a new pid is assigned to it. I’m not kidding when I say half-born.
Because of the parent being in uninterruptible sleep, the terminal appears completely hung. Ctrl-C etc don’t work. The only option is to go to a different terminal and run kill -CONT on it, which is non-obvious and not reasonable to ask users to figure out.
Again, this deserves a long and detailed explanation which I need to write up at some point.
Thanks for the additional details, I can reproduce with something simple like:
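The snippet itself isn’t reproduced above, but going by the descriptions later in the thread (a vfork’d child that dawdles before exec, with Ctrl-Z pressed during that window), a minimal sketch of that kind of reproduction presumably looks something like:
#include <cstdio>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    pid_t pid = vfork();   // the parent is put to sleep until the child execs or exits
    if (pid == 0) {
        // NOTE: per vfork(2) only _exit/exec are allowed here; the sleep just
        // widens the race window, which is what the later comments point out.
        sleep(10);
        _exit(0);
    }
    // Press Ctrl-Z during that sleep: the child stops before exec, the parent stays
    // in uninterruptible sleep, and the terminal appears hung.
    waitpid(pid, nullptr, 0);
    std::printf("child exited\n");
    return 0;
}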
This suggests an alternative workaround: rather than masking SIGTSTP and using a helper to unmask it, install a noop handler for it in the parent. Exec will remove the handler in the child and you can remove the handler in the parent when posix_spawn returns.
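A sketch of that idea, assuming a single-threaded caller and omitting error handling (spawn_with_tstp_guard is a made-up name):
#include <signal.h>
#include <spawn.h>
#include <unistd.h>

extern char** environ;

static void noop(int) {}   // a caught SIGTSTP does not stop the process

pid_t spawn_with_tstp_guard(const char* path, char* const argv[]) {
    struct sigaction act {}, old {};
    act.sa_handler = noop;
    sigemptyset(&act.sa_mask);
    sigaction(SIGTSTP, &act, &old);     // parent (and the pre-exec child) now catch SIGTSTP

    pid_t pid = -1;
    posix_spawn(&pid, path, nullptr, nullptr, argv, environ);
    // exec resets caught signals to their defaults, so the spawned program stops normally

    sigaction(SIGTSTP, &old, nullptr);  // restore the previous disposition in the parent
    return pid;
}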
Thanks. I think relying on exec resetting the signal handler would definitely work in single-threaded contexts, where it’s okay to poke at the global signal handler like that. (But nextest uses Tokio’s multithreaded runtime.) One advantage of signal masks here is that they’re per-thread.
From this discussion, I assume the solution of resetting the signal mask using the
posix_spawn attrs would also be vulnerable, as the child would reset the mask prior to calling exec (and therefore immediately receive the signal)?
This definitely sounds like a bug, but it doesn’t sound like a fundamental flaw of the posix_spawn specification, just of the way Linux has chosen to implement this. For example, if posix_spawn was a syscall on its own that would work, or if Linux added a mechanism to reset the signal mask upon exec then it could just mask out SIGSTOP itself across the clone(2) call.
I’m still confused at the “before a new pid is assigned to it” part though. How can a process exist without a pid? It’s getting stuck in clone, but that doesn’t mean the child doesn’t have a pid, just that the parent doesn’t know the pid yet.
No, this doesn’t happen because the child inherits the parent’s signal mask (I changed Rust to do this in late 2022). The important thing with double-spawn is that the reset happens in the intermediate process, after the
posix_spawn returns in the parent.
Ah, with a second read I think the issue is that the spec is too flexible – it allows for implementations that are definitely incorrect.
The way I originally read https://pubs.opengroup.org/onlinepubs/9799919799/functions/posix_spawn.html, there are requirements to:
This means that for a library implementation of
posix_spawn based on a vfork-like thing:
exec is called
posix_spawn returns
There’s no place anywhere to set the signal mask after the exec but before the posix_spawn returns.
Originally, I thought that the requirement for error handling meant that exec must be called before posix_spawn returns, but on a second read the documentation says that error 127 may be returned for async handling instead.
Solutions that would resolve this are something like:
a split between the sigmask that sigmask and pthread_sigmask interact with, and a coarser sigmask that gets reset on exec.
Based on this, as well as the glibc mailing list conversation, I think that the posix_spawn spec is underspecified and allows for some clearly incorrect behavior. For example, I think it should clearly say that signals MUST only be delivered to the child process after posix_spawn returns.
Sorry, a new pid does get assigned to it. I was thinking of the command line which hasn’t changed yet since this is really early in the process.
I just repro’d it in this gist, with instructions included. You can see the new process stuck with a new pid but the same command line and virtual mem size (i.e. it’s a vfork of the parent process before exec is called).
Ultimately the problem with stopping the child is the parent is technically not stopped, just in uninterruptible sleep, and so the shell does not resume control of the terminal, right? If the shell were to resume terminal control anyway and allow you to type
fg then things would be ok?
This suggests another solution, which is if a process is in uninterruptible sleep due to vfork, and the child process is stopped, the parent should report its status as though it’s stopped too (and its parent should get a SIGCHLD). Which is to say, the notion of “this process is stopped” should be part of the state shared by the vfork (and sending a SIGCONT to the parent should work to wake up the child).
That said, I’m inclined to prefer solutions that simply make posix_spawn appear atomic, since I don’t know what other uses there are for vfork besides implementing behavior like posix_spawn. So a solution of the form “defer stops until after the child execs” is preferable.
For normal process spawns yeah that would work. Nextest puts each test in a process group, but also ensures that stop and continue signals are forwarded — so it would also work.
I agree that stops (and maybe most/all other signals) should just be deferred until the child execs.
This seems to be a real bug in glibc or Linux. Please report it to the glibc bug tracker, with proper reproduction steps, i.e. with some small C program which reproduces the issue. And post the link here (because the bug is interesting to me). Even if the bug turns out to be unfixable, having proper reproduction steps is still valuable. If you have difficulties with creating a glibc bugtracker account, post the reproduction steps here and I will forward them to the bugtracker.
Already did that! It kind of died off once folks realized double-spawn was a correct workaround (see the thread).
https://sourceware.org/pipermail/libc-help/2022-August/006263.html
I kind of stopped pursuing it as well. I think there are a few solutions that would require adding (yet another) mode to the clone3 syscall, which I’ve listed here: https://lobste.rs/s/bnjeid/why_nextest_is_process_per_test#c_zc5gor
But the fact that I’d have to keep the double-spawn workaround on Linux for at least a decade or so, not to mention any other platforms which may have the same issue, made me kind of move on haha :)
I still don’t see reproduction steps. Maybe I missed something? Can you give some direct link to reproduction steps? All I see is “just run nextest on clap with
NEXTEST_DOUBLE_SPAWN=0”, for example here https://gist.github.com/sunshowers/bea6f7924347fedfd5bf6478e3551675#file-gistfile1-txt-L17 .
I want to reproduce the issue and to understand what is going on! The issue seems interesting and I want to understand it!
“just run nextest on clap” will definitely help me reproduce the issue, but will not help me understand it at all. I want to understand what you are trying to do, and what exactly causes the process to get to an uninterruptible state.
“just run nextest on clap” means that in order to understand the bug I need to remove code from the nextest codebase until it becomes 20-30 lines, which is the core of the issue. This will probably take me many hours or even days.
I suspect that this is the reason why glibc devs don’t want to dig further
Gotcha!
Well, the simplest reproduction is in this comment: https://lobste.rs/s/bnjeid/why_nextest_is_process_per_test#c_gz4ohu — press ctrl-z during that sleep.
But this doesn’t actually use posix_spawn, just vfork, and is more of an indication of the problem rather than providing deep insight into the problem itself. A more complex reproduction would involve posix_spawning tons of short-lived processes rapidly (you don’t need to do anything special around signal masks) and hitting ctrl-z — it may take a few tries to repro, and also using many threads might make it repro faster.
With nextest, the race window is slightly smaller because it creates a process group for each test (so the window is between signal unblocking and setting the process group, not signal unblocking and exec). But this simpler example may help guide you.
The stack trace in my email to the list may also help. I obtained it via gdb on the stuck process, then manually traced through the glibc code to surmise what was happening.
I’ll be honest, it would be quite low priority for me to work on a full reproduction — I have limited time and many other commitments at the moment. But you’re welcome to take it from here!
Okay, using that link I was able to reproduce something. I compiled this program:
Then I started the program and pressed Ctrl-Z. Nothing happened. And the program became totally unresponsive. I was unable to kill it in any way (Ctrl-Z, Ctrl-C, etc). The only thing that worked was killing it using kill -9 ... from another terminal window.
Is this the bug you are talking about?
Even if the answer is “yes”, these reproduction steps are still not perfect. The vfork manpage disallows calling sleep in a child created by vfork. The only allowed functions are _exit and exec*. So (I think) glibc and Linux devs will simply answer “you are not allowed to call sleep after vfork, so you are on your own”.
Are there some better reproduction steps?
Then I tried to reproduce using
posix_spawn. I wrote this code:
I ran that program, tried to stop it using Ctrl-Z several times, and… I saw no bugs. Every time I stopped the process group, I was able to successfully resume it using “fg”. Behavior was totally expected.
So, yes, this time I was unable to reproduce anything.
My program features 3 magical constants: the count of iterations (1000), the execution time of a given process (0.1 second), and the sleep between runs (10 000 usecs). Maybe I need to tweak them somehow?
I say all this, because I’m genuinely curious. And I think nothing will be done until we submit proper bug report with proper reproduction steps to glibc or Linux bug tracker. Then hopefully glibc/Linux devs will start work on this.
Of course, if you don’t have time, this is totally understandable. Feel free to ignore this comment if you want.
P. S. I’m writing all this from my new Dell Precision 7780 with ECC memory. I bought it for various reasons, but one of the reasons was this: I have read your article on ECC memory ( https://sunshowers.io/posts/am5-ryzen-7000-ecc-ram/ ). (Another reason was Torvalds’ opinion on ECC.) After buying this laptop, one of the first things I did was run the commands from your article to verify that this laptop actually has ECC memory. And it seems it does. Thanks a lot!
Yes, vfork by itself gives a sense of what the problem feels like, but doesn’t indicate the problem itself.
I’d shorten the process down to absolutely nothing, e.g. something like “true”, and put in no delay or a sub-1ms delay — just let process creation rip. If that doesn’t work, I’d try doing it across several threads.
It seems I reproduced!
Here is my new code:
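(The code itself isn’t shown here; going by the advice above, a sketch along those lines, spawning a trivial /bin/true in a tight loop with no delay, might look roughly like the following. This is an illustration, not the actual reproduction.)
#include <spawn.h>
#include <sys/wait.h>
#include <unistd.h>

extern char** environ;

int main() {
    char* const argv[] = { const_cast<char*>("true"), nullptr };
    for (int i = 0; i < 100000; ++i) {   // spawn short-lived processes as fast as possible
        pid_t pid = -1;
        if (posix_spawn(&pid, "/bin/true", nullptr, nullptr, argv, environ) == 0)
            waitpid(pid, nullptr, 0);
    }
    // Press Ctrl-Z while this runs; per the thread, with unlucky timing the program
    // neither stops nor responds to Ctrl-C, and only SIGKILL gets rid of it.
    // Running the loop on several threads reportedly makes this easier to hit.
    return 0;
}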
If I run this program and press Ctrl-C, then the program terminates cleanly. But if I run it and press Ctrl-Z, nothing happens. I.e. the program doesn’t stop. And then I’m unable to kill it using Ctrl-C or even by killall b. The only thing that works is killall -9 b. Is this the bug you are talking about? Or did I find some different bug?
Yes, that does look like the bug!
Thank you for answer!
I just did some more experiments and it seems that the bug is reproducible with Linux 6.1, but not reproducible with Linux 6.12. I will do more experiments and will report my findings in this thread
Hmm, strange – I just managed to repro it locally with Linux sunshowers-linux 6.12.8-201.bazzite.fc41.x86_64, inside an Ubuntu 24.04 distrobox and with
NEXTEST_DOUBLE_SPAWN=0 cargo nextest run on https://github.com/clap-rs/clap.
Oops, I just reproduced the bug in Linux 6.12, too.
Okay, so the bug is totally legit. I reported it to glibc with steps for reproducing: https://sourceware.org/bugzilla/show_bug.cgi?id=32565
And to musl: https://www.openwall.com/lists/musl/2025/01/17/1
Just out of curiosity, could you use clone() + exec() directly instead of the double fork to work around this? If I’m reading this correctly, it’s mostly the fault of
__spawni using CLONE_VFORK, which explicitly puts the parent in an uninterruptible sleep, so I’d imagine that DIY-ing to not use that flag would get around the issue without needing the double exec.
(On the other hand, this is quite a bit of extra system-dependent code if you’re using clone() directly, and I don’t exactly recall if a vanilla fork() is slower due to passing extra flags internally…)
It’s been a while, but I think I had imagined that clone-without-vfork would have the same performance profile as fork/exec, which is much worse than double-spawn. But it’s not something I verified.
Double-spawn meant I could use posix_spawn via the Rust standard library, relying on std::process::Command.
Thanks for measuring!
Yup! The galaxy-brain version of that advice is what prenosco folks are doing — use one integration tests crate for the entire workspace, so that there’s just a single binary for all the tests! This also avoids each crate being compiled with and without test cfg.
There are some real issues with that in larger workspaces with many output artifacts — you have to wait too long for the build to happen in unrelated parts of the codebase.
Would Mypy have been of any use with some type stubs? E.g., that qb was meant to be of type Type[SomeQueryBuilderInterface], which shouldn’t have an engine field?
I was thinking the same: Any statically typed language would disallow modifying types at runtime. By definition. But does Mypy care? Let’s try:
mypy: Success: no issues found in 1 source file
ruff:
pylint | grep -Ev 'docstring|snake_case|Too few public methods': Attribute ‘engine’ defined outside __init__ (attribute-defined-outside-init)
Thankfully, pylint cares, but it also cares about things I find rather normal.
My takeaway from the article: As someone not used to thinking in dynamic types and concurrency in the same world: Oh boy, this is also a thing to check for.
Mypy does have an error code for this: attr-defined
easiest way to get it to show up is passing
--strict to mypy
This is because mypy doesn’t check functions that don’t have type annotations on them by default. You can change that via various flags, which I think should be included in the --strict mentioned in a sibling comment.
That’s just not true, the “by definition” part – even Java and C++ allow modifying types at run-time.
How do you modify a type at runtime in C++?
Classes in both Java and C++ support run-time mutable static fields
That only allows you to modify a value at runtime, not modify a type. You can’t add a previously non existing static field to a type at runtime.
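In C++ terms the distinction looks something like this sketch (Config and its fields are made up):
#include <string>

struct Config {
    static std::string engine;   // the value of a static field can change at run-time
};
std::string Config::engine = "sqlite";

int main() {
    Config::engine = "postgres";   // fine: mutating a value through the type
    // Config::timeout = 30;       // error: the type itself is fixed at compile time;
                                   // you cannot attach a new field to Config at run-time
    return 0;
}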
Oh, I understand where you are drawing the line now. You mean something like adding new behavior
I was about to ask the same. Maybe some would count casting and polymorphism, but I wouldn’t.
That’s a question of definition, but my answer is that you can’t change types at runtime, because types don’t exist at runtime. Wikipedia is foggy as usual on what static typing precisely is, but I think the meaningful distinction is that the type of each variable is static. That’s what it must mean if C++ is statically typed. In other words, it can’t be relevant that you can violate typing as much as you want at runtime by pointing at or interposing objects of different types, and that the language even helps you do that in safe forms (OOP).
I don’t think this is an accurate representation of the original code. This is constructing a new object of type
QueryBuilder. If a new object was being constructed each time then there would not have been an issue. But the code was doing the equivalent of qb = QueryBuilder, not qb = QueryBuilder(). So instead of creating a new object and then mutating that new object, we’re instead mutating the QueryBuilder class itself.
If I use this code:
Then mypy complains with:
which is what we want to happen.
You need either the
-> None annotation or the --strict command line option to get mypy to kick in here. Otherwise it will treat the entire main function as something that’s not yet meant to have type checking applied.
Probably, but whoever wrote it probably wrote it this way intentionally. I don’t know why, but I am guessing it was because qb.engine was a better developer-facing API than qb_engine separately attached to the local namespace. Alas, the better design was to not have this “global state” at all, which we eventually removed.
Reminds me of stroll.
That’s cool. I think shake also had some ability to instrument for changes.
Yeah I dived a little deeper/dusted off the cobwebs and stroll uses fsatrace which is what shake uses and has overlapping authors with shake. Shake’s capabilities like this were apparently inspired by fabricate
Now this is some pretty good proof that you can do systems programming with a high level garbage collected language. A pure Scheme PID 1, how cool is that!
You can do it. The problem from my perspective is that it’s hard to make sure it’s allocation safe. In some languages/implementations (and guile probably fits this description) it’s pretty hard to tell when a memory allocation is even going to happen, so guarding against it failing is particularly difficult, if it’s even possible at all.
Now the argument may be made that memory allocation failure can be handled by simply terminating the process, and that’s perhaps fine in some circumstances (I personally think it’s still open to debate, and for systems software applies even less than it does for application software, but anyway) but if the process in question is init, terminating it brings the whole system down (forcibly: on Linux at least, it causes a kernel panic).
(So basically if you’re happy with an out-of-memory situation causing the system to crash, then it’s ok. Maybe you never see those anyway. There are also ways to limit its likelihood: you can use cgroups to prevent other processes from consuming all the memory, but it’s tricky to get right and it’s hard to know how much memory you have to reserve for your init. For an important server, you’d probably rather use an init that won’t cause a panic if it can’t complete a memory allocation. It’s still a pretty bad situation, but it might not be as bad as bringing the machine down).
Can you explain more how this handling of out of memory should be done? My understanding is that in most Linux distros malloc will never fail, since the OS just happily over commits virtual memory. Can you switch that off for PID 1?
With overcommit on, there are still some ways that malloc can fail, though they are usually unlikely to be encountered, it’s true. There’s no actual switch, but on a critical system you really don’t want overcommit enabled, because memory can still be exhausted and something’s got to give - in practice what happens is that the kernel (or a userspace helper) has to decide on a process to kill in order to free up memory, so you’re back to square one (i.e. it’s basically the same as the process terminating when malloc fails, although hopefully the OOM-killer doesn’t target PID 1).
Let’s find out. Against my better half’s wishes (who Googled it and found it immediately themselves), I’m going to try asking a local LLM (ht @simonw; asking Llama 3.3). It possibly semi-hallucinated part of the code, but it might just be older code. It did point me to the exact file and give me the right names for things:
Here’s the current
oom_unkillable_task definition:
The logic as it is today (and may or may not have always been) is that oom_badness returns LONG_MIN for “unkillable” tasks, and likewise oom_evaluate_task returns early on unkillables. is_global_init is defined in include/linux/pid.h:
Reasons malloc can fail when overcommit is enabled include hitting rlimit_data or rlimit_as.
Also address space exhaustion/fragmentation, which is much less likely in our 64-bit world, but you can still see it happen with a bug that leads to calling malloc with a negative number.
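As a sketch of the rlimit case mentioned above (the numbers are arbitrary):
#include <cstdio>
#include <cstdlib>
#include <sys/resource.h>

int main() {
    // Cap this process's address space at 64 MiB.
    struct rlimit lim = { 64ul * 1024 * 1024, 64ul * 1024 * 1024 };
    setrlimit(RLIMIT_AS, &lim);

    void* p = std::malloc(256ul * 1024 * 1024);   // bigger than the limit
    if (p == nullptr)
        std::printf("malloc failed even though overcommit is on\n");
    std::free(p);
    return 0;
}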
I don’t think rlimit_data or rlimit_as are useful when you have overcommit on, and they default to infinity on Linux.
Probably a mistake for a user to set rlimits on their init process anyway.
Yeah I’m not talking about init specifically, this is a tangent, just the specific conflation (the myth? can I call it a myth?) of malloc never failing with overcommit enabled.
As a human I felt pretty good about the descriptions in 1-3, then 4 is somewhat unclear and 5 seemed to jump off a cliff of understanding for me.
I don’t know what “scuffed” is supposed to mean or why they are parallel universes. I think it’s that there’s an additional timeline (the global timeline) in addition to the threads’ individual timelines, but if that’s correct I think we’re already in parallel “universes” with multiple threads? Maybe it’s because “sync up” is a little vague in the previous sections?
“Scuffed” is gamer-originated slang for a gradient of qualities. Something that is “scuffed” is something that:
Tone generally ranges from self-deprecation, questioning skepticism, and friendly derision. True haters are more likely to call something “trash” than “scuffed”.
“Scuffed” is slang for off-brand/“poor man’s X” (EDIT: better described by KOMON). “Parallel universes” is a reference to SM64 speedrun slang that became a meme in some programming communities. I think you correctly understood the “additional global timeline” concept; Just wanted to have a bit of fun here.
The fence section (5) was tough as I feel it didn’t sufficiently escape the “formal reasoning” terms from the cppref, especially for SeqCst fence. It’s mostly due to fences being so loosely defined (they “sync up” with (potentially many) atomic read/writes that happen before or after them.. like what?). If you’re interested, I helped co-author the proposal to remove atomic fences from Zig (ignore downvotes, it’s an inside joke) — It covers the practical scenarios where they’re used and how to replace them.
This is a falsehood some people in the intersection of the data oriented design and C crowd love to sell. RAII works fine with batches, it’s just the RAII object is the batch instead of the elements inside. Even if the individual elements have destructors, if you have an alternative implementation for batches C++ has all the tools to avoid the automatic destructor calls, just placement new into a char buffer and then you can run whatever batch logic instead. Don’t try bringing this up with any of them though or you’ll get an instant block.
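In other words, something like this sketch, where the destructor belongs to the batch rather than to each element (Batch is a made-up name):
#include <cstddef>
#include <cstdlib>
#include <new>

template <typename T>
struct Batch {
    T* items;
    std::size_t count;

    explicit Batch(std::size_t n)
        : items(static_cast<T*>(std::malloc(sizeof(T) * n))), count(n) {
        for (std::size_t i = 0; i < n; ++i) new (items + i) T();   // one allocation, n in-place constructions
    }

    ~Batch() {
        // the one place batch teardown happens: per-element destructors, a vectorized
        // pass, or nothing at all, followed by a single free
        for (std::size_t i = 0; i < count; ++i) items[i].~T();
        std::free(items);
    }
};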
I do highly performance sensitive latency work with tighter deadlines than gamedev and still use destructors for all the long lived object cleanup. For objects that are churned aggressively avoiding destructor calls isn’t any different than avoiding any other method call.
Agreed, this post is making some wild claims that don’t hold up in my experience. I’m writing a high-performance compiler in Rust, and most state exists as plain-old datatypes in re-usable memory arenas that are freed at the end of execution. RAII is not involved in the hot phase of the compiler. Neither are any smart pointers or linked lists.
I simply find the argument unconvincing. Visual Studio has performance problems related to destructors => RAII causes slow software?
Agreed. I like Zig and appreciate Loris’ work, but I don’t understand this argument as well.
“Exists in tension” seems accurate to me. Yes, you can do batches with RAII, but in practice RAII languages lead to ecosystems and conventions that make it difficult. The majority of Rust crates use standard library containers and provide no fine grained control over their allocation. You could imagine a Rust where allocators were always passed around, but RAII would still constrain things because batching to change deallocation patterns would require changing types. I think the flexibility (and pitfalls) of Zig’s comptime duck typing vs. Rust traits is sort of analogous to the situation with no RAII vs. RAII.
I think it’s the case that library interfaces tend not to hand control of allocations to the caller but I think that’s because there’s almost never pressure to do so. When I’ve wanted this I’ve just forked or submitted patches to allow me to do so and it’s been pretty trivial.
Similarly, most libraries that use a HashMap do not expose a way to pick the hash algorithm. This is a bummer because I expect the use of siphash to cause way more performance problems than deallocations. And so I just submit PRs.
Yes. I write Zig every day, and yet it feels like a big miss, and, idk, populist? “But don’t just take my word for it.” Feels like too much trying to do ‘convincing’ as opposed to elucidating something neat. (But I guess this is kind of the entire sphere it’s written in; what does the “Rust/Linux Drama” need? Clearly, another contender!)
To be fair, invalidating this specific argument against RAII does not invalidate the entire post.
You write Zig every day? What kind of program are you working on?
It doesn’t, but without it I don’t really see the post offering anything other than contention for the sake of marketing.
I spend somewhere between 2 to 8 hours a day working on my own projects. (“2” on days I also do paid work, but that’s only two days a week.) Zig has been my language of choice for four or five years now; you can see a list on my GitHub profile. A lot of my recent work with it is private.
Impressive commitment to Zig! Thanks for sharing.
Thank you! I really like it, and I’m a little sad that Rust — which I still use often, maintain FOSS software in, and advocate for happily! — has narrowed the conversation around lower-level general-purpose programming languages in a direction where many now reject out of hand anything without language-enforced memory safety. It’s a really nice thing to have, and Rust is often a great choice, but I don’t love how dogmatic the discourse can be at the expense of new ideas and ways of thinking.
I very much agree. A Zig program written in a data-oriented programming style, where most objects are referenced using indices into large arrays (potentially associated with a generation number), should be mostly memory safe. But I haven’t written enough Zig to confirm this intuition.
I don’t remember the arguments against RAII much (has been a few years since) but that Zig doesn’t have RAII feels like an odd omission given the rest of the language design. It’s somewhat puzzling to me.
Hm, it’s pretty clear RAII goes against the design of Zig. It could be argued that it’d be a good tradeoff still, but it definitely goes against the grain.
A “`__deinit__` function, if available” would be the sole place where that sort of thing would be happening. `defer` fits Zig very well, RAII not at all.
I was unaware that Zig discourages holding on to the allocator. I did not spend enough time with Zig, but for instance if you have an `ArrayList` you can `defer .deinit()` and it will work just fine. So I was assuming that this explicit pattern could be turned into something more implicit.
I understand that “hidden control flow” is something that Zig advertises itself against, but at the end of the day `defer` is already something that makes this slightly harder to understand. I do understand that this is something the language opted against, but it still feels odd to me that no real attempt was made (seemingly?) to avoid `defer`. But it very much sounds like this pattern is on the way out anyways.
Zig’s `std.HashMap` family stores a per-collection allocator inside the struct that is passed in exactly once through the `init` method. Idk how that can be considered non-idiomatic if it’s part of the standard library.
It is getting removed! https://github.com/ziglang/zig/pull/22087
Zig is a pre 1.0 language. Code in stdlib is not necessarily idiomatic, both because there’s still idiom churn and because it was not uniformly audited for code quality.
As someone who doesn’t use Zig or follow it closely, both the fact that that change is being made and the reason behind it are really interesting. Thanks for sharing it here
You might also like https://matklad.github.io/2020/12/28/csdi.html then, as a generalization of what’s happening with Zig collections.
That’s an interesting development. Thanks for informing me!
It would completely change the design of the language and its approach to memory and resource management.
I’ve never used placement `new`, so I don’t know about that, so my question is: how do you do that? Take for instance a simple case where I need a destructor, something like the `Element` sketch at the end of this comment. If I have a bunch of elements that are all constructed at the same time, then later destroyed at the same time, I can imagine having a dedicated `Element_list` class for this, but never having used placement `new`, I don’t know right now how I would batch the allocations and deallocations.
And what if my elements are constructed at different times, but then later destroyed at the same time? How could we make that work?
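For concreteness, something along these lines, with `Foo`, `Bar`, and `Baz` as placeholder classes:

```cpp
#include <memory>

struct Foo { /* ... */ };
struct Bar { /* ... */ };
struct Baz { /* ... */ };

struct Element {
    std::unique_ptr<Foo> foo = std::make_unique<Foo>();
    std::unique_ptr<Bar> bar = std::make_unique<Bar>();
    std::unique_ptr<Baz> baz = std::make_unique<Baz>();
    // Implicit destructor: three separate deletes, three separate frees.
};

int main() {
    Element e;  // three allocations on construction, three frees on destruction
}
```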
I think I have an idea about their perspective. I’ve never done Rust, but I do have about 15 years of C++ experience. Not once in my career have I seen a placement `new`. Not in my own code, not in my colleagues’ code, not in any code I have ever looked at. I know it’s a thing when someone mentions it, but that’s about it. As far as I am concerned it’s just one of the many obscure corners of C++. Now imagine you go to someone like me, and tell them to “just placement new” like it’s a beginner technique everyone ought to have learned in their first year of C++.
I don’t expect this to go down very well, especially if you start calling out skill issues explicitly.
I’m a little bit surprised, because I’ve had the opposite experience. Systems programming in C++ uses placement new all of the time, because it’s the way that you integrate with custom allocators.
In C++, there are four steps to creating and destroying an object: allocating memory for it, constructing the object in that memory, destroying the object, and deallocating the memory.
When you use the default `new` or `delete` operators, you’re doing two of these at a time: `new` first calls the global `operator new`, which returns a pointer to some memory (or throws an exception if allocation fails), and then calls the constructor; `delete` calls the destructor and then deallocates. Both `new` and `delete` are simply operators that can be overloaded, so you can provide your own, either globally, globally for some overload, or per class.
Placement new has weird syntax, but is conceptually simple. When you write `new SomeClass(...)`, you’re actually writing `new ({arguments to new}) SomeClass({arguments to SomeClass's constructor})`. You can overload `new` based on the types of the arguments passed to it. Placement new is a special variant that takes a `void*` and doesn’t do anything (it’s the identity function). When you do `new (somePointer) SomeClass(Args...)`, where `somePointer` is an existing allocation, the placement new simply returns `somePointer`. It’s up to you to ensure that you have space here.
mallocin C++ and construct an object in it, you’d write something like this (not exactly like this, because this will leak memory if the constructor throws):This separates the allocation and construction: you’re calling
mallocto allocate the object and then calling placement new to call the constructor and change the type of the underlying memory toT.Similarly, you can separate the destruction and deallocation like this (same exception-safety warning applies):
In your example, `std::unique_ptr` has a destructor that calls `delete`. This may be the global `delete`, or it may be some `delete` provided by `Foo`, `Bar`, or `Baz`.
If you’re doing placement new, you can still use `std::unique_ptr`, but you must pass a custom deleter. This can call the destructor but not reclaim the memory. For example, you could allocate space for all three of the objects in your ‘object’ with a single allocation and use a custom deleter that didn’t free the memory in `std::unique_ptr`.
Most of the standard collection types take an allocator as a template argument, which makes it possible to abstract over these things, in theory (in practice, the allocator APIs are not well designed).
LLVM does arena allocation by making some classes’ constructors private and exposing them via factory methods on the object that owns the memory. This does bump allocation and then placement new. You just ‘leak’ the objects created this way; they’re collected when the parent object is destroyed.
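One possible sketch of the single-allocation, custom-deleter idea (names like `DestroyOnly` are mine, not an established API):

```cpp
#include <cstdlib>
#include <memory>
#include <new>

struct Foo { int a{}; ~Foo() { /* ... */ } };
struct Bar { int b{}; ~Bar() { /* ... */ } };
struct Baz { int c{}; ~Baz() { /* ... */ } };

// Deleter that runs the destructor but does not reclaim the memory.
struct DestroyOnly {
    template <typename T>
    void operator()(T* p) const { p->~T(); }
};

template <typename T>
using arena_ptr = std::unique_ptr<T, DestroyOnly>;

int main() {
    // One allocation holds all three objects (this layout happens to be
    // correctly aligned; general code would account for alignof).
    char* block = static_cast<char*>(
        std::malloc(sizeof(Foo) + sizeof(Bar) + sizeof(Baz)));

    arena_ptr<Foo> foo(new (block) Foo{});
    arena_ptr<Bar> bar(new (block + sizeof(Foo)) Bar{});
    arena_ptr<Baz> baz(new (block + sizeof(Foo) + sizeof(Bar)) Baz{});

    // ... use foo/bar/baz ...

    foo.reset();        // destructors run via the custom deleter...
    bar.reset();
    baz.reset();
    std::free(block);   // ...and the memory comes back in one free
}
```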
Thanks for the explanation, that helps a ton.
I’ve done very little systems programming in C++. Almost all the C++ code I have worked with was application code, and even the “system” portion hardly made any system calls. Also, most C++ programs I’ve worked with would have been better off using a garbage-collected language, but that wasn’t my choice.
This may explain the differences in our experiences.
Yup, that’s a very different experience. Most C++ application code I’ve seen would be better in Java, Objective-C, C#, or one of a dozen other languages. It’s a good systems language, it’s a mediocre application language.
For use in a kernel, or writing a memory allocator, GC, or language runtime, C++ is pretty nice. It’s far better than C and I think the tradeoff relative to Rust is complicated. For writing applications, it’s just about usable but very rarely the best choice. Most of the time I use C++ in userspace, I use it because Sol3 lets me easily expose things to Lua.
I think it very much also depends on the subset of C++ you’re working with. At a former job I worked on a server application that might have worked in Java with some pains (it interfaced with C libs quite a bit), and if it had been started in 2020 (or so) or later it should probably have been done in Rust, but the project was just slightly older than Rust gaining… traction, or its 1.0 release. It was (and probably still is) written in the most high-level, Java-like C++ I’ve ever seen, due to extensive use of Qt and smart pointers. I’m not saying we never had segfaults or memory problems, but not nearly as many as I would have expected. But yeah, I think I’ve never even heard about this placement new thing (reading up now), but I’m also not calling myself a C++ programmer.
Placement new is half the story; you also need to be aware that you can invoke destructors explicitly. A trivial example looks something like this:
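```cpp
// Sketch: construct with placement new into existing storage, then destroy
// explicitly. Foo is a placeholder type, not any particular real class.
#include <new>

struct Foo {
    int x;
    explicit Foo(int v) : x(v) {}
    ~Foo() { /* cleanup */ }
};

int main() {
    alignas(Foo) unsigned char storage[sizeof(Foo)];
    Foo* f = new (storage) Foo(42);  // construct into storage we already have
    // ... use *f ...
    f->~Foo();                       // invoke the destructor explicitly
    // no delete: the storage was never heap-allocated
}
```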
If you want to defer the construction of multiple foos but have a single allocation, you can imagine a `char foos_storage[sizeof(foo)*10]` and looping to call the destructors. Of course you can heap-allocate the storage too.
However, you mostly don’t do this, because if you’re looking for something that keeps a list of elements and uses placement new to batch allocation/deallocation, that’s just `std::vector<element>`. Likewise, if I wanted to batch the allocation of Foo, Bar and Baz in Element I probably would just make them normal members. Each element and its members is now a single allocation, and you can stick a bunch of them in a vector for more batching.
If you want to defer the initialization of the members but not the allocation, you can use `std::optional` to avoid dealing with the nitty-gritty of placement new and explicitly calling the destructor.
IME placement new comes up when implementing containers and basically not much otherwise.
Note that since C++20 you should rather use `std::construct_at` and `std::destroy_at`, since these don’t require spelling the type and can be used inside constexpr contexts.
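For instance, a small sketch (C++20):

```cpp
#include <memory>   // std::construct_at, std::destroy_at
#include <string>

int main() {
    alignas(std::string) unsigned char buf[sizeof(std::string)];
    auto* p = reinterpret_cast<std::string*>(buf);

    std::construct_at(p, "hello");  // like `new (p) std::string("hello")`, but constexpr-friendly
    std::destroy_at(p);             // runs the destructor without spelling the type at the call site
}
```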
You likely use placement new every day indirectly without realizing it, it’s used by std::vector and other container implementations.
When you write `new T(arg)`, two things happen: the memory is allocated and the constructor runs. All placement new does is let you skip the memory allocation and instead run the constructor on memory you provide. The syntax is a little weird: `new (pointer) T(arg)`. But that’s it! That will create a `T` at the address stored in `pointer`, and it will return a `T*` pointing to the same address (but it will be a `T*`, whereas `pointer` was probably `void*` or `char*`). Without this technique you can’t implement `std::vector`, because you need to be able to allocate room for an array of `T` without constructing the `T`s right away, since there’s a difference between size and capacity. Later, to destroy an item, you do the reverse: you call the destructor manually, `foo->~T()`, then deallocate the memory. When you clear a vector it runs the destructors one by one, but then gives the memory back all at once with a single free/delete. If you had a type that you wanted to be able to do a sort of batch destruction on (maybe the destructor does some work that you can SIMD’ify), you’d need to make your own function and call it with the array instead of the individual destructors, then free the memory as normal.
I’m not trying to call anybody out for having a skill issue, but I am calling out people who are saying it’s necessary to abandon the language to deal with one pattern without actually knowing what facilities the language provides.
What would this look like in practice? How do you avoid shooting yourself in the foot due to a custom destructor? Is there a known pattern here?
There are different ways you could do it, but one way would be to have a template that you specialize for arrays of T, where the default implementation does one-by-one destruction and the specialization does the batch version. You could also delete the regular operator delete so that people are forced to remember to use the special function.
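A rough sketch of that shape (invented names; the default does element-wise destruction and the specialization swaps in the batch path):

```cpp
#include <cstddef>

template <typename T>
void destroy_all(T* elems, std::size_t n) {
    // Default: one destructor call per element.
    for (std::size_t i = 0; i < n; ++i)
        elems[i].~T();
}

struct Particle {
    float x, y;
    ~Particle() { /* per-object cleanup */ }
};

// Batch version for Particle: replace n destructor calls with whatever
// bulk teardown (possibly SIMD-friendly) is appropriate.
template <>
void destroy_all<Particle>(Particle*, std::size_t) {
    // ... batched cleanup ...
}
```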
For an article that references XKCD, let me provide another: https://xkcd.com/927/
Gotta be honest, I don’t see the point:
I find it less comfortable. Current SemVer is strictly objective: if breaking, then major. Else: if it affects externally-visible behaviour, minor; else patch. BreakVer requires me to make a subjective judgement about what constitutes a major breakage, which requires knowing about all my customers, which is, practically, impossible. Plus, depending on my confidence/arrogance/move-fast-and-break-things-iness, I’ll make classifications that do not match up with customers’ expectations.
That is currently true. You’ve just shifted the boundary of specificity - instead of subdividing non-breaking changes, you’re sub-dividing breaking changes. In practice, either:
…do they? Why? Genuine question.
My descriptivist take is that SemVer isn’t what the spec says, but what the tooling does with it, and how people are affected by that.
If you bump the first number, people won’t update quickly, and this may be a good thing or a bad thing, depending on what your goal is.
If you bump the other numbers you may or may not upset a lot of people and get complaints.
So you weigh the risk of getting an avalanche of complaints vs how inconvenient it would be for you if users didn’t update your software quickly, and maybe also complained it doesn’t update automatically often enough.
Is it?
As the article states, “There’s an old joke that every change is breaking for someone.” I think that the joke is true to a large extent.
Yes, it is. It doesn’t matter whether the change is breaking “for someone” - it matters that the change is breaking to a public contract. If you take a dependency on behaviour that isn’t an explicit part of an established contract, you have no right to expect warning when it changes.
I find it jarring when a piece of software goes from version 16.213.26 to 17.0.0 just because the developers removed spacebar heating.
“breaking” is not objective. Remember Hyrum’s law. Someone is going to need you to re-enable spacebar heating if you pretend it is.
Breaking in semvar is objective. It’s defined by the spec as a change in the behavior of your public interface. If you haven’t documented your public interface, you aren’t following semvar.
While this may be strictly true, it also implies that almost no one is actually following semver, which doesn’t seem like a very productive framing to me.
Huh? When the semvar spec says “public API”, I imagine some docs you can read that list all of the functions in a package along with their type signatures and a description of their behavior. Most of the packages you use have this, no?
That’s the package’s public interface. If the package changes one of those type signatures, that’s a breaking change. If it introduces a new function, that’s not breaking. If it makes a change that violates the documentation for one of its functions, that’s a breaking change. If it makes a change to the behavior of a function that’s consistent with that function’s docs… well either that’s not a breaking change, or as is common the function was inadequately documented.
This all seems fairly unambiguous to me, excepting changes to the behavior of poorly documented functions. Am I missing something?
The example I’ve gone round and round a bunch of times with people on is: Go 1.13 introduced the ability to use underscores as grouping separators in integer literals, like `1_000_000` instead of `1000000`.
This also changed the behavior of Go’s integer-parsing functions. For example, `strconv.ParseInt()` suddenly started accepting and parsing inputs with underscore characters rather than returning an error. And the Go team seem to have been aware that there were people whose code was broken by this change, which would be a problem for Go’s claim that there will never be breaking changes ever, for any reason.
Generally people have argued with me that although
`ParseInt()` was a public function, it was somehow underspecified or ambiguously specified prior to Go 1.13, and therefore it was acceptable to clarify its behavior in Go 1.13 by suddenly changing the inputs it accepted. But this just points to the real purpose of SemVer: it’s about protecting the developer of the code from the user of the code, by giving the developer endless subjective loopholes and ways to say “sure, that change broke your code, but it’s still not technically a breaking change”. For example, any function which does not specify up-front the entire set of potential inputs it will accept and the results it will return for them is subject to the `ParseInt()` style of “oops, we underspecified it” loophole.
Ah, I get it. There are three things the Go docs for ParseInt() could do: clearly specify the accepted integer syntax, document it incompletely, or explicitly state that the accepted inputs are unspecified.
Reading the actual docs, I’d frankly put them in the first case: they state what the accepted integer syntax is, and give examples, and all of this makes it rather clear that underscores aren’t part of the integer syntax, any more than “thirty seven” would be.
But even if the docs weren’t clear, you don’t get to say “LOL no change is breaking change because I forgot to write docs”. That just means you’ve entered a gray area, and you should be cautious about what counts as a breaking change and your users should be cautious about not relying on too much. It should be a “meet in the middle” sort of a thing, not a “how C++ compiler writers interpret undefined behavior” sort of a thing.
tldr; I’m sorry that people are treating “incompletely documented” the same as “explicitly unspecified”, those are very different things.
Isn’t accepting “1_000” in Go source also a breaking change by the same reasoning as it would be for ParseInt? Input that used to result in an error no longer does.
Maybe in some technical sense, but people rely on both the positive and negative behavior of ParseInt() (e.g. rejecting invalid user input), while generally relying only on positive Golang behavior. If “this program started to compile when it used to be an error” were considered a breaking change, every change in language behavior would be breaking.
Why do you keep calling it “semvar”? It’s “semver”, semantic versioning.
Just a typo. Past the edit window so I can’t fix it now…
What I meant was that in the real world, it’s very common for an API to be underdocumented, with the result that it’s not well-defined whether a given change breaks the API or not. Like, you can look at certain changes and say “this really seems like it breaks the API,” but the API was defined vaguely enough that it’s impossible to make any judgment like that.
You say “…excepting changes to the behavior of poorly documented functions,” but I think that’s actually a very large category, in practice :-) Obviously there are some libraries and frameworks that take defining their APIs very seriously, but I would guess that the set of libraries that use SemVer is an order of magnitude larger than the set of ones that are strict about it in this way.
Yeah, that all makes sense. I’d argue that if it’s hard to make that judgement call, the library should be conservative and bump the major version number.
Is there a versioning system you think does better in the presence of a poorly defined API?
I don’t know of one, and I suspect that carefully defining the API is a prerequisite for any versioning system to be able to give you the kind of guarantees we want.
Hyrum’s Law describes a common pathological condition of dependency relationships between software modules, it doesn’t define a de facto baseline/standard guarantee or expectation of compatibility.
That person is welcome to be upset and to ask for consideration, but they are by no means entitled to it. SemVer is about breaking the explicit, published contract of software, not about breaking any hypothetical consumer. If you take a dependency on undefined behaviour, you have no rights to complain when it changes, nor any justification to expect warning when it does.
I found that improving typing speed improved my code, not because I got the code down faster, but because I was willing to type more. When I started programming, aged 7, I typed with two fingers and had to search the keyboard to find keys. I used single-letter variable names everywhere because typing longer ones was hard. I rarely wrote comments, because writing full English sentences was hard. By the time I was an adult and I typed far faster than I ever wrote with a pen, writing a meaningful variable name cost nothing but the value when I came to read the code was immense. Cost-benefit tradeoffs are very different when the cost drops to nearly zero.
This is the first version of this argument that has sounded convincing to me. “The importance of not typing slowly”.
Absolutely, and I’m noticing this now that I’m basically relearning how to type because I assembled a split keyboard and configured it with the colemak layout. I can type about 35wpm in monkeytype with only lowercase letters and spaces at the moment, but I haven’t committed all the symbols to memory yet so typing code is very slow.
I tried doing a full day of work with the new keyboard yesterday but it didn’t go well. I found that having to actively think about where all the symbols are and making more errors while typing makes trying out different approaches much less desirable, and it constrains my ability to reason about code simply because I’m holding one more thought process in my head that wasn’t there before.
Keep with it. It’s frustrating, but worth it in the end. This is your chance to shed any bad habits you picked up the first time you learned to type.
Definitely! I’m practicing every day and I started using the new keyboard almost exclusively when I’m off work. I think the split keyboard did a lot for my motivation. I really want to switch to it because it feels very comfortable to use (especially so with colemak) and, well, it’s a shiny new toy :) And since it uses QMK firmware I can even identify uncomfortable key combos or symbol positions and iterate to improve them, which triggers the tinkerer part of my brain.
There are adults who still do this. Anyone who wants their code to be accepted and adopted by other people should absolutely learn to touch type. It’s never too late.
I’d argue that my typing speed mostly helps me in communication and research rather than coding, since I spend a lot more time thinking than I spend typing, in most of my projects. As @eterps pointed out in a different thread, that’s also an essential part of developing software, but I want to make a slightly different point.
However, touch typing is still one of the most useful computer skills I ever acquired. It certainly allows me to communicate in messaging apps effectively because the cost of typing is very low for me. After all, IRC and flamewars in FOSS channels are where I practiced touch typing most. ;)
It also helps me use different devices. My current laptop has a keyboard with a British layout, which includes some design decisions beyond my comprehension, such as the pipe character (`|`) only being accessible via AltGr-Shift-L and a key for `#`/`~` above the Enter key. But I don’t actually care what is written on the keys because I don’t look at those keys anyway — my layout is set to US English, with `|` where it, arguably, should be, and `#`, `~`, etc., where I expect them to be.
I agree with you about the importance of written communication in talking about software design. And I, too, have a non-English (Spanish in my case) laptop and changed the layout to US English, which took a little getting used to, but was fine. …And I also stand by my assertion that there are adults who I’ve seen with my own eyes use single-letter variable names everywhere because they don‘t know how to touch type.
Funnily enough right around this time they moved to explicitly not using the same parser: https://devblogs.microsoft.com/cppblog/rebuilding-intellisense/
Games that end up in `/usr/games`, such as those packaged in bsdgames (file listing), are frequently not GUI programs.
Not that I am especially opposed to you removing that from your path regardless.
I think this idea is on to something, but I wonder: when do I label something like an Authorization header (or, possibly more ambiguously, a cookie)?
I can imagine at least two or three options:
You use tracing instrumentation as an example, I’ve definitely written some general code to slurp data out of the request object.
The problem with 1 is there is a difference between sending Authorization to api.website vs to log.website.
With 2 once they are xproc they are xproc.
`Log(Request.Authorization)` and `SendRequest(Request)` are not distinguished, but you want to prevent the former and allow the latter.
(Slow response, sorry!)
For internal consistency, the values of HTTP headers would have to be `xproc` — they by definition leave the process (outgoing) or have come from outside the process (incoming). Your listed problem with option (2) is spot-on: the lack of distinction between logging and sending requests makes this whole thing break down! I’m tempted to suggest a split between logging, network, and files for `inproc`, but the real world simply isn’t that clean.
Since the current approach is compile-time only (and I haven’t yet solved the Reflection issue), option (3) isn’t very easy. It’d require either separate `http.Request.Headers` and `http.Request.HeadersInproc` fields that get merged during serialization, or some stdlib-blessed escape hatches. The presence of those escape hatches makes me feel this wouldn’t be an effective solution.
@lightandlight’s Haskell solution feels “the most right” as I think about it more, since it uses function coloring to prevent leaks via a side-channel, which Ted Kaminski expands on w.r.t. Haskell’s `IO`. There’s likely a similar approach that would work for Go, but it’d be far more invasive to the language.
That’s an excellent point. I don’t have a clear answer in mind yet, but I’ll give it a think overnight.
I didn’t read all of this, but this bit immediately jumped out to me:
Hard disagree. It is trivial to prevent this in Rust via implementing `Debug` manually and hashing out `pwHash`.
Try it yourself
Edit: Alternatively, create a separate `Password` struct that doesn’t implement `Debug` and you’ve essentially achieved the same thing.
Or do implement `Debug` but make it output some fixed string, so that you can derive `Debug` on structs that contain your `Password` type:
Try it yourself
Yep, I totally recognize that it’s possible (+ @square_usual’s comment – sibling to this). I mention these approaches in “Custom Serialization”. The main drawback is that it’s still possible to access the `pw_hash` struct field directly and log it, write that to a file, concatenate it with some other strings, etc.
It’s very possible that I’m overlooking something, but I just don’t think that’s true. At least not in Rust (or similar ML-like languages): you wrap the `UserSession` in a module and only export the functionality you need outside. Example. Notice that if you uncomment the last line of `main`, you get a compile-time error, because `pw_hash` is not accessible outside of the `user_session` module.
With the proposed system you couldn’t log `pw_hash` even within the `user_session` module. It would also protect you from accidentally implementing a version of `fmt::Debug` that printed the actual value. There are examples like this in the post already.
(@lcapaldo’s peer comment is correct. This was already written though 😅)
Yep! That’s pretty standard domain-driven design. Expanding that one step requires reading the hash from a database via a database singleton or a repository pattern or whatever you prefer. But the fact remains that something has that value and can do something ill-advised with it :)
I see. I’ll read through your post more carefully. Thanks to you and @lcapaldo for taking the time to explain.
Honest question here: what’s so bad about printing the password hash? It’s not the password, it’s the hash (a one-way function that is hard to reverse). The whole reason for the hash is to store a password safely (rather than the actual plain-text password). If you are worried about rainbow table attacks, then use a better hash.
Probably nothing, to be honest! I’m channeling my inner security engineer with that, and it’s likely a bit overly-worried.
`s/password hash/session token/` and the general idea still applies though. We need to decrypt and validate the session token, but logging it means anyone with log access can impersonate the requesting user.
I think session tokens would make a better example of the issue.
Or unhashed passwords, which the server briefly needs to deal with whenever a user logs in.
You can do this in Go too: https://nickwells.github.io/GoRecipes/formatting.html
This is also done by ruby which is where I expect Crystal got it from; there is a common ancestor in Smalltalk for ruby and Objective-C which also does the allocate/initialize split, at least in some versions. It’s possible ruby lifted it from Objective-C, allocate is the name Obj-C uses.
Again probably by way of ruby, not python; ruby has called `*` the splat operator for a long time (forever?).
Oh, I didn’t know Ruby and Smalltalk did it. Sounds like the lineage of it goes further back than I thought, thanks.
I found this to be at least persuasive guidance on how to “use a combination of metrics”: Goodhart’s Law Isn’t as Useful as You Might Think.
The “using” does involve judgement, but it is not based on weighting X and Y to create a new combined metric. It offers much more actionable guidance than just “use your judgement”, though.