Story time:

Once, I had a colleague come to see me with “a segfault”.
As usual, I started debugging it under Valgrind and with ASan/UBSan.
I’m a bit fuzzy on the details, but one of the sanitizers would mysteriously not exhibit the issue, while the other would crash with a very unhelpful message. As for Valgrind, it would itself segfault when run on the code.
The debugger was thoroughly confused about the size of an object.
Turned out that my colleague had closed an anonymous namespace too early, in two different files.
He had meant to write:
namespace {
    // ... some other stuff ...
    struct InternalDetails {
        SomeType _field_1;
    };
}
but he had written:
namespace {
    // ... some other stuff ...
}
struct InternalDetails {
    SomeType _field_2;
};
Another file with the same mistake would declare another InternalDetails struct with a different layout.
That a single typo could cost us two afternoons of two engineers, with no tool able to detect it, was infuriating to say the least.
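To make the failure mode concrete, here is a single-file sketch (with hypothetical field types, since the original code isn’t shown) of what the two translation units each believed about the struct. Namespaces are used only so both definitions can live in one file; in the real bug both had the same name at global scope in different files, an ODR violation the linker accepts without a peep:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the two files' conflicting definitions.
namespace file_a {
    struct InternalDetails { std::int64_t _field_1; };  // 8 bytes
}
namespace file_b {
    struct InternalDetails { std::int32_t _field_2; };  // 4 bytes
}

// Each TU's code computes sizes and offsets from its own view, but the
// linker keeps only one copy of each inline function, so reads and
// writes end up disagreeing about how big the object actually is.
constexpr std::size_t size_seen_by_a = sizeof(file_a::InternalDetails);
constexpr std::size_t size_seen_by_b = sizeof(file_b::InternalDetails);
```

Allocate with one of those sizes and write through the other view, and you get exactly the kind of heap corruption that makes Valgrind itself fall over.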
Maybe smol rant time:

I didn’t know that IFNDR (“ill-formed, no diagnostic required”) even existed before this, and it makes me fucking angry. These are things that are just straight-up wrong. These are programs that are just plain incorrect, and they’re incorrect in ways that a compiler should be able to catch! Why is the compiler allowed to not catch them?! Because it’s hard. It’s hard to notice that two inline functions in different compilation units can step on each other. It’s hard to notice that two different compilation units are using different definitions of a type that should be the same.
You know what? Too bad. Are you or are you not literally building the tools that the world runs on? I can think of two different ways to solve each of these problems, and I’m the proverbial hobbyist schlub working out of their garage. Why is it hard to fix these things? It’s hard because the C/C++ compilation model sucks ass. Why don’t you fix it? Oh, that would be hard too? I thought we were supposed to be smart people who were good at this. Why can’t you think of good ways to fix this? In reality, the sugar-daddy tech corporations who pay the toolsmiths don’t give a shit about making something good; they care about “will this pay back the cost of making it within 18 months?” And they aren’t gonna measure very well how much it costs to have programmers semi-routinely write miscompiled programs, because, you know, it’s hard!
This is exactly why Rust is such a joy to use. If someone says “X is incorrect” the response from the core team is generally “we should fix that”, followed by “how do we fix that?” and occasionally “do we have to wait for an edition boundary to fix that?” Naturally, often the answer is “we don’t know how to fix that yet”, but at least they’re fucking trying. Oldest open unsound lang issue on the issue tracker right now? Hash collisions in TypeId. Open since 2013. I like this one because I’ve run into it personally. Most recent activity? Last week, from core lang developer RalfJung fucking figuring out how to fix it.
> You know what? Too bad. Are you or are you not literally building the tools that the world runs on?
I wish this was the stance that the C and C++ committees actually took. But in reality, these standards committees are not about holding implementations to a standard, they are about lowering the standard until just about every implementation clears the bar.
This is why easily checkable stuff like having an unmatched quote in C source code is still undefined behavior in C23 (yes, really, see J.2 (26)).
This is why every other useful C feature is declared optional by the standard, dooming user code everywhere to endless mazes of ifdefs checking feature macros.
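The macro spellings below are the real ones from C11; the fallback names are invented for illustration. This is roughly what “optional feature” costs every portable codebase:

```cpp
// The feature-macro dance forced by optional features: each of these
// macros is defined by conforming implementations that OMIT the feature.
#if defined(__STDC_NO_ATOMICS__)
#  define PORTABLE_HAVE_ATOMICS 0      /* fall back to a mutex-based shim */
#else
#  define PORTABLE_HAVE_ATOMICS 1
#endif

#if defined(__STDC_NO_THREADS__)
#  define PORTABLE_HAVE_C11_THREADS 0  /* fall back to pthreads / Win32 */
#else
#  define PORTABLE_HAVE_C11_THREADS 1
#endif

#if defined(__STDC_NO_VLA__)
#  define PORTABLE_HAVE_VLA 0          /* heap-allocate instead */
#else
#  define PORTABLE_HAVE_VLA 1
#endif
```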
This is why realloc(_, 0) is newly specified to be UB in C23.
This is why stdio is allowed to silently eat half of all your bytes on weird platforms and stay arguably compliant.
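If you need code that behaves the same under C17 and C23, the practical fix is to never let a zero reach realloc at all. A minimal sketch (the helper name is mine):

```cpp
#include <cstdlib>

// Wrapper that never forwards size 0 to realloc: realloc(p, 0) was
// implementation-defined in C17 and is undefined behavior in C23, so
// treat "resize to zero" as an explicit free instead.
void* checked_realloc(void* p, std::size_t n) {
    if (n == 0) {
        std::free(p);
        return nullptr;
    }
    return std::realloc(p, n);
}
```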
This is why, when it was discovered that compilers targeting ARMv7 and POWER were compiling atomic operations in a way that didn’t uphold the rules of the memory model, and that fixing it would make programs slower, they decided to not fix the compilers but instead weaken the memory model in C++20.
This is why, when Nvidia started shipping a C++ implementation on which threads would sometimes just never get scheduled at all, they introduced a whole new zoo of different forward progress guarantees, and now std::thread is only “encouraged, but not required” to provide concurrent forward progress, i.e. the previous guarantee.
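For anyone who hasn’t hit this: the difference matters for any spin-wait. This sketch terminates on every mainstream std::thread implementation, but only because those happen to provide concurrent forward progress; the standard merely encourages it:

```cpp
#include <atomic>
#include <thread>

// A handshake that silently assumes concurrent forward progress: if the
// producer thread is never scheduled (legal under the weaker guarantees),
// the consumer spins forever. Nothing in the program text warns you.
int spin_handshake() {
    std::atomic<bool> ready{false};
    int payload = 0;
    std::thread producer([&] {
        payload = 42;
        ready.store(true, std::memory_order_release);
    });
    while (!ready.load(std::memory_order_acquire)) {
        // busy-wait; only makes progress if the producer gets CPU time
    }
    producer.join();
    return payload;
}
```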
You should read these standards not as “here’s what you need to do if you want to be a C/C++ implementation” but as “here’s the lowest common denominator of things implemented by all the C/C++ vendors serious enough to send somebody with a mouth to WG14/WG21”.
In C and C++, where there’s a degree of separation between the specification and implementations, if you promote ODR violations from IFNDR to ill-formed with a required diagnostic, then a whole lot of implementations become noncompliant, because they are never going to implement the whole-program analysis needed to solve this (and even then only within a single DSO; what do you do across DSO boundaries?). This probably also has regulatory and certification implications.
There are diagnostics with various levels of success for ODR violations in some implementations, so I wouldn’t say the implementations are not trying.
Not trying to detract from your rant; I’ve been burnt by this myself multiple times. Just saying it’s a tightrope to walk.

Yeah, I honestly do appreciate the tightrope, and most implementations do better than the baseline. But still: stop using those implementations, especially in the places where regulatory and certification implications matter!

Build times are notoriously slow. Adding required analysis that means looking across translation units would make the process even more painful.
Have each translation unit export its public API into a file and have the build system cache it. I.e., generate header files from metadata in the implementation file. Two struct definitions conflict with each other? Two inline function definitions conflict with each other? Ifdefs have two different values in different compilation units? Now you can catch all of it, without needing more state for whole-program compilation than what a header file would provide anyway; you just now know that state is actually consistent.
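A toy version of that consistency check (all names invented here): each translation unit emits a manifest of its externally visible definitions in some normalized form, and the build system diffs the manifests before linking:

```cpp
#include <map>
#include <string>
#include <vector>

// Maps a definition's name to its normalized textual form, as a TU
// might export it alongside its object file.
using Manifest = std::map<std::string, std::string>;

// Flags every name that two translation units define with different
// "shapes": exactly the InternalDetails bug from the story above.
std::vector<std::string> find_odr_conflicts(const std::vector<Manifest>& tus) {
    std::map<std::string, std::string> seen;  // first definition wins
    std::vector<std::string> conflicts;
    for (const auto& tu : tus) {
        for (const auto& [name, def] : tu) {
            auto [it, inserted] = seen.emplace(name, def);
            if (!inserted && it->second != def)
                conflicts.push_back(name);  // same name, different definition
        }
    }
    return conflicts;
}
```

Identical re-definitions (the normal header-inclusion case) pass; only genuinely divergent ones are reported, so no whole-program analysis is needed.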
I bet this would also make every build system in existence faster as well, except maybe Ninja.
Mm, yup, burned an afternoon at $WORK just a few weeks ago because the ABI of Google’s Abseil library changes according to whether you’re compiling with AddressSanitizer enabled or not. I compiled the library without, and my own code with. LLVM trunk has the same problem: https://github.com/llvm/llvm-project/issues/83566.
Also, NDEBUG can inflict a lot of fun.
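The Abseil/ASan failure and the NDEBUG one are the same trap: a configuration macro that adds or removes a member changes the type’s layout, so two binaries built with different flags disagree about it even though they include the same header. A hypothetical miniature:

```cpp
#include <cstddef>

// If a library is built with NDEBUG (canary absent) and the application
// without (canary present), both sides share this header yet disagree
// about sizeof(Buffer) and the offset of everything after the #ifdef.
struct Buffer {
    char* data;
    std::size_t len;
#ifndef NDEBUG
    std::size_t debug_canary;  // debug-only member: silently changes ABI
#endif
};
```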