As of March 2022, HardenedBSD does this by default for the entire OS userland ecosystem (the OS itself and 33,000+ packages). We’ve only had to disable auto-var-init-to-zero for a small subset of packages.
Applying to the kernel will require more research, but is on the roadmap.
Can you share any details on why you’ve had to disable it for some packages? Is it performance concerns? Buggy software relying on uninitialised variables to be non-zero?
I can’t speak for u/lattera, but in my experience the big carve-out/exception is code with “large” stack-allocated arrays, where the compiler is unable to avoid a large pre-zeroing of values that will otherwise be initialized. It’s really easy to see
int a[1000];
for (int j = 0; j < 1000; j++) a[j] = 0;
but for example
int a[1000];
f(a, 1000);
Should the compiler initialize a? clang (and I assume gcc) knows the existence and semantics of bzero, memset, etc., so it can treat that as a call that initializes a. But if you have a bunch of code with large amounts of on-stack data that has less trivial initializing logic, then it becomes much harder (esp. in C/C++)
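A minimal sketch of that distinction, assuming clang’s `-ftrivial-auto-var-init=zero` (the function names here are mine, purely illustrative):

```cpp
#include <cstring>

// Under -ftrivial-auto-var-init=zero the compiler pre-zeroes `buf`, but it
// knows memset's semantics, so it can prove the pre-zeroing is a dead store
// and delete it.
int trivial_init() {
    char buf[4096];
    std::memset(buf, 0, sizeof buf);
    return buf[0];
}

// Every element is also fully initialized here, but proving that requires
// reasoning about the branch inside the loop; the inserted pre-zeroing
// often survives, so the array is effectively written twice.
int nontrivial_init(unsigned n) {
    int table[1024];
    for (unsigned i = 0; i < 1024; ++i)
        table[i] = (i < n) ? static_cast<int>(i) : -1;
    int sum = 0;
    for (unsigned i = 0; i < 1024; ++i)
        sum += table[i];
    return sum;
}
```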
The thing I’d really like to see added to C++ here is a mechanism for communicating the fact that the memory is initialised to zero to the constructor. This probably doesn’t have much impact for stack allocations, but if you mmap a few MiBs of memory (guaranteed zero and lazily committed) and then placement new into it (or use mmap in a custom operator new for the class) with a large class containing an array of integers or pointers, then the compiler will synthesise a constructor that zeroes the array (slowly, causing CoW faults on all of it). If you want to use a constructor to set any of the fields, the others get default initialised. If you could have an overload of the constructor for construct into zeroed memory then that would be a huge improvement.
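A sketch of that scenario, assuming POSIX `mmap` (the `Table` type and `make_table` are illustrative, not from any real codebase):

```cpp
#include <sys/mman.h>
#include <new>

struct Table {
    long slots[1 << 20];  // ~8 MiB of integers
    // The constructor (written or synthesized) zeroes `slots`, even though
    // the storage below is already guaranteed zero.
    Table() : slots{} {}
};

Table* make_table() {
    // Anonymous mappings are zero-filled and lazily committed.
    void* mem = mmap(nullptr, sizeof(Table), PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;
    // Placement new runs the constructor, which re-zeroes all ~8 MiB,
    // touching (and thus committing) every page for no benefit.
    return new (mem) Table();
}
```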
Can data-flow analysis help find these cases? It would be weird to have the compiler turn your constructor into two constructors, one of which assumes zeroed memory and one of which doesn’t, but it does seem like the sort of thing a compiler is better at than a human.
The compiler could easily generate the two constructors; it just needs some mechanism for knowing which to call. The allocation function, the constructor, and the call sites for both may all be in different compilation units, so you need some mechanism for passing the information through this sequence.
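A library-level sketch of what such a mechanism could look like (the `zeroed_t` tag and every name here are hypothetical): a tag overload lets the call site state that the storage is already zero. Today nothing enforces that promise, which is exactly why a language-level feature would help:

```cpp
#include <cstdlib>
#include <new>

struct zeroed_t {};                 // hypothetical "storage is zero" tag
inline constexpr zeroed_t zeroed{};

struct Big {
    int data[4096];
    Big() : data{} {}               // normal path: zero the whole array
    explicit Big(zeroed_t) {}       // caller promises the bytes are zero
};

// Construct into storage we know is zeroed (calloc here; fresh anonymous
// mmap pages in the scenario above). In practice the zero bytes survive
// placement new, but only the normal constructor makes that a guarantee.
inline Big* make_big() {
    void* mem = std::calloc(1, sizeof(Big));
    return mem ? new (mem) Big(zeroed) : nullptr;
}
```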
This was implemented as an opt-in compiler flag in 2018 for LLVM [LLVMReview] and MSVC [WinKernel], and in 2021 for GCC [GCCReview]. Since then it has been deployed to:
The OS of every desktop, laptop, and smartphone that you own;
The web browser you’re using to read this paper;
Many kernel extensions and userspace programs in your laptop and smartphone; and
Likely to your favorite videogame console.
Unless of course you like esoteric devices.
Appeals to authority do have their place. It shouldn’t be the only argument, but it is a nice way to show that you aren’t alone in your thinking.
Some people think that reading uninitialized stack variables is a good source of randomness to seed a random number generator, but these people are wrong [Randomness].
I’d like to see this go through. It adds some nice correctness to the language. I think static code analysis should still warn, but this is a nice fallback, especially since C++ constructors make not initializing more normal. I think this paper presents some very clear, compelling, and reliable evidence that should make this fairly easy to accept.
Appeals to authority do have their place. It shouldn’t be the only argument, but it is a nice way to show that you aren’t alone in your thinking.
This is not an “appeal to authority”. This is very basic “supporting evidence for practicality”.
The primary argument for not initializing locals is performance. If entire OSes (fairly performance sensitive) and browsers (about the most perf-sensitive consumer-facing product in widespread use; gaming does not hit the perf metrics browser engines face) are initializing locals, then that’s a fairly good data point for “the claimed costs are not correct”.
A secondary argument I’ve seen is that leaving locals uninitialized as UB allows for runtime detection of reading uninitialized memory (I am not kidding, this is a real argument for not specifying initialization, and only very very loosely reasonable).
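That argument is not entirely hollow, for what it’s worth: tools like MemorySanitizer depend on the read being erroneous. A tiny hypothetical (the function is mine):

```cpp
// MemorySanitizer (-fsanitize=memory) can report the return below as a
// use of uninitialized memory when x <= 0. If the language guaranteed
// zero-initialization, the same read would become a well-defined load of
// `false`, indistinguishable from intended code.
bool parse_flag(int x) {
    bool flag;              // deliberately left uninitialized
    if (x > 0)
        flag = true;
    return flag;            // uninitialized read on the x <= 0 path
}
```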
Zeroing of entire cache lines is magical, though for the stack this is rarely the case. There are questions of temporality, special instructions, and special logic to support zeroed cachelines.
…now I’m wondering whether it could be useful/worth it for some languages to just allocate stack in a minimum of one cache line at a time. If your cache lines are 64(?) bytes, and 16 of them are used for your stack and return pointer no matter what, then you’re not wasting too much space, right?
Of course next my brain goes “yeah, and then you don’t need it to be an actual stack, you can just use a slab allocator for it and something something coroutines closures –” but I also haven’t had coffee yet.
Making all automatic variables explicitly zero means that developers will come to rely on it.
What if you just didn’t fucking allow reading from uninitialized local vars?
…now I’m wondering whether it could be useful/worth it for some languages to just allocate stack in a minimum of one cache line at a time
That would mean per-CPU-µarch builds, which likely isn’t practical outside specific HPC applications.
What if you just didn’t fucking allow reading from uninitialized local vars?
A change like that would break huge amounts of existing code. The point of this change is to increase the safety of the language without breaking existing code. Specifically there exists code that does
int i;
if (a) { /* control flow */ i = something; }
/* control flow */
if (b) { /* control flow */ i = something_else; }
read(i);
which must now be rejected at compile time (the compiler can’t determine that !a implies b, even if in the actual code this is always the case).
I am aware. I’d rather my code be explicitly broken at compile time than silently broken next time I refactor, is all. It’s not like the compiler can’t get rid of the unnecessary initialization when it can prove it’s not needed. I’m well aware that the C/C++ standards will never make broken code not compile, though.
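That point about redundant initialization is easy to illustrate (names are mine): in a sketch like this, dead-store elimination in a typical optimizing build drops the explicit `= 0`, because every path overwrites it before the read:

```cpp
// The `i = 0` store is dead: both branches assign `i` before it is read,
// so an optimizing compiler emits no code for the initializer.
int pick(bool a) {
    int i = 0;
    if (a)
        i = 1;
    else
        i = 2;
    return i;
}
```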
The problem is not “broken code not compiling”; the problem is correct code being broken:
int a;
/* complex control flow that initializes a */
use(a);
This code is correct. It does not read uninitialized memory. It does not trigger UB. Changing the compiler to require that all variable declarations include an initializer breaks the code, even when that code is correct. I get that you might prefer an explicit initializer on every local, but the reality is that the above code is correct according to the C abstract machine.
If a language change requires changing existing correct code, then that behaviour will be disabled for existing code.
There is a chronic problem in C, C++, and other unsafe languages, where new “safer” features are added to the language as opt-ins, which means they do nothing to help existing code bases.
This proposal changes the semantics of uninitialized local storage from UB to specified. That does not require adoption, nor does it require any adoption steps to get the benefit.
A proposal that makes initializing everything mandatory would not be better for language security: compilers would be unlikely to make it the default behavior, and if they did, a lot of large projects would disable it.
I want to be clear: I am not saying “forced initialization is bad”, I am saying specifically that a language change that introduces it doesn’t help any existing code.
Aha, that makes sense. Thanks.
Quite an enjoyable paper to read.
What a joy to read JF Bastien’s writing style. Friendly, informative, and cheeky.
He’s a great (and very smart) guy, though you need to be prepared for terrible and relentless dad jokes :D
He’s obviously tremendously experienced in C/C++/compiler land, but he also has a background in the incredibly perf-sensitive land of browsers (V8 + JSC + WebKit; I think he was in JSC-land prior to the WebKit/Blink fork)
Glad to see more progress here and that there is an escape hatch if needed.
This code is correct. It does not read uninitialized memory. It does not trigger UB.
Until you need to change the complex control flow, yes.
Yes, but that isn’t the point.