The first case (newtypes and specialization) is actually pretty interesting, because even the “bad” case compiles to a memset, which most would consider “about as fast as feasible”. See the code here: https://godbolt.org/z/rxxhMGjr6
The trick is that the “good” case is better than you would expect, because it calls __rust_alloc_zeroed, which as described has special handling by the compiler (or rather LLVM, which handles calls to calloc and such similarly, based on the comments in the rustc source). If you change the allocation to vec![1u8; ...] then it compiles to a memset instead and runs at the same speed as vec![WrappedByte(0); ...].
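For anyone who doesn’t want to click through, the comparison is shaped roughly like this (my reconstruction; the article’s WrappedByte definition may differ in its details):

// Newtype over u8; deliberately not given any special treatment here.
#[derive(Clone, Copy)]
pub struct WrappedByte(pub u8);

pub fn plain_zeroed(n: usize) -> Vec<u8> {
    // Hits the zero-specialization: lowers to a call to __rust_alloc_zeroed.
    vec![0u8; n]
}

pub fn wrapped_zeroed(n: usize) -> Vec<WrappedByte> {
    // No specialization for the newtype: lowers to an allocation plus a memset.
    vec![WrappedByte(0); n]
}

pub fn plain_ones(n: usize) -> Vec<u8> {
    // Non-zero fill value: also an allocation plus a memset.
    vec![1u8; n]
}

Only the first of these gets the __rust_alloc_zeroed treatment; the other two end up writing every byte.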
But we gotta go deeper (cue Inception sound sting): how the heck does allocating zeroed memory run 1,000,000x faster than memset? This is hard to find out for certain, but I think the answer is “kernel trickery”: according to ye olde Stack Overflow, the kernel doesn’t actually zero the memory on allocation, but rather on first read. So it just hands the program a bunch of memory pages and says “these are all zeroed, I promise”. Then when the program tries to actually read from them, it causes a page fault and the kernel says “oh uh, hold up one sec”, zeroes the memory, and then lets the program go about its business. On the other hand, if the program writes to the memory before ever reading from it, the kernel never has to do the extra work of zeroing them when it will never matter.
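A rough way to see the laziness from userspace (this is only a sketch; the exact numbers and behaviour are OS- and machine-dependent):

use std::hint::black_box;
use std::time::Instant;

fn main() {
    // 2 GiB here instead of the thread's 1<<34 (16 GiB), to fit more machines.
    const N: usize = 1 << 31;

    let t = Instant::now();
    let mut v = vec![0u8; N];
    println!("allocate zeroed: {:?}", t.elapsed()); // returns almost instantly

    let t = Instant::now();
    // Touch one byte per (assumed) 4 KiB page: each first touch faults the page
    // in, and that is where the real work of providing zeroed memory happens.
    for i in (0..N).step_by(4096) {
        v[i] = 1;
    }
    println!("first touch of every page: {:?}", t.elapsed());

    black_box(&v); // keep the writes from being optimized away
}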
I think Rust needs a better concept of what types have 0 as a valid value. I suspect it would make a lot of the cases where people use uninitialized memory for performance reasons a lot safer.
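Crates like bytemuck already provide essentially this as an unsafe marker trait; a rough sketch of the idea (the names and details here are mine, not std’s or bytemuck’s):

use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

// Marker trait: implementors promise that the all-zero bit pattern is a valid
// value of the type. (This mirrors what the bytemuck crate calls `Zeroable`.)
unsafe trait ZeroValid: Sized {}

unsafe impl ZeroValid for u8 {}

#[repr(transparent)]
struct WrappedByte(u8);
// Fine because WrappedByte is a transparent wrapper over a ZeroValid type.
unsafe impl ZeroValid for WrappedByte {}

fn zeroed_vec<T: ZeroValid>(n: usize) -> Vec<T> {
    assert!(std::mem::size_of::<T>() != 0, "ZSTs not handled in this sketch");
    if n == 0 {
        return Vec::new();
    }
    let layout = Layout::array::<T>(n).expect("allocation too large");
    unsafe {
        let ptr = alloc_zeroed(layout) as *mut T;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // Sound only because ZeroValid promises zeroed bytes are a valid T and
        // the layout matches what Vec expects from the global allocator.
        Vec::from_raw_parts(ptr, n, n)
    }
}

With that contract, zeroed allocation can go through alloc_zeroed (and the kernel tricks discussed above) for any implementing type, not just the primitives std happens to specialize on.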
On the other hand, if the program writes to the memory before ever reading from it, the kernel never has to do the extra work of zeroing them when it will never matter.
I don’t think this works out; all the kernel sees is an attempted write to a small part of the relevant page, leaving the rest of the page still all supposedly zero. At this point the kernel has to provide a writable page, and it has to zero the rest of that page, since it won’t get a page fault on the next access if it marks it writable.
There are other reasons why doing this lazily is advantageous though. In particular:
If some subset of the requested (virtual) pages are read but not written, the kernel can just map the same all-zero (physical) page to each of them, saving physical memory. It only needs to allocate separate pages on write (a sketch of this follows the list).
It’s common for programs to allocate a big slab of memory up front and not actually use all of it, so this can avoid doing work at all.
Even if the whole thing is indeed written to up-front, I suspect doing the page allocation lazily would improve performance, as otherwise you’d be doing it in two separate passes, hurting locality.
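Regarding the first point: on Linux, at least, you can watch the shared zero page at work by checking resident memory around a read-only pass and then a write pass. This is only a sketch and the exact accounting is kernel-dependent:

use std::fs;
use std::hint::black_box;

// Resident-set size in pages, read from /proc/self/statm (Linux-specific).
fn resident_pages() -> u64 {
    fs::read_to_string("/proc/self/statm")
        .unwrap()
        .split_whitespace()
        .nth(1)
        .unwrap()
        .parse()
        .unwrap()
}

fn main() {
    const N: usize = 1 << 30; // 1 GiB
    // black_box keeps LLVM from "knowing" the contents are zero and skipping
    // the reads below entirely (see the calloc special-casing upthread).
    let mut v = black_box(vec![0u8; N]);
    println!("after alloc:  {:>8} resident pages", resident_pages());

    // Read-only pass: the kernel can back every touched page with the single
    // shared zero page, so resident memory should barely move here.
    let sum: u64 = v.iter().map(|&b| u64::from(b)).sum();
    println!("after reads:  {:>8} resident pages (sum = {})", resident_pages(), sum);

    // Write pass: now each touched page needs its own real, zeroed page.
    for i in (0..N).step_by(4096) {
        v[i] = 1;
    }
    println!("after writes: {:>8} resident pages", resident_pages());
}

On my understanding, only the write pass should make the resident count jump by roughly N / 4096 pages; but again, this is kernel behaviour, not anything Rust guarantees.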
There already exists a trait, IsZero, that tells when the memory representation of a value is zero, but the trait is not public, so it is not possible to implement it for an arbitrary type.
That trait is then used for the specialization, so the implementation knows it can use calloc instead of malloc+memset.
It is possible to make the WrappedByte example use calloc by consuming it as an iterator, at which point I assume LLVM can see that it is mapped with an identity function and can optimize it out.
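If I’ve understood that suggestion correctly, it looks roughly like this (my reconstruction, not code from the thread):

#[repr(transparent)]
pub struct WrappedByte(pub u8);

pub fn wrapped_zeroed(n: usize) -> Vec<WrappedByte> {
    // vec![0u8; n] takes the zeroed-allocation (calloc) fast path. Mapping
    // through the WrappedByte constructor is an identity at the machine level
    // for a #[repr(transparent)] newtype, and Vec's in-place collect can reuse
    // the existing allocation, so ideally no memset ever runs.
    vec![0u8; n].into_iter().map(WrappedByte).collect()
}

Whether the identity map really gets optimized out is up to LLVM, but the allocation at least starts out on the calloc path.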
Addendum: You can see this in action by changing the allocation to let v: Vec<u8> = Vec::with_capacity(1<<34);, which allocates uninitialized memory. It runs about as quickly as the vec![0u8; 1<<34] case.
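For anyone who wants to reproduce the comparison locally, a quick and crude harness (1 << 32 here instead of 1 << 34 so it fits on more machines; exact numbers will vary a lot):

use std::hint::black_box;
use std::time::Instant;

fn time<T>(label: &str, f: impl FnOnce() -> T) {
    let t = Instant::now();
    black_box(f()); // keep the allocation from being optimized out
    println!("{label}: {:?}", t.elapsed());
}

fn main() {
    const N: usize = 1 << 32; // 4 GiB

    time("vec![0u8; N]           (zeroed, lazy)", || vec![0u8; N]);
    time("Vec::with_capacity(N)  (uninitialized)", || Vec::<u8>::with_capacity(N));
    time("vec![1u8; N]           (real memset)", || vec![1u8; N]);
}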
Should this optimization be valid? In safe Rust, there’s no question that it is. However, safe Rust has to coexist with unsafe Rust that may be doing all sorts of subtle pointer tricks under the hood. If bar is defined elsewhere, the compiler does not know what its definition looks like (and whether it internally uses unsafe code) and thus must treat it like a black box.
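For anyone else trying to picture it, the quoted passage is describing code shaped roughly like this (my sketch of the general pattern, not the article’s actual snippet; bar is modelled here as an arbitrary callback the compiler can’t see into):

pub fn foo(x: &i32, bar: impl Fn()) -> i32 {
    let a = *x;
    bar(); // opaque call: might run arbitrary code the compiler can't inspect
    let b = *x;
    // The optimization in question is treating b as equal to a (reading *x only
    // once). That's valid only if nothing can mutate *x behind the &i32 while
    // bar runs, which is exactly what the article is probing.
    a + b
}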
Is anyone able to expand on this or point to relevant documentation? I thought unsafe code was supposed to conform to the expectations of safe Rust when viewed from the outside. The idea of optimisations being discarded because a function call could potentially be mutating an immutable value is surprising to me.
I thought unsafe code was supposed to conform to the expectations of safe Rust when viewed from the outside.
It’s not that simple. Here is my understanding:
unsafe fn danger_foo() can cause undefined behavior. If it is written correctly and if you use it correctly, then you can use it safely.
Now, you might be asking: Since the lowest levels of std often involve unsafe, and since most things are built on top of the standard library, isn’t most Rust code unsafe? Well… not quite. You can do:
fn foo() {
    unsafe { danger_foo() }
}
By doing this, you’re effectively saying, “I’ve verified that this works.” If you trust the author of foo(), you can now assume foo() is safe when you invoke it from other safe code. Hooray. Safety.
That last caveat is important. You can only trust foo() from safe code. Unsafe Rust needs to be careful trusting even safe code. This is explained in the nomicon:
The design of the safe/unsafe split means that there is an asymmetric trust relationship between Safe and Unsafe Rust. Safe Rust inherently has to trust that any Unsafe Rust it touches has been written correctly. On the other hand, Unsafe Rust has to be very careful about trusting Safe Rust.
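A concrete, made-up illustration of that asymmetry:

// The unsafe block below is only sound if the *safe* closure it trusts tells
// the truth about the length; nothing in the type system enforces that.
fn copy_prefix(src: &[u8], claimed_len: impl Fn() -> usize) -> Vec<u8> {
    let n = claimed_len(); // 100% safe code, but it could still lie
    let mut out = Vec::with_capacity(n);
    unsafe {
        // UNSOUND if n > src.len(): an out-of-bounds read caused by unsafe
        // code trusting safe code. The fix is for the unsafe side to verify,
        // e.g. assert!(n <= src.len()), rather than trust its caller.
        std::ptr::copy_nonoverlapping(src.as_ptr(), out.as_mut_ptr(), n);
        out.set_len(n);
    }
    out
}

Here the safe closure can break memory safety only because the unsafe block chose to trust it; that is the asymmetry the nomicon is describing.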
LLVM is ‘unsafe’, so you need to assume that even a safe foo() { ... } may contain a call to danger_foo() within it.
In a language like Java or Python, defining wrapper types like this has a runtime cost, forcing programmers to choose between abstraction and performance.
I thought I heard that this was coming to Java as well, not sure if it was https://openjdk.java.net/jeps/401 or something else.