I came across this good comment by u/jswrenn on Reddit:

I lead Project Safe Transmute, the effort to build transmutability APIs into the Rust standard library. In that capacity, I also co-maintain zerocopy, and I maintain Itertools, a crate where iterator adaptors have long gone to audition for inclusion in the standard library.
I worry deeply about the second order effects of pressuring maintainers to drop dependencies.
The Rust standard library is append-only, and has an extremely high bar for the inclusion of new APIs. To guide its design, the Rust Project has long relied on insights from the crate ecosystem. Rust has a great standard library because Rust has great crates. And Project Safe Transmute, in particular, depends on the path blazed by zerocopy and bytemuck to inform its design.
Without such crates and their enthusiastic users, how can we effectively vet potential standard library inclusions? Only a tiny fraction of Rust programmers are willing to live on the bleeding edge of nightly.
The Rust community, on the other hand, was very dismissive on Reddit and Lobsters.
I think you’re making a mountain out of a molehill.
The crate itself has seen a fair bit of churn: for instance 0.9 broke backwards compatibility with 0.8.
My sense of this is not very calibrated, so can someone weigh in on this? 0.8 was released more than 4 years ago, and 0.9 was released last week.
About a year ago, it looked like this:
[snip]
Not perfect, but better.
What exactly would be perfect? No dependencies at all? I mean, let’s look at what it’s actually depending on there. rand_chacha and rand_core are part of the same project, so they shouldn’t count as extra dependencies. libc is pretty foundational. getrandom is also part of rust-random and I think it’s nice that they have it separated so that people who just want random numbers from the system can do so without needing rand. cfg-if, I guess this should be part of the standard library or something? ppv-lite86 is an implementation of a cryptographic primitive, which seems fair to outsource.
None of those seem like things you’d want to remove.
There is a single-dependency crate too which can read from the system’s entropy source and that’s getrandom. So there at least could be a world where rand only depends on that.
Ok, I guess that confirms it: you think the perfect version of rand would be one that has its own implementation of ChaCha20 and which isn’t split into 3 crates. I really don’t see what difference that would make, except make the raw count of crates lower.
I didn’t bother to look at the other crates, but this one stuck out to me as seeming unlikely:
byteorder clocks in at 3,000 lines of code.
So I went and looked at the code. lib.rs is 1587 lines of tests, 1439 lines of comments, and 644 lines of code. io.rs is 1149 lines of comments and 385 lines of code. So 1029 lines of actual code that gets compiled. It’s highly repetitive, implementing the big endian and little endian read and write logic for a bunch of different types. More macros could reduce the line count, but that wouldn’t reduce the amount of code needing to be compiled nor the size of the machine code.
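It’s easy to see where those lines come from, even without opening the crate. Here is a rough sketch of the shape of such code, not byteorder’s actual source or API (the function names are invented): one macro invocation stamps out a big-endian and a little-endian reader for each integer width, and the write half doubles it again. More macros shrink the source, but every expansion is still code the compiler has to chew through.

```rust
use std::mem::size_of;

// Invented names; a sketch of the repetitive endian-conversion shape,
// not byteorder's real API.
macro_rules! impl_read {
    ($be_name:ident, $le_name:ident, $ty:ty) => {
        /// Read a big-endian integer from the front of `buf`.
        pub fn $be_name(buf: &[u8]) -> $ty {
            let mut bytes = [0u8; size_of::<$ty>()];
            bytes.copy_from_slice(&buf[..size_of::<$ty>()]);
            <$ty>::from_be_bytes(bytes)
        }

        /// Read a little-endian integer from the front of `buf`.
        pub fn $le_name(buf: &[u8]) -> $ty {
            let mut bytes = [0u8; size_of::<$ty>()];
            bytes.copy_from_slice(&buf[..size_of::<$ty>()]);
            <$ty>::from_le_bytes(bytes)
        }
    };
}

impl_read!(read_u16_be, read_u16_le, u16);
impl_read!(read_u32_be, read_u32_le, u32);
impl_read!(read_u64_be, read_u64_le, u64);
// ...and so on for i16/i32/i64/u128/f32/f64, plus the matching write_* half.
```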
Maybe this points to Rust not having a large enough standard library. Perhaps features like terminal size detection and random number generation should be included. That at least is what people pointed out on Twitter.
Why would you put terminal size detection in the standard library? What? Mmmmmaybe getrandom should be included. The criterion right now is that something should be in the standard library if it has to be in the standard library, or if pretty much every program needs it. But actually doing the work takes time. Safe transmute might render many uses of zerocopy obsolete, but safe transmute isn’t done yet!
Also why would you care what people on Twitter say? Most people I’ve seen who engage in dependency discourse over there seem to be brofluencers who spend more time posting hot takes to social media than they actually spend coding.
Perhaps features like terminal size detection and random number generation should be included.
From a long-term perspective, having random number generation not in the standard library seems like the right bet. rand(3) is broken from an implementation-quality perspective (on many, but not all, platforms), from a performance perspective (unnecessary synchronization for most applications), and from an API perspective (at least for multi-threaded and forking programs), and it makes surprisingly weak guarantees. Those things may not have been visible to the original creators of rand(3), though, because they’re mostly due to newer concerns (multi-core, crypto-related stuff we’ve learned over the decades since, etc.).
RNG is hard because of crypto, and simulation and other numeric stuff, and all kinds of other reasons that make it very hard to just have one good RNG that solves all problems well. It’s a deep enough problem that there’s even a ton of academic beef about the right way to do it.

Honestly, I think Rust has gotten this right.
Continuing from the previous discussion on library vs physical dependencies, rand is a good case study, as, I think, it mixes several distinct physical things under the same roof.
One is OS-sourced randomness: a crypto::random_bytes(buffer: &mut [u8]) -> () function. That’s getrandom (and should have been added to std like five years ago).
Another is a fast PRNG for simulations: fn new(seed: u64) -> Rng and then a set of pure, mostly non-generic functions. This is the biggest value-add, and there are a dozen smaller rand variations on crates.io.
The third one is a generic infrastructure for working with arbitrary PRNGs and distributions. I think this is only relevant for people doing some serious statistics? I only needed this once, as a homework assignment on the topic of generics.
Finally, there’s the global (thread-local) rng, which I think is not a good idea outside of guessing-game sized programs?
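Of those pieces, the first already exists in isolation: the getrandom crate exposes roughly the random_bytes shape sketched above. A minimal example, assuming the getrandom 0.2 API (the entry-point function has since been renamed):

```rust
// Cargo.toml (assumed): getrandom = "0.2"

fn main() -> Result<(), getrandom::Error> {
    // Fill a buffer with entropy straight from the operating system,
    // using whatever syscall or system API the current platform provides.
    let mut key = [0u8; 32];
    getrandom::getrandom(&mut key)?;
    println!("{:02x?}", key);
    Ok(())
}
```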
I am a person who goes out of their way to use minimal or zero dependencies, especially transitive dependencies, but I find it difficult to articulate why. It just feels bad, which isn’t very scientific. I’d honestly rather spend 10 minutes setting up pico-args instead of 10 seconds setting up clap just to avoid the larger dependency graph, but I don’t really know why. I wasn’t planning on auditing those crates no matter how many there were, and I’m only saving ~2 seconds of cold compile time. It just feels worse somehow.
Rust compiles really fast when it’s just your own code, but then you add something to do an HTTP request and your dependency graph fills multiple terminals and takes much longer to build and link. It’s really not the end of the world, but it still makes me want to shell out to curl instead.
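For what it’s worth, the shell-out itself needs nothing beyond std. A small sketch, assuming a curl binary is on the PATH:

```rust
use std::process::Command;

fn main() -> std::io::Result<()> {
    // Delegate the HTTP request to the system's curl instead of pulling in
    // an HTTP client crate. Assumes `curl` is installed and on the PATH.
    let output = Command::new("curl")
        .args(["--silent", "--fail", "https://example.com"])
        .output()?;
    if output.status.success() {
        println!("{}", String::from_utf8_lossy(&output.stdout));
    } else {
        eprintln!("curl exited with {}", output.status);
    }
    Ok(())
}
```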
Obtaining entropy, for RNG seeding or cryptographic use, is something that really should be in a standard library because it’s OS functionality that varies a lot by platform (even between Unix-derived ones.)
(We just had to deal with this in a C project on my team at work — we brought in a bcrypt library as a dependency, and it has a getEntropy function that, despite a tangle of ifdefs, didn’t work on one of the Linux distros we target because of some glibc versionitis, so our Linux Guy had to cobble together a patch.)
Once you have that, writing a PRNG is, what, five or six lines of code? It’s a struct with a few bytes of state and a short math formula to perturb the state.
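That estimate is about right for a non-cryptographic generator. As an illustration (not any particular crate’s implementation), here is the splitmix64 step function, with the constants from the published algorithm; seed it from the OS entropy above and it is perfectly serviceable for shuffling or hash seeding, though not for anything security-sensitive:

```rust
/// A tiny non-cryptographic PRNG: the splitmix64 step function.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        // Advance the state by a fixed odd constant, then scramble it.
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}
```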
I don’t keep up with rustc internals, but would it be possible (and helpful) to:
allow writing proc macros using const fns and running them with Miri to avoid the separate crate and codegen stage.
parse dependent crates first and then only compile the items from dependencies that are used, potentially skipping some transitive dependencies entirely.
What strikes me more, and isn’t discussed in this post, is that the ppv_lite86 crate is mostly a hacked-together version of 10% of std::simd’s functionality, specialized for Intel (as the name suggests), and with little activity. Ultimately it will go away when the portable_simd feature is stabilized. I cannot tell when that will happen given how big of a beast it is, but in my experience it works well already, so it’s just a matter of time. And I think it’s good that the stdlib is taking its time to make sure such things get done right rather than included too quickly.
What the post also missed is how complex some things have become within rand itself, which is the price to pay for performance. Take [T]::shuffle(): its implementation first defers to partial_shuffle and IncreasingUniform, with already non-trivial functionality, which then calls random_range somewhere else. This is defined in the SampleRange trait, which defers to the SampleUniform trait, itself defined for integers via a macro. Lastly, we arrive at the implementation of sampling an integer in a range begin..end, which seems obscure at first. There is an explanation comment, but it doesn’t seem to match the implementation and is indeed copy-pasted from the previous version. The CHANGELOG leads to PR #1287 and PR #1272, which mention “Lemire’s trick”. Looking up “lemire uniform random range” on a search engine finally leads to the algorithm description (which is pretty neat).
All to say that this is inherently complex code that definitely has its place in the rand crate regardless of dependencies: sampling a uniform integer in a range is fairly common, and asking users to convert uniform bytes into uniform integers in a range themselves is a recipe for disaster, as the naive x % range sampling is biased (and division/modulo is slow).
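The bias is easy to demonstrate with a deliberately tiny generator. In the sketch below the “generator” is just every possible 8-bit value exactly once; reducing it to 0..6 with % hits residues 0 through 3 one extra time each compared to 4 and 5, and the same argument scales up to 32- and 64-bit generators:

```rust
fn main() {
    // Naive sampling: take a uniform 8-bit value and reduce it mod 6.
    // 256 = 6 * 42 + 4, so residues 0..=3 each receive 43 source values
    // while 4 and 5 only receive 42.
    let mut counts = [0u32; 6];
    for x in 0u32..256 {
        counts[(x % 6) as usize] += 1;
    }
    println!("{counts:?}"); // [43, 43, 43, 43, 42, 42]
}
```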
Would I have done it more simply? Probably not: all these intermediate traits and macros serve their purpose. And if I had to choose between a naive algorithm and a slightly more complex optimized algorithm that performs 10x faster, for a foundational crate I’ll take the faster algorithm. Surely some things could be documented better, but we’re mostly volunteers here in open source, and rewriting the wheel doesn’t necessarily mean better documentation.

A Fisher-Yates shuffle should be 4 lines of code.
I am not sure they have implemented Lemire’s algorithm correctly: it looks like the rand code is calculating the rejection threshold first, which involves a slow % operation. Lemire’s algorithm avoids the % in most cases by using the limit argument as an over-estimate for the rejection threshold; if the first sample passes the rough check it can be returned directly. Otherwise the exact threshold is calculated with %, then the first sample is re-checked in the first iteration of the rejection sampling loop. For performance I like to inline the fast path (up to the rough check) and move the slow path (% onwards) to a separate function.
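For reference, here is a sketch of that structure, based on Lemire’s published description rather than on rand’s actual code (the function names are invented): the widening multiply and the rough check stay in the inlined fast path, and the single % lives in a cold, out-of-line slow path.

```rust
/// Uniform sample in 0..range (range >= 1) from a 32-bit generator,
/// following Lemire's "nearly divisionless" method.
#[inline]
fn sample(range: u32, mut next: impl FnMut() -> u32) -> u32 {
    let m = u64::from(next()) * u64::from(range);
    let low = m as u32;
    if low >= range {
        // Rough check: `range` over-estimates the true rejection threshold,
        // so this sample is certainly acceptable and no % is needed.
        (m >> 32) as u32
    } else {
        sample_slow(range, low, m, next)
    }
}

#[cold]
fn sample_slow(range: u32, mut low: u32, mut m: u64, mut next: impl FnMut() -> u32) -> u32 {
    // Slow path: one % computes the exact threshold 2^32 mod range, then the
    // first sample is re-checked against it before any fresh sample is drawn.
    let threshold = range.wrapping_neg() % range;
    while low < threshold {
        m = u64::from(next()) * u64::from(range);
        low = m as u32;
    }
    (m >> 32) as u32
}
```

Wired up to the splitmix64 sketch above, sample(6, || rng.next_u64() as u32) yields unbiased values in 0..6 without a division or modulo on the hot path.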