I’d like to make a small defense of Rust by cherry picking a few things from the OP.
Templates are objectively extremely difficult to understand and use correctly, but they also make it a lot easier for the compiler to optimize the snot out of your code compared to other generic programming systems.
Rust generics aren’t quite as expressive as C++ templates (a notable missing but much-desired feature is type-level integers). They avoid the C++ drawback of bad failure modes, though, while keeping the same advantage of generating well-optimized code through monomorphization. Both C++ and Rust pay for this with a hit to compile times.
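To make the monomorphization point concrete, here is a minimal sketch. The function name `largest` is made up for illustration. The trait bounds are checked once at the definition, so a misuse is reported at the call site as a missing bound, not from deep inside an instantiated template body. The compiler then generates a separate specialized copy for each concrete `T` that is used.

```rust
/// Returns the largest element of a slice, or `None` if the slice is empty.
/// The bounds (`PartialOrd + Copy`) are part of the signature and are
/// checked when this function is compiled, not when it is instantiated.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    let first = iter.next()?;
    // Keep whichever of the running best and the next element is greater.
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}
```

Calling `largest(&[3, 7, 2])` and `largest(&[1.5, 0.5])` in the same program produces two monomorphized copies, one for integers and one for floats, each of which can be optimized on its own.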
AFAIK, the plan for adding custom allocator support to Rust is roughly similar to how it’s done in C++, by adding an additional type parameter with a default type for ergonomics in the common case.
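As a rough sketch of what “an additional type parameter with a default” looks like in practice: everything below (`Allocator`, `Global`, `Arena`, `Buffer`) is a hypothetical stand-in for illustration, not the standard library’s design, and the storage still comes from an ordinary `Vec`.

```rust
/// Hypothetical allocator interface, for illustration only.
trait Allocator {
    fn name(&self) -> &'static str;
}

/// Stand-in for the default, process-wide allocator.
struct Global;
impl Allocator for Global {
    fn name(&self) -> &'static str { "global" }
}

/// Stand-in for some custom allocator.
struct Arena;
impl Allocator for Arena {
    fn name(&self) -> &'static str { "arena" }
}

/// A container parameterized over its allocator. `A` defaults to `Global`,
/// so the common case can simply write `Buffer<T>`.
struct Buffer<T, A: Allocator = Global> {
    items: Vec<T>, // this sketch does not actually route allocations through `A`
    alloc: A,
}

impl<T> Buffer<T> {
    /// Common case: no allocator mentioned anywhere.
    fn new() -> Self {
        Buffer { items: Vec::new(), alloc: Global }
    }
}

impl<T, A: Allocator> Buffer<T, A> {
    /// Uncommon case: the caller picks the allocator explicitly.
    fn new_in(alloc: A) -> Self {
        Buffer { items: Vec::new(), alloc }
    }

    fn push(&mut self, item: T) {
        self.items.push(item);
    }

    fn len(&self) -> usize {
        self.items.len()
    }

    fn allocator_name(&self) -> &'static str {
        self.alloc.name()
    }
}
```

Code that writes `Buffer<i32>` never sees the second parameter, while code that cares can write `Buffer<i32, Arena>` and construct it with `Buffer::new_in(Arena)`.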
I really like writing efficient code. It’s what gets me excited. Some people are really into testing, or refactoring, or applying “design patterns”, or any number of other things. But for me—I like writing fast, efficient code.
I almost feel the same way. But my passion for writing fast code is matched by testing and refactoring, all of which are things that Rust does almost as well as any other language out there. (Permit me to be a bit fuzzy, I’m expressing some opinions informed by experience.) It is really satisfying to have fast code, but god damn, it’s satisfying to know it’s correct too. And when you realize you’ve messed up your design, being able to do that refactor with a strong type system is just pure bliss.
But if you’re willing to engage in some masochism and want to write code that’s really, really fast, C++ is hard to beat. And I don’t see that changing any time soon.
Leave off the masochism part, and I think Rust has a fair shot. Sure, there’s the infamous borrow checker learning curve, but by most accounts that appears to be a one-time cost. (Which is not to be dismissive. We have an uphill battle in lowering that particular barrier to entry.)
I actually have some reasonably compelling data to support this. Consider two non-trivial performance critical C++ libraries: RE2 and snappy. Rust has roughly equivalent libraries, regex and rust-snappy, and both achieve comparable performance.
The regex benchmarks are a bit overwhelming, and many of them are misleading because of additional optimizations that the regex library performs. Those aren’t good benchmarks to use as evidence in a language comparison, because RE2 could feasibly do the same optimizations. But some benchmarks, like before_after_holmes, roughly measure the throughput of an infrequently matching regex in the core DFA regex engine itself. Performance is within the same order of magnitude, and this wasn’t easy to achieve. It required carefully eliminating a single pointer chase and it was also the one place in Rust’s regex library that required non-trivial use of unsafe to elide bounds checks. But the entire rest of the regex library is written in safe code, and that’s really the only reason why fuzzing has (so far, fingers crossed) only revealed logic bugs instead of memory corruption bugs. Both regex engines roughly implement the same algorithms.
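For readers who haven’t seen the pattern, here is a minimal sketch of eliding a bounds check in a table-driven DFA loop. This is not the regex crate’s actual code. The `Dfa` type and its layout are assumptions made up for illustration. The idea is to validate the transition table once, up front, so that the hot loop can index it with `get_unchecked`.

```rust
const ALPHABET: usize = 256;

/// Hypothetical dense DFA: `table[state * 256 + byte]` is the next state.
/// State 0 is the start state.
struct Dfa {
    table: Vec<usize>,
    accept: usize,
}

impl Dfa {
    /// Validates the table once so the matching loop can skip bounds checks.
    fn new(table: Vec<usize>, accept: usize) -> Option<Dfa> {
        let num_states = table.len() / ALPHABET;
        let well_formed = num_states > 0
            && table.len() % ALPHABET == 0
            && accept < num_states
            && table.iter().all(|&next| next < num_states);
        if well_formed { Some(Dfa { table, accept }) } else { None }
    }

    /// Runs the DFA over `input` and reports whether it ends in `accept`.
    fn is_match(&self, input: &[u8]) -> bool {
        let mut state = 0;
        for &byte in input {
            let index = state * ALPHABET + byte as usize;
            // SAFETY: `new` guarantees every stored state is < num_states and
            // that table.len() == num_states * 256. `byte` is < 256, so
            // `index` is always in bounds.
            state = unsafe { *self.table.get_unchecked(index) };
        }
        state == self.accept
    }
}
```

The unsafe block is confined to one line, and its safety argument depends only on an invariant that the constructor enforces, which is roughly the “safe API around a small unsafe core” shape described above.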
The snappy benchmarks also tell a story of similar performance results. Making a fast snappy library is a task in avoiding bounds checks and making use of unaligned loads and stores, and this happens a lot. It’s difficult to encapsulate this into one place, so my snappy implementation contains quite a bit of unsafe. However, if I’ve done my job correctly, then there is no use of the snappy API that will lead to memory unsafety. These same types of optimizations are done in the C++ library. Interestingly, there is a Go port of snappy as well, but the authors had to drop down to Assembly to get comparable performance. The Rust and C++ snappy implementations should yield the same outputs.
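A minimal sketch of the kind of unaligned load mentioned above. It is not taken from any snappy implementation, and the helper name `load_u32` is an assumption. This version keeps a single explicit bounds check, whereas a tuned inner loop would typically hoist that check out.

```rust
use std::ptr;

/// Reads 4 bytes starting at `offset` as a native-endian `u32`,
/// regardless of alignment. Returns `None` if the read would go out of bounds.
fn load_u32(buf: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    if end > buf.len() {
        return None;
    }
    // SAFETY: bytes `offset..offset + 4` are inside `buf` (checked above),
    // and `read_unaligned` places no alignment requirement on the pointer.
    Some(unsafe { ptr::read_unaligned(buf.as_ptr().add(offset) as *const u32) })
}
```

The same result is available in safe code via `u32::from_ne_bytes` on a 4-byte array. The unsafe form mainly matters when the surrounding loop has already established that the reads are in bounds.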
These are only two data points, so the picture is obviously incomplete. But I feel these are pretty compelling as a good faith existence proof that Rust has the potential to be every bit as fast as C++. There are places where Rust is lacking. @peter noted a few things already, but there’s more. We’re missing explicit SIMD on stable Rust, for example, but we hope to make a proposal for that soon.
There are places where Rust is lacking. @peter noted a few things already, but there’s more.
I’m happy to add that Rust is quickly getting those important features, and has come a long way in a short time. When it does meet feature parity with C++ I expect we will see a lot more commercial products written in Rust. The safe abstractions are too nice to pass up, as long as the new abstractions continue to follow the same dedication to zero cost that Rust has had so far.
I dislike writing C++ as much as the next girl, but I can’t deny that it’s capable of some incredible things. Look at this guy using fancy C++17 features to generate really tight 6502 code.
yeah, I’ve talked to a lot of hardcore C++ programmers who talk about how the improvements since C++11 really let you get the safety checks and performance things you want.
It’s hard to write, of course! To write effective libraries you need to have the entire language spec in your head. But you can get pretty simple code at the end (similar syntax to the other languages for your actual business logic).
I think writing C++ is a lot like playing Magic The Gathering. There’s a lot of moving parts, it’s kind of hard to grok the details. But it’s really fun for a certain subset of people. A puzzle box to be tricked into doing what you want.
I agree completely. Before working on a database engine, I didn’t really use C++. But now I love it. When the language actually enables you to do wild optimizations you couldn’t do any other way, that’s a great feeling.
Maybe languages like Rust will dominate C++ one day, but right now Rust is missing way too many features to write something like a database engine.
Custom allocators, signals, epoll, aio, other exotic OS interfaces that can’t currently be used overhead-free. Those are a few big ones I can think of off the top of my head, but I certainly don’t know all the caveats. That is a caveat in itself: we can’t assume Rust will work, and then 2 years into development discover a major competitive advantage is impossible in Rust. One way or another, everything is possible in C/C++, even if it’s not necessarily fun or easy.
Allocators: you can swap out the primary allocator, but that’s not even close to good enough. Databases have quite a few types of allocators. One example is an allocator that allocs onto the stack up to a certain point and spills over to the heap once that stack space has been used. Useful for when the common case fits in a small space. You also need to (well, should) account memory used by different subsystems. In C++ this is easy with placement new onto an allocator. This is unbelievably important.
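To illustrate the “stack first, spill to the heap” idea, here is a container-level sketch in Rust. It is not an allocator in the C++ sense, and the type `SpillBuf` and the 16-byte inline capacity are assumptions chosen for illustration. It keeps bytes in an inline array until that fills up, then moves everything to a `Vec`.

```rust
const INLINE_CAP: usize = 16;

/// Byte buffer that lives inline (on the stack, if the buffer itself does)
/// until it outgrows `INLINE_CAP`, then spills to the heap.
struct SpillBuf {
    inline: [u8; INLINE_CAP],
    inline_len: usize,
    heap: Option<Vec<u8>>,
}

impl SpillBuf {
    fn new() -> Self {
        SpillBuf { inline: [0; INLINE_CAP], inline_len: 0, heap: None }
    }

    fn push(&mut self, byte: u8) {
        // Already spilled: keep growing on the heap.
        if let Some(heap) = self.heap.as_mut() {
            heap.push(byte);
            return;
        }
        if self.inline_len < INLINE_CAP {
            // Common case: still fits inline, no allocation at all.
            self.inline[self.inline_len] = byte;
            self.inline_len += 1;
        } else {
            // Inline storage is full: copy it to the heap and continue there.
            let mut heap = Vec::with_capacity(INLINE_CAP * 2);
            heap.extend_from_slice(&self.inline);
            heap.push(byte);
            self.heap = Some(heap);
        }
    }

    fn as_slice(&self) -> &[u8] {
        match &self.heap {
            Some(heap) => heap.as_slice(),
            None => &self.inline[..self.inline_len],
        }
    }

    fn spilled(&self) -> bool {
        self.heap.is_some()
    }
}
```

This covers the “common case fits in a small space” part of the complaint for one container. It does not address plugging such a strategy into arbitrary standard collections, which is the part that needs allocator support.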
Signals: there are Rust libraries to handle them, but Rust makes no guarantees that it won’t call banned functions during a signal handler. Don’t deadlock the database please.
Epoll: there are libraries, but they are of mixed quality, and they often demand you break existing abstractions in Rust or use lower performing ones. That’s a silly compromise compared to C++, i.e. use the language the interface is written for. This is a common theme.
AIO: more ffi to C, same problems.
And so on and so forth. Honestly I’m not convinced this will go away until there is a mature and stable kernel written in Rust. The systems world is just designed for C right now. But maybe I’m wrong, LuaJIT for example has achieved basically native perf for its ffi.
One way or another, everything is possible in C/C++, even if it’s not necessarily fun or easy.
Is that true? IIRC standard C/C++ provides no way to even access the carry flag, for example. You can write C/C++ and link against functions written in assembly, but you can do that in Rust too.
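As a side note on the carry flag specifically: Rust’s integer types expose the carry as a value through `overflowing_add`, without any assembly. Below is a small sketch of a two-limb addition. The function name `add_u128_limbs` is made up for illustration, and whether the compiler actually lowers this to an add-with-carry instruction is up to the optimizer, not something the language promises.

```rust
/// Adds two 128-bit numbers given as `(low, high)` pairs of `u64` limbs.
/// Returns `(low, high, carry_out)`.
fn add_u128_limbs(a: (u64, u64), b: (u64, u64)) -> (u64, u64, bool) {
    // Low limbs: the bool is the carry out of the low addition.
    let (lo, carry) = a.0.overflowing_add(b.0);
    // High limbs, then fold in the carry from the low limbs.
    let (hi, c1) = a.1.overflowing_add(b.1);
    let (hi, c2) = hi.overflowing_add(carry as u64);
    // At most one of the two high additions can overflow.
    (lo, hi, c1 || c2)
}
```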
Signals: there are Rust libraries to handle them, but Rust makes no guarantees that it won’t call banned functions during a signal handler.
You don’t have that guarantee in C/C++ either do you? You just have to manually check that it doesn’t happen.
When writing a database, no one uses “standard” C++, everyone uses all the features they can of their compiler. This is mentioned in the original post.
And yes, you do have that guarantee in C/C++. POSIX has a whitelist of C functions you can use in signals. In C, you know what C functions you are calling, because you are writing C. In Rust, the standard library calls all sorts of C functions, and you can’t know which ones for sure. Rust developers have noted this issue, I’m not just making up the problem.
Allocator support is being worked on, but that seems more like a convenience to me. Similarly, all of the rest of the stuff you mentioned can be done with FFI, which seems pretty reasonable to me if you’re building a database. Rust’s standard library makes extensive use of libc via ffi, for example, so I don’t see why a database couldn’t do the same. In Rust, you at least have the option of building a safe API around those system interfaces, which is a large part of what std does.
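A tiny sketch of what “a safe API around a system interface” means, using libc’s `strlen` as a stand-in for something heavier like epoll. The wrapper name `c_string_len` is an assumption. The `unsafe extern` spelling assumes a recent toolchain, and older compilers write plain `extern "C"` here.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Declaration of a C function that is linked in from the platform's libc.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper: the `&CStr` type already guarantees a valid,
/// NUL-terminated string, so callers cannot misuse the raw call.
fn c_string_len(s: &CStr) -> usize {
    // SAFETY: `s.as_ptr()` points to a NUL-terminated buffer that stays
    // alive for the duration of the call.
    unsafe { strlen(s.as_ptr()) }
}
```

The call itself is a direct call into C with no marshalling layer. The unsafe part is written once inside the wrapper, and the rest of the program only sees `c_string_len`.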
There are some higher level abstractions evolving for async IO, but it’d be premature to say with confidence that they will fit your use case. I expect we’ll know soon though.
Allocator support is being worked on, but that seems more like a convenience to me.
Your database can’t just crash on out of memory, and some memory requests MUST be fulfilled while others can safely fail. For example, failing a query with out of memory error is fine, compared to failing to replay replication logs. It would be preferable to drop some queries instead of stalling all replication and blocking any queries from finishing, which just makes the memory problem even worse.
Furthermore, debugging query execution is HUGE in a database. How much memory did a query use? Which query in the workload is using way too much? Etc, etc.
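To make the accounting requirement concrete, here is a minimal sketch of a byte-counting allocator wrapper built on the standard `GlobalAlloc` trait and the `System` allocator. The type name `Counting` is an assumption, and this only shows a single counter. Per-query or per-subsystem accounting would need one such handle per subsystem and a way to route each allocation to the right one, which is the hard part being discussed.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and tracks how many bytes are outstanding.
struct Counting {
    in_use: AtomicUsize,
}

impl Counting {
    fn new() -> Self {
        Counting { in_use: AtomicUsize::new(0) }
    }

    /// Bytes currently allocated through this handle.
    fn bytes_in_use(&self) -> usize {
        self.in_use.load(Ordering::Relaxed)
    }
}

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // SAFETY: the caller upholds `GlobalAlloc::alloc`'s contract.
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            // Only count allocations that actually succeeded.
            self.in_use.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: the caller guarantees `ptr` came from `alloc` with `layout`.
        unsafe { System.dealloc(ptr, layout) };
        self.in_use.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}
```

A wrapper like this can be installed process-wide with `#[global_allocator]`, or called directly through the trait methods as a subsystem-local allocator.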
Exactly. And that doesn’t make Rust bad, just not ready yet.
Your database can’t just crash on out of memory
But that’s not what Rust does. Rust’s standard library will panic on OOM, and panics can be recovered from.
There’s nothing stopping anyone from building their own Vec (or whatever) that doesn’t have panic-on-OOM semantics. It’s not an easy task, but it seems well within scope if you’re building a production grade database.
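For what it’s worth, today’s standard library has `Vec::try_reserve`, which reports allocation failure as a `Result` and so covers the simplest version of this without a hand-rolled `Vec`. Here is a minimal sketch, where the helper name `try_push` is an assumption:

```rust
use std::collections::TryReserveError;

/// Pushes `item` onto `v`, returning an error if memory for it
/// cannot be reserved.
fn try_push<T>(v: &mut Vec<T>, item: T) -> Result<(), TryReserveError> {
    // Ask for room for one more element. This is the only step that can
    // allocate, and it reports failure through the `Result`.
    v.try_reserve(1)?;
    // Capacity is already reserved, so this push does not allocate.
    v.push(item);
    Ok(())
}
```

A caller can then decide per request what to do with the error: fail a single query, say, while treating failure on a must-succeed path differently.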
Why? Which high level async IO C libraries are you using in your database? Or are you using epoll/aio directly? If directly, then you can do the exact same with Rust.
Yes you can make a Vec with OOM semantics, but that doesn’t address any of the other issues I pointed out about memory. Rust will certainly have custom allocators in the future, but it doesn’t right now.
As for epoll/aio, you’re right they do get used directly. But if I have to write the whole storage engine in unsafe ffi code, why even switch? At that point you’re just writing C anyway. Elsewhere you discussed building a safe interface around unsafe code, where the unsafe code is necessary for performance. It’s the same thing in a database. As Rust continues to grow and develop zero-cost abstractions for that sort of thing, it will become more and more compelling. I trust that it will. And maybe the next commercial database engine I work on will be written in Rust, but right now that isn’t going to happen.
Correction: this is wrong. Rust’s standard library currently aborts on OOM, and therefore cannot be recovered from.
The other pieces are still right. One interesting path while still using std is writing a custom allocator that does panic on OOM instead of abort.
In the case of epoll/AIO there are futures and mio, and tokio, which brings them together. It is all still new, but Rust itself is also new.
I believe that Rust’s Vec can realloc while C++’s vector can’t. I would think that it is something that is nice to have inside of a database engine, but I don’t really know much about them.
AFAIR one of Rust’s goals is to have great C FFI so that code in Rust can start making a beachhead.
I believe that Rust’s Vec can realloc while C++’s vector can’t.
std::vector does realloc, and if it didn’t then it would be easy to write one that does.
And it’s actually not totally wonderful in a database engine. You want to know how much memory you’re using up front as much as possible, and if you can’t then you want your memory to grow very predictably. But it varies case to case, some places it matters less, some places it matters more.
Could you tell me which STL implementation does realloc? AFAIK if it could be done it would have to be outside the allocator framework, as that provides only allocation and deallocation.
Possibly I’m being obtuse because I haven’t tried this, but rather than look for a library couldn’t you just call epoll directly? There’s no runtime overhead to that as far as I’m aware (but I could easily be wrong) – you can link directly to C libs.
I guess I see this as part of a broader trend that I find rather surprising: people complaining that the entire world’s APIs haven’t been rebuilt in Rust and resisting free calls out to C, when I found the ability to directly call C one of Rust’s most compelling points.
You can call out to epoll directly, and then call out to aio directly, and use unsafe to dodge bound checks on our buffers and… wait why are we writing Rust again?
As time passes, the Rust team is decreasing the need for unsafe by creating more safe zero-cost abstractions. At some point they will have enough of them that you’ll actually be able to write a database engine in Rust, and not unsafe, ffi-laden pseudo-Rust. I have no problem with using the ffi, but at some point you’re basically writing C. For a Rust library, that’s totally fine, the whole point is to create a Rust library. But that’s not the objective of a commercial database engine.
You can call out to epoll directly, and then call out to aio directly … wait why are we writing Rust again?
Because 99% of my code is application logic, not OS glue calls (perhaps a DB may vary).
I have no problem with using the ffi, but at some point you’re basically writing C.
This is where I disagree. I want ffi, because I’m writing a native application, so I want to be able to deal directly with the OS’s features – in my case, I want to be able to talk to kqueue, etc. I don’t want some Java-style “close my eyes and pretend the OS doesn’t exist” lowest common denominator subset of features, because that’s rarely a sophisticated-enough set of constructs to really take advantage of the platform. Rust can’t meet its goal of a standard library that runs on Windows, Mac, Linux, etc., and also provide a Rust-native solution that’s nearly as sophisticated as raw kqueue, or raw epoll, or raw io completion ports. No cross-platform abstraction can.
But that’s a far cry from “basically writing C”, because most of my code isn’t ffi calls. I want to be able to call the OS’s interfaces, but that they decided to write them in C is uninteresting to me and shouldn’t guide my decision. Rust gives me the best of both worlds – I deal directly with interesting OS features, but my code is safe, even if the OS isn’t.
Yes, a DB does vary. I would say the majority of code falls into 2 categories:
What exactly is it that you think a database does? In a database, the “application code” that could be written in Rust is the glue code, between the OS and the ultra high performance code. A database is the spiritual equivalent of a safe abstraction around a ton of unsafe code. And it would be nice if that glue code could be written safely in Rust, but again since the rest of the database is basically C/C++ why bother?
So yeah, 99% of YOUR application might be application logic, but my point has always been that Rust isn’t really ready to write a real commercial database engine. C++ is still better for that, and other similar applications.
For example, if I were to write a kernel, I’d still pick C++ over Rust, until Rust gets better manual memory management. Or projects like Apache Spark, which is trash because it’s written in Java. You wouldn’t gain anything with a rewrite in Rust because 99% of the problem with Spark is you can’t tell how much memory you’re using, so you might just crash because life sucks for you and your single global memory allocator.
but my code is safe, even if the OS isn’t.
If the OS isn’t safe your code isn’t safe. That’s nonsense.
For example, if I were to write a kernel, I’d still pick C++ over Rust, until Rust gets better manual memory management.
Have people written kernels that use C++’s standard library?
Yes. OSv kernel is written in C++ and uses STL in kernel space.
Ask Microsoft. But that’s a false equivalency anyway. Manual memory management != allocator type parameter in the standard library, although of course that helps. C++ has placement new, new and delete overrides, and the power to manually control and modify the normal constructor / destructor lifecycle.
I’m not making an equivalency. I’m asking a question. If there’s a kernel that uses C++’s standard library, then that seems like an opportunity to learn something.
I get the point of his article, but it isn’t the friendliest of defenses. It boils down to, if you like fast, optimized code, use C++. Otherwise, it’s confusing and painful.
Summary: “Writing fast code is fun, and while micro-optimizing memory copies and minimizing pointer indirection isn’t for everyone, it can be enjoyable if you have a knack for it.”
I’m not saying performance and efficiency aren’t important, but a lot of times they aren’t. IME a lot of people use C++ as a crutch to make their inefficient algorithms/code run fast enough.
I prefer to code in a language that isn’t so byzantine, and worry about performance and optimization if it actually becomes a problem.