1. 41

  2. 12

    For the C version, this is a binary search tree

    Rust (interestingly) doesn’t offer a binary search tree — and it is instead implemented with a BTreeSet

    So in the end, this compares a particular C implementation of an algorithm and a particular Rust implementation of a different algorithm ?

    1. 22

      I think the implicit principle you’re appealing to has consequences you might not appreciate. You can write unsafe code in Java, you can use object pools, you can roll your own string class, you can avoid ever triggering garbage collection, you can write custom bytecode. You can solve a problem “in Haskell” by writing Haskell code that compiles a custom DSL to machine code. All these things are sometimes reasonable, and it is important that you know about them in order to understand the options that you have when trying to meet some performance target.

      However, performance is almost always an economic consideration, rather than a question of what is theoretically possible. A language may be better suited to achieving certain performance because it is easier to do in that language, even though it is possible in another.

      Having put it in my words, let me quote extensively from Brian’s post, because I can’t help but think that he already answered you.

      To the contrary, I think that it is a reasonable assumption that, for any task, a lower-level language can always be made to outperform a higher-level one.

      Indeed, it might be tempting to conclude that, because a significant fraction of the delta here is the difference in data structures (i.e., BST vs. B-tree), the difference in language (i.e., C vs. Rust) doesn’t matter at all.

      But that would be overlooking something important: part of the reason that using a BST (and in particular, an AVL tree) was easy for me is because we have an AVL tree implementation built as an intrusive data structure. This is a pattern we use a bunch in C….Implementing a B-tree this way, however, would be a mess. The value of a B-tree is in the contiguity of nodes — that is, it is the allocation that is a core part of the win of the data structure. I’m sure it isn’t impossible to implement an intrusive B-tree in C, but it would require so much more caller cooperation (and therefore a more complicated and more error-prone interface) that I do imagine that it would have you questioning life choices quite a bit along the way.

      Contrast this to Rust: intrusive data structures are possible in Rust, but they are essentially an anti-pattern….All of this adds up to the existential win of Rust: powerful abstractions without sacrificing performance. Does this mean that Rust will always outperform C? No, of course not. But it does mean that you shouldn’t be surprised when it does — and that if you care about performance and you are implementing new software, it is probably past time to give Rust a very serious look!

      1. 3

        I’m missing a step in his argument. He goes from “it is convenient to use intrusive data structures to implement AVL trees in C” (paraphrase) to “it would not be convenient to implement a B tree using intrusive data structures in C” to “Rust is better because it discourages use of intrusive data structures”. But then, assuming he is correct, why do we have a requirement that the C B tree use intrusive data structures?

        1. 2

          I think he’s implying that C leads itself to intrusive data structures in these cases, but I don’t know enough to have my own opinion about whether that’s the case.

          My point is really just that I don’t think he’s making an obvious mistake about how to compare languages, like the OP suggested.

          1. 0

            Well, he writes:

            I’m sure it isn’t impossible to implement an intrusive B-tree in C, but it would require so much more caller cooperation …

            I think he had a C AVL tree and a Rust B tree and then came up with an excuse why comparing the two was reasonable.

        2. 2

          Thank you for the clarifications. It looks like there are different ways to read the same title. The actual content is interesting and not as ambiguous.

          1. 3

            I think you’re right that the title suggests a comparison that’s not really in the post.

            I don’t know what a better one would be. We could go 19th century: “Some considerations on the relative performance of C and Rust programs when written naturally, and how the language might contribute to those differences”, but it’s a mouthful.

            1. 2

              “Two programs with differing performance”

        3. 1

          I’m with you. Comparisons of languages talking about performance should be done with same algorithms where possible. Otherwise, it’s harder to say what differences are due to algorithm differences instead of language. Essentially, we want to eliminate variables but author introduced them. The explanation I saw in a quick skim is this:

          “For the C version, this is a binary search tree (an AVL tree), but Rust (interestingly) doesn’t offer a binary search tree — and it is instead implemented with a BTreeSet, which implements a B-tree.”

          Author used a B-tree, not AVL tree, for Rust since Rust didn’t have an AVL tree. If comparing apples to apples, the more obvious route is to use one of the many B-tree implementations in C to compare against Rust’s B-tree. One might also look at structure to see if they’re similar in implementation to further eliminate variable of algorithmic differences. That would be my performance comparison. Then, I’d look at developer effort of implementation or its maintenance (esp readability) possibly comparing some variations in C and Rust along that spectrum, assessing effect on performance.

          “I reimplemented a body of C software in Rust, and it performed better for the same task; what’s going on? And is there anything broader we can say about these results?” (my emphasis)

          I wouldn’t believe anything broader based on this single, case study. All it tells me is that a program (a) ran fast at specific task in Rust and (b) it was faster than a C program with different algorithm using a specific, compiler configuration. That’s literally all it proves. This case study is useful if you’re doing something very similar, know Rust’s benefits, and want to assess if your app could perform well if written in Rust.

          On good note, it also has some tool mentions and deep analysis that people might enjoy reading and/or find useful. Those seem to be practical benefits of reading this article.

          1. -2

            Ugh. I got 1/4 into the post to this point and shouted outloud, “well, you can’t compare that!”

            title should be, “the relative performance of avltree and btree for a particular application”

            1. 20

              The sad thing is that this is an interesting blog post, yet almost all the discussion across several sites is over two words in the damn title.

              Don’t discuss the data structures themselves, the cache effects, the large difference in split loads/stores, the use of DTrace, the demonstration of active benchmarking, or anything remotely technical. :(

              Blog post bike-shedding.

              1. 19

                Of course you can; in fact, I much prefer those kind of comparisons, because they are more telling!

                It is way more interesting to know how well a program performs when you write it as it’s meant to be written in the implementation language, rather than twisting and deforming it, because you want to replicate exactly what you did in another language (which Bryan tried to do if you read his previous post). Rust provides generic data structures out of the box; the right way to write programs in Rust is to use those data structures, not to tangle yourself in a mess of intrusive pointers. Bryan wrote a program that is (a) idiomatic Rust, (b) faster than idiomatic C, and that’s very interesting to me. If you’re still pissed that the implementations are different, his code is open source: go implement the Hashmap and/or the BTree in C.

                1. [Comment removed by author]

                  1. 9

                    Why not? He’s comparing the time it takes for two programs to perform the same task. That’s performance evaluation.

            2. 1

              Other people have noted on the disparity of data structures across the comparisons.

              I’d like to point out a deeper flaw. Rust versus C almost always use GCC as the benchmark. Yes, GCC is for many people the ‘canonical’ C compiler, but, because Rust uses LLVM, what you’re basically testing is GCC’s optimizations versus LLVM’s optimizations. To even the playing field and reduce disparity it almost always would be more appropriate to use Clang, and therefore you confine the Rust/C performance differences to actual features and benefits/flaws of the language, rather than features of two different code generators.

              1. 2

                he updated the article with a Clang comparison.