1. 40
  1. 5

    I’m curious if using mimalloc would offer any sort of performance improvements too.

    Aside from the links you posted about performance measurements, which tools do you use to find hot spots? Thanks for sharing this post!

    1. 1

      I can’t recall if we benchmarked mimalloc or not. It would probably be worth the experiment.

      The main performance tool we used was Linux’s perf: https://perf.wiki.kernel.org/index.php/Main_Page , along with using our own metrics infrastructure to watch the relative performance of passes over time.

    2. 4

      Wow this is fantastic! I started reading through the code and it’s very nice. 69K lines of C++ by my count.

      I added it to my wiki page since it’s using 32-bit IDs rather than 64-bit pointers.


      I have been wondering whether that style is feasible “by hand” or whether it’s better to generate source code, so Sorbet is a good data point in favor of the former. Although maybe some of the “subtle invariants” mentioned at the end relate to that.

      If anyone knows of any other codebases with this style for large trees or graphs feel free to add them to the page!
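
      As a toy sketch of what the ID style looks like “by hand” (illustrative only, not Sorbet’s actual data structures): nodes live in one flat array and refer to their children by small integer indices rather than by object pointers.

```ruby
# Toy tree in the "32-bit IDs instead of pointers" style: a node's id is
# its index in one flat array, and children are referenced by id.
Node = Struct.new(:value, :child_ids)

class Tree
  def initialize
    @nodes = []
  end

  # Appends a node and returns its id (its index in the array).
  def add(value, child_ids = [])
    @nodes << Node.new(value, child_ids)
    @nodes.size - 1
  end

  # Walks the subtree rooted at `id`, following ids rather than pointers.
  def sum(id)
    node = @nodes[id]
    node.value + node.child_ids.sum { |child_id| sum(child_id) }
  end
end

tree = Tree.new
leaf_a = tree.add(1)
leaf_b = tree.add(2)
root   = tree.add(10, [leaf_a, leaf_b])
p tree.sum(root)  # => 13
```

      In C++ the ids would be 32-bit integers into contiguous arenas, which halves the size of every reference field and improves cache density.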

      1. 4

        Sorbet looks like an interesting project, but after looking a bit at the homepage, I’m still a little unsure if it really is a good fit for ruby - maybe I’m just missing some of the goals? I see there are various tricks and tips for complex handling, but given the example from the front page:

        # typed: true
        extend T::Sig

        sig {params(name: String).returns(Integer)}
        def main(name)
          puts "Hello, #{name}!"
          name.length
        end

        main("Sorbet") # ok!
        main()   # error: Not enough arguments provided
        man("")  # error: Method `man` does not exist

        These are all well and good, but if we look at what the method does, it takes name, sends the message/calls the method #to_s on it (implicit in the format string), and sends #length.

        But if we try with an Array, say:

        main([1, 2, 3])

        Sorbet complains, but the code runs fine. Now I get that we’ve said that we only take in a String (and not an Array) – but I wasn’t able to figure out how I could easily specify a signature of “anything that responds to #to_s returning a String, and also responds to #length returning an Integer”.

        Is that sensible in Sorbet? Because I’d hate to lose all flexibility afforded by ruby, while I’d still like to avoid some common errors.

        (in this case, we probably do want name to be a String, unless we have something like a User class that implements #to_s as “#{given_name} #{last_name}” – but then we’d probably want to pass that in, because what is the length of a User?).

        But contrived example aside - we do sometimes use such convenient conversions in ruby, and it would be interesting if Sorbet could help?

        1. 5

          Sorbet deliberately does not support structural typing – e.g. things like “anything that responds to #to_s and #length”. The primary reason we made this choice was that we wanted to encourage explicitness within Stripe’s multi-million-line codebase, and believed, based on our experience, that our codebase wasn’t widely using polymorphic duck-typing anyway. Since we had a monorepo, we could also always convert duck typing into explicit interfaces by introducing an interface and including it in every relevant concrete type.
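
          For illustration, converting a duck-typed dependency into an explicit included interface might look roughly like this. The sketch is plain Ruby with the Sorbet declarations shown as comments so it stands alone (real Sorbet code would use T::Helpers’ interface! and abstract sigs), and all names here are hypothetical:

```ruby
# Hypothetical sketch: instead of accepting "anything that responds to
# #display_name" (duck typing), we introduce a named interface module
# and include it into every relevant concrete type.
module Nameable
  # In real Sorbet: extend T::Helpers; interface!;
  # sig {abstract.returns(String)}
  def display_name
    raise NotImplementedError, "#{self.class} must implement #display_name"
  end
end

class User
  include Nameable

  def initialize(given, last)
    @given = given
    @last = last
  end

  def display_name
    "#{@given} #{@last}"
  end
end

class Team
  include Nameable

  def initialize(name)
    @name = name
  end

  def display_name
    @name
  end
end

# Call sites now name the interface explicitly.
# In real Sorbet: sig {params(entity: Nameable).returns(String)}
def greet(entity)
  "Hello, #{entity.display_name}!"
end

puts greet(User.new("Ada", "Lovelace"))   # Hello, Ada Lovelace!
puts greet(Team.new("Infrastructure"))    # Hello, Infrastructure!
```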

          This is definitely one of the more common questions we get from users. To some extent I think there’s a philosophical question here about what “rubyish” code should look like that I can’t answer, but I can say that this choice hasn’t been a problem in practice at Stripe, or a substantial pain point for the other companies who are adopting Sorbet.

          1. 1

            Right, I absolutely see the trade-off. How would one wrap the example in something Sorbet understands? Am I reading the note about singleton methods right, that there’s no way to define something that has a length or to_s method?


        2. 3

          Wow, this is a wealth of information! Thank you so much for not only doing the work, but also documenting it, so that others can learn from you.

          Do you have a rough rundown of which phases take which amount of time? I would naively expect that parsing and desugaring would be comparatively costly, but the post suspiciously says nothing about those. Does it mean that the actual type inference is where the bulk of the work is spent?

          1. 4

            The pipeline, at a high level, is divided into three phases – “index”, “resolve”, and “typecheck”. My recollection (I’m no longer at Stripe and so don’t have our metrics handy) is that in a cold start, “index” (parsing+desugaring) took about as much time as “typecheck”, and the central, serial, “resolve” component was 10% or less of the total.

            However – and I didn’t mention this in the post – we cached the result of the “index” phase on a per-file basis, which meant that in practice parse+desugar performance was not on the critical path for cached runs. I also didn’t say too much about it because I don’t recall doing anything particularly clever to speed up those phases, beyond the things mentioned in the article – e.g. the compact core data structures and the metrics-driven ordering of pattern matching.
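
            A per-file index cache of the kind described might be sketched like this; the names and structure here are illustrative, not Sorbet’s actual implementation. The key idea is that the cache key is a digest of the file’s contents, so an unchanged file skips re-parsing on warm runs.

```ruby
require "digest"

# Illustrative per-file cache for an "index" (parse + desugar) phase:
# results are keyed by a content digest, so a file is re-indexed only
# when its contents change. IndexCache and index_file are hypothetical.
class IndexCache
  def initialize
    @store = {}
  end

  def fetch(source)
    key = Digest::SHA256.hexdigest(source)
    @store[key] ||= index_file(source)
  end

  private

  # Stand-in for the real parse + desugar work.
  def index_file(source)
    { token_count: source.split.size }
  end
end

cache = IndexCache.new
first  = cache.fetch("def foo; end")
second = cache.fetch("def foo; end")  # warm hit: no re-indexing
p first.equal?(second)  # => true
```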

            1. 1

              Aha, that makes sense, thanks! Doing per-file incremental indexing and then per-function typechecking is a really great architecture, if the language allows for per-file analysis. This is roughly how IntelliJ is set up as well. Sadly, that approach does not work for Rust.

            2. 3

              In the article it is confirmed that type inference is indeed where the time is spent:

              “This fact allowed us to parallelize the actual inference half of the typechecker (which takes almost half of our time, and more in incremental runs) almost trivially.”

              1. 1

                FWIW I also posted this other talk on Sorbet which quotes a few numbers:


                It’s not exactly what you want, but it’s a good perspective as well.

              2. 1

                @nelhage If you’re still reading, I wonder if you thought about Sorbet’s architecture as a MapReduce? That seems like a relevant term for “why it’s fast”.

                I found the code very clean and easy to read, in particular realmain.cc. It even has this “pull shards off a queue / merge / sort” step, just like MapReduce.


                That file makes it very clear that it’s a “data-oriented” design, which enables data parallelism.
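
                The shape is easy to sketch in miniature (illustrative only, not Sorbet’s code): workers pull shards off a shared queue, produce results independently, and the main thread merges and sorts them for deterministic output.

```ruby
# Miniature "pull shards off a queue / merge / sort" pipeline: worker
# threads drain a shared work queue, per-shard work happens in
# parallel, and the merged output is sorted for determinism.
files = (1..8).map { |i| "file#{i}.rb" }

work = Queue.new
files.each { |f| work << f }
done = Queue.new

workers = 4.times.map do
  Thread.new do
    loop do
      begin
        f = work.pop(true)  # non-blocking; raises ThreadError when empty
      rescue ThreadError
        break
      end
      done << [f, f.length]  # stand-in for typechecking one shard
    end
  end
end
workers.each(&:join)

merged = []
merged << done.pop until done.empty?
merged.sort_by!(&:first)  # deterministic ordering of results
p merged.size  # => 8
```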

                There were a bunch of papers about multicore (not distributed) MapReduce 10 years ago like

                Evaluating MapReduce for Multi-core and Multiprocessor Systems

                I was prompted to think of this nice “hourglass” shape by a recent thread:



                1. 2

                  Hm. I am quite familiar with MapReduce, and while I can definitely see the analogy, I don’t think it’s more than an analogy. The core insight of MapReduce, IMO, is that you can implement large classes of distributed algorithms as combinations of parallel map, a parallel reduce, and a shuffle-sort step. Importantly, the shuffle-sort step (the only primitive in that list not to be trivially scalable) is completely generic, so it can be implemented once by the framework, and then we can provide interfaces to our users for writing pure map+reduce functions.
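
                  For reference, the classic word-count instance of that map / shuffle-sort / reduce shape, as a single-process toy:

```ruby
# Toy single-process MapReduce: a parallelizable map, a generic
# shuffle-sort (group by key), and a parallelizable reduce.
docs = ["to be or not to be", "to type or not to type"]

# Map: each input independently emits (key, value) pairs.
pairs = docs.flat_map { |doc| doc.split.map { |word| [word, 1] } }

# Shuffle-sort: the one non-trivially-scalable step, implemented once,
# generically, by the framework -- here just a group-by on the key.
groups = pairs.group_by { |word, _count| word }

# Reduce: each key's group is combined independently.
counts = groups.map { |word, ps| [word, ps.sum { |_word, n| n }] }.to_h

p counts["to"]  # => 4
```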

                  The equivalent in the Sorbet pipeline is the “resolve” step, which is heavily Ruby- and Sorbet-specific, and contains some of the trickiest code in the pipeline; it’s not something that would fit into a typical MapReduce framework. The sort you link is in fact essentially cosmetic; it’s there to make Sorbet more deterministic for ease of testing and debugging, but in principle it should be unnecessary for correctness.

                  That said, I think your broader point about “hourglass” architectures, which separate horizontally scalable steps with “merge” nodes that wait on all previous steps and perform some global interaction, is on to something – I think that is a common pattern in distributed and parallel execution, and might benefit from language or at least toolkit support.

                  1. 1

                    Thanks for the reply – after thinking about it more I also realized it’s not literally a MapReduce. As far as I can tell there’s no “key” output from the parallel map, and like you say the shuffle isn’t generic (or distributed).

                    But it does feel to me like the code is “contorted” (in a good sense) around parallelism… in the same flavor as MapReduce. It appears to me you made some nontrivial early design decisions and stuck with them throughout.

                    I think of MapReduce as a bit of a “game” to take normal algorithms and fit them within those constraints. That often ends up contorting things, and sometimes you have to cheat with some out-of-band communication too. That’s probably one of the reasons more general/flexible models of distributed processing have become more popular than MapReduce.

                    And as other people mentioned, many other compiler algorithms probably don’t fit into this model, and whether it’s applicable heavily depends on language design / type system design, but trying it is an interesting thought experiment.

                    Anyway thanks for writing up these notes about Sorbet. There’s a lot of good stuff there and I enjoyed looking through the code!