1. 22
  1. 5

    Minor nit, but:

    “Similarly, many projects have forgone virtual methods (RTTI in C++ parlance)”

    That’s not what RTTI is. RTTI is the set of type_info structures that the compiler emits for every class. These have a bad reputation because the language spec requires each of them to have a name() method that returns an implementation-defined unique name for the type. The name must be returned as a C string, so you can’t easily merge substrings. The only convenient unique name for a type is normally the mangled name, which is long. As a result, enabling RTTI often leads to a 20% binary size increase, sometimes more.
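    As a rough illustration of where those strings come from, here is a minimal sketch that reads the name out of a type_info record through typeid. The types `Widget` and `FancyWidget` and the helper functions are hypothetical, made up for this example, and the code assumes RTTI is enabled (it will not compile with a flag such as -fno-rtti).

    ```cpp
    #include <cassert>
    #include <string>
    #include <typeinfo>

    namespace example {
    struct Widget {
        virtual ~Widget() = default;  // polymorphic, so typeid sees the dynamic type
    };
    struct FancyWidget : Widget {};
    }  // namespace example

    // Returns the implementation-defined name stored in the type_info record
    // for the dynamic type of w.
    std::string dynamic_type_name(const example::Widget& w) {
        return typeid(w).name();
    }

    // Sanity check: a FancyWidget viewed through a Widget reference still
    // reports the FancyWidget record.
    void check_dynamic_type_name() {
        example::FancyWidget fancy;
        const example::Widget& base_view = fancy;
        assert(typeid(base_view) == typeid(example::FancyWidget));
        assert(dynamic_type_name(base_view) == typeid(example::FancyWidget).name());
    }
    ```

    The exact string is up to the implementation. On GCC and Clang it is typically the mangled type name, and it grows with every enclosing namespace and template argument.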

    In exchange for this, you get two language features:

    • dynamic_cast
    • Exceptions.

    Exceptions don’t actually need full RTTI support. They just need any type used in a throw or catch expression to have RTTI, which is a lot smaller. Full RTTI is used only for dynamic_cast. This, in turn, has a terrible reputation because it’s completely generic. It allows casting from one branch to another in a diamond-inheritance graph, even when you don’t know the dynamic type of the source pointer. In the language itself the source must be a pointer to a polymorphic class, but the runtime routine behind the cast effectively takes a void* that contains an adjusted pointer for one of your diamond-inheritance parents and turns it into an adjusted pointer for another one. Because this is generic, it’s also incredibly slow. If you just want to be able to safely down-cast, you can get much better performance by doing this as a single virtual call.
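    To make the contrast concrete, here is a minimal sketch of both approaches. All of the type and function names (`Base`, `Left`, `Right`, `Both`, `Shape`, `Circle`, `Square`, `as_circle`, and the helpers) are hypothetical, chosen only for this example. The first half shows the generic cross-cast that needs full RTTI. The second half shows one possible way to do a safe down-cast as a single virtual call, which keeps working with RTTI disabled.

    ```cpp
    #include <cassert>

    // --- Generic case: a cross-cast in a diamond, which needs full RTTI ---
    struct Base  { virtual ~Base() = default; };
    struct Left  : virtual Base {};
    struct Right : virtual Base {};
    struct Both  : Left, Right {};

    // The runtime has to inspect the inheritance graph to get from the
    // Left branch to the Right branch. Returns nullptr if the object has
    // no Right part.
    Right* cross_cast(Left* l) { return dynamic_cast<Right*>(l); }

    // --- Cheap case: a down-cast done as a single virtual call ---
    struct Circle;  // forward declaration for the return type below

    struct Shape {
        virtual ~Shape() = default;
        // Default answer: "I am not a Circle".
        virtual Circle* as_circle() { return nullptr; }
    };

    struct Circle : Shape {
        double radius = 1.0;
        // Only Circle overrides the hook, so the cast costs one vtable dispatch.
        Circle* as_circle() override { return this; }
    };

    struct Square : Shape {
        double side = 2.0;
    };

    // Returns the radius if s is a Circle, or -1.0 otherwise.
    double radius_or_negative(Shape& s) {
        if (Circle* c = s.as_circle()) return c->radius;
        return -1.0;
    }

    // Exercises both halves of the sketch.
    void check_casts() {
        Both both;
        assert(cross_cast(&both) == static_cast<Right*>(&both));

        Left left_only;
        assert(cross_cast(&left_only) == nullptr);

        Circle circle;
        Square square;
        assert(radius_or_negative(circle) == 1.0);
        assert(radius_or_negative(square) == -1.0);
    }
    ```

    The trade-off is that the base class needs one such hook per type you want to down-cast to, so this suits closed hierarchies better than open-ended ones.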

    Aside from that, it’s a great article. The punchline is Paolo Bonzini’s optimisation philosophy: a 1% speedup becomes a 10% speedup after you’ve got rid of the other bottleneck.

    1. 2

      Many of us have wasted a ton of time implementing userspace multitasking and asynchronous I/O, because surely those system threads must suck, only to find that the kernel knows a hell of a lot more about CPUs and I/O than we do. Our case just wasn’t special enough.

      If only kernels exposed more of what they know about CPU and I/O to userspace in an efficient manner. It’s always great to see more of this, though (Windows KUSER_SHARED_DATA, Darwin COMMPAGEs, io_uring TASKRUN/POLL_FIRST/etc.)

      1. 2

        Nice article. BTW, on macOS it’s very easy to do call-stack-based profiling at the drop of a hat. Either type sample processname 2 in a terminal, or select the process in Activity Monitor and use the Sample command. (The latter turns the output into an outline view, making it easier to explore.) This is handy when a program gets super slow or hangs.