1. 19
  1. 18

    tl;dr: the compiler is helpless to optimize bad memory access patterns and data structure layout, which are a large part of the slowdown.

    My response would be: “Compilers really are magic black boxes that make your code fast. You’re underestimating the ability of programming languages to waste time on things that are not memory access patterns or data layout.” To make code really fast, you have to optimize your data storage at a low level. But the optimizer can make the difference between “fast” and “slow”, where “fast” is “bad cache characteristics” but “slow” is “bad cache characteristics, plus 100 deep forests of virtual method calls, plus function-local temporary allocations ~everywhere.” “Compilers make slow code fast” is a correct observation. “Bad cache patterns can really hurt you” is also correct. It’s just that lots of code starts out really, really slow.

    1. 11

      This article has a point, but it’s really not about compilers at all, it’s about data structures, and it abruptly stops after just starting its case.

      A much more useful article on the topic is “Handles Are The Better Pointers”, which has been posted here a few times.

      May I also note that the inefficiencies discussed here don’t effin’ matter for 99% of software, because the CPU and even RAM are plenty fast enough for most stuff that isn’t on the critical path. Guess what, the most-used operating system platforms (Android, ChromeOS, Windows, iOS, MacOS) are all heavily based on OO programming. Yeah, even ChromeOS — Chrome is C++. I’m sure the innards of CSS layout or Quartz compositing are heavily optimized in cache-savvy ways, but such areas are the 5% of the system that takes 95% of the time.

      1. 3

        the inefficiencies discussed here don’t effin’ matter for 99% of software, because the CPU and even RAM are plenty fast enough for most stuff that isn’t on the critical path.

        This. Mike Acton’s main point is that you should measure and choose the appropriate data structures and data layout in the places where it matters, not that you should cargo cult data-oriented programming and structs-of-arrays.

        Maybe SOA will improve the performance of your program significantly, but maybe it won’t because that piece of code is not the hot path, or because the access patterns don’t get a significant benefit from being a SOA, or maybe you’re using a language where this doesn’t apply because of how it stores data in memory.

      2. 10

        It is kind of ironic that LLVM was written to do these kinds of data layout optimization (Chris Lattner’s PhD thesis is titled Macroscopic Data Structure Analysis and Optimization), but everyone took LLVM and gave up on data layout optimization.

        1. 5

          This is actually an instance in which Java’s obsession with getters and setters would be useful; you could have a class with a similar interface to an array, but specialized for a specific data structure, which returns another class that just contains an index and a reference to the SOA container, and the getters/setters would do the actual work of picking the right item from the right array in the SOA class.

          class SoaAntColony {
              public int size = 0;
              public String[] names = new String[100];
              public Ant get(int i) { return new Ant(i, this); }
          class Ant {
              public int index;
              public SoaAntColony colony;
              public getName() { return colony.names[index]; }

          So the code would look pretty much the same.

          long numOfChosenAnts = 0;
          for (Ant ant : antColony) {
              if (ant.getAge() > 1 && "red".equals(ant.getColor())) {

          In languages with macros you could generate all of this boilerplate.

          What the author wants is indirection between how the data is stored in memory, and what it looks like in the code. I wonder what a language with good, fast suppport for this would look like, but I also fear that it would start reminding one of the interface hell of enterprise Java.

          1. 7

            In Java your Ant object is still heap-based, so getting its index field has to indirect to a random address, negating the optimization.

            In nearly any other language (OK, in C++, C#, Go, Zig, Rust, Nim, …) you could make Ant a trivial value wrapper for the index and just pass it around in registers or the stack. Apparently Java will finally be getting this ability in a year or two.

          2. 3

            There are sort of two critiques here–one is whether to arrange data in a columnar (e.g. all ant types stored contiguously, all ant sizes stored contiguously) or row (for each ant, all properties stored together) oriented format. Another is just that pointer-chasing is expensive, and if you aren’t going to reference data multiple times, you don’t necessarily need that pointer indirection.

            On the latter problem (and perhaps to some extent, the first) readers might be interested in something like Zach Tellman’s Vertigo, which offers nestable, packed structs and arrays backed by flat JVM ByteBuffers. Rather than chasing multiple pointers, you can jump directly to any field at any offset immediately. Vertigo also shows some possibilities for friendly iteration syntax, lazy views over structs, etc.

            1. 1

              The post nicely demonstrates the impact the memory layout can have on an application’s execution time. But it appears one-sited to me, as the drawbacks of the data oriented approach are not fully discussed. Hence I am not convinced that data oriented programming is a panacea, but I am certain that it is sensible in some sperformance-crticiall situations. I doubt however, that it should be applied in general.

              There a cases where the data oriented approach performs worse than the object oriented one. Think, for example, what happens if you want to delete some ants. Or if you read/modify a larger fraction of the ants’ fields, then the spatial locality of the fields becomes actually an advantage.

              1. 3

                In general, this is the issue of row vs column stores.

              2. 1

                This sounds like it could be implemented by a rust macro in a fairly transparent way, right? Return a colony ref + index with implementation which can forward everything to the column store. Similar to Java but auto-generated.

                I wonder if C++ templates are good enough to do that without a preprocessor too - it seems really close to what an iterator does.

                The question though is, would any such solution inline the accessor enough to get rid of the proxy object construction completely…

                1. 1

                  Yes this method can speed up some programs significantly. Probably not yours though, unless you’re writing a game or a simulation.