1. 26

  2. 5

    This takes a while to get to the point, but TL;DR prefer cache-oblivious algorithms. Erik Demaine provides a brief introduction to cache-oblivious algorithms in the OCW for Introduction to Algorithms.

    1. 3

      I don’t think that’s the conclusion at all. Rather, one needs to factor caching into a runtime analysis. One should be aware of the cache, not oblivious.

      1. 7

        You are both technically correct, imo.

        Cache-oblivious tends to refer to a data structure’s ability to continually decompose into smaller “units” which remain cache friendly. This can span the whole spectrum: from VM kernel pages down to L1. The article makes the point that a binary heap is not cache friendly, because it forces page faults for common operations (going “down” the tree). A binary heap is only cache friendly for horizontal operations (“find all siblings”).

        By using a B-Heap, the data structure decomposes into cache-friendly units automatically. So you’re right: you need to be aware of the cache…ideally by using data structures that decompose naturally into cache-friendly units (aka “cache-oblivious”)

        1. 4

          That’s reasonable. “cache oblivious” unfortunately conjures up thoughts of things like constant time AES that avoid lookup tables to avoid revealing cache hits and misses, which is a rather different but also vital constraint for certain algorithms.

          1. 3

            Cache oblivious algorithms have a fairly rich research history.


            1. 2

              Thanks. Based on that description, the b-heap is not cache oblivious at all. It takes an explicit cache (page) size parameter. The funnel heap in the paper, which I finally got around to skimming, would be cache oblivious, but also sounds like a lot more work to implement.

        2. 4

          I think cache-oblivious refers to them being aware of the memory hierarchy, but not needing to be tuned to a specific cache size. You can write algorithms that behave differently on CPUs that have differently sized L1, L2 caches, but it’s much simpler to write an algorithm that is aware that there could be arbitrary levels of caching, and will seem cache aware without needing tuning.

          Edit: in case it was unclear from my comment, I am arguing that we are agreeing violently.

      2. 2

        Heh! I love this, I wish more programmers woke to the possibilities of VM…

        For example, Varnish does not ignore the fact that memory is virtual; it actively exploits it. A 300-GB backing store, memory mapped on a machine with no more than 16 GB of RAM, is quite typical. The user paid for 64 bits of address space, and I am not afraid to use it.

        I once sped up a program by a full factor 40x by simply moving to mmap instead of using lseek.

        The TL;DR is probably….

        On a modern multi-issue CPU, running at some gigahertz clock frequency, the worst-case loss is almost 10 million instructions per VM page fault. If you are running with a rotating disk, the number is more like 100 million instructions.

        1. 1

          I found this post to be annoyingly smug. Considering cache and VM behavior in algorithm design is obviously a good thing—but far from being ignored by academic CS, there are entire sub-fields dedicated to cache-oblivious data structures, and more generally to adapting data structures and system designs to the properties of modern hardware. The “conceptual model of a computer” he presents is appropriate for CS101, but any reasonable CS education will go into far more depth about how modern hardware actually works, and what that implies for software design.