1. 17

What is your favorite data structure? How did it become your favorite? Is it just that efficient? Do you use it a lot? Is there a specific time you used it that you really enjoyed it?

  2. 8

    I like zippers, all kinds of zippers. Besides being super useful, I’ve found them to make excellent introductory material for teaching algebraic data types, and I like that they solve an unintuitive problem (maintaining an index into an arbitrary recursive data structure) in a very simple manner. Their relationship to one-hole contexts and the derivative of regular datatypes is also pretty fascinating.
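
    For anyone who hasn’t played with one, a list zipper fits in a few lines. A toy Python sketch (the names are mine, not from any library): the focus moves left and right in O(1) by shuttling elements between two stacks:

    ```python
    class ListZipper:
        """A cursor into a list: the elements left of the focus (kept
        nearest-last), then the focus and everything right of it."""
        def __init__(self, items):
            self.before = []                    # left of the focus
            self.after = list(reversed(items))  # focus is after[-1]

        def focus(self):
            return self.after[-1]

        def right(self):
            self.before.append(self.after.pop())  # O(1) move
            return self

        def left(self):
            self.after.append(self.before.pop())  # O(1) move
            return self

        def set(self, value):
            self.after[-1] = value                # edit at the focus
            return self

        def to_list(self):
            return self.before + list(reversed(self.after))
    ```

    The same two-stack idea generalizes to trees, where `before` becomes a path of one-hole contexts back to the root.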

    1. 8

      The Trie: Simple, undervalued, like a Swiss Army knife ;-)
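
      A toy sketch of why (Python, my own naming): membership tests and prefix search fall out of the same walk down the tree:

      ```python
      class Trie:
          def __init__(self):
              self.children = {}   # char -> Trie
              self.is_word = False

          def insert(self, word):
              node = self
              for ch in word:
                  node = node.children.setdefault(ch, Trie())
              node.is_word = True

          def contains(self, word):
              node = self
              for ch in word:
                  node = node.children.get(ch)
                  if node is None:
                      return False
              return node.is_word

          def with_prefix(self, prefix):
              # walk to the prefix node, then collect every word below it
              node = self
              for ch in prefix:
                  node = node.children.get(ch)
                  if node is None:
                      return []
              out, stack = [], [(node, prefix)]
              while stack:
                  n, acc = stack.pop()
                  if n.is_word:
                      out.append(acc)
                  for ch, child in n.children.items():
                      stack.append((child, acc + ch))
              return out
      ```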

      1. 2

        I considered responding with trie. It’s a very clever data type, and it’s fun to debate how to pronounce it.

        But it’s also, due to the way we pre-fetch memory nowadays, very often a much slower choice than using something simpler. I like it as a reminder that the abstractions we use aren’t always the best model for our actual work environment.

        1. 2

          The performance remark is completely true and justified; but considering the range of possibilities a trie opens up, it’s a sort of “programmer’s Swiss Army knife of data types”. And since I do most of my work in Ruby, performance is an issue anyway ;-)

      2. 5

        The basic ones that people don’t use enough: list, set, map.
        Too many times I see programmers using:

        - an array, then frequently deleting and compressing/shuffling to remove the gaps
        - a list of items that have unique key/value pairs
        - a list that should only contain unique items
        - a map whose items have no value, or a pointless value such as a bool that is never read
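
        For example, the “unique items” case side by side (Python, purely illustrative):

        ```python
        items = [3, 1, 3, 2, 1]

        # the anti-pattern: a list that should only contain unique items
        unique_slow = []
        for x in items:
            if x not in unique_slow:   # O(n) scan on every insert
                unique_slow.append(x)

        # the basic structure that already does the job
        unique_fast = set(items)       # uniqueness enforced at O(1) per insert
        ```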

        1. 5

          Bloom filters and Hyperloglog counters. I find probabilistic data structures fascinating.
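
          A toy Bloom filter is small enough to sketch (Python, my own naming; a real implementation would pick m and k from the expected element count and target error rate):

          ```python
          import hashlib

          class BloomFilter:
              """k hash positions per item in an m-bit array.
              May report false positives, never false negatives."""
              def __init__(self, m=1024, k=3):
                  self.m, self.k = m, k
                  self.bits = bytearray(m // 8)

              def _positions(self, item):
                  # derive k positions from salted SHA-256 digests
                  for i in range(self.k):
                      h = hashlib.sha256(f"{i}:{item}".encode()).digest()
                      yield int.from_bytes(h[:8], "big") % self.m

              def add(self, item):
                  for p in self._positions(item):
                      self.bits[p // 8] |= 1 << (p % 8)

              def might_contain(self, item):
                  return all(self.bits[p // 8] & (1 << (p % 8))
                             for p in self._positions(item))
          ```

          The charm is the trade: you give up exact answers and get a set membership test in a fixed, tiny amount of memory.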

          1. 4

            Mine is the quadtree. I remember “first” learning about them at a hackathon. Someone implemented a quadtree within WebGL for a simple cube game and I thought it was the wildest thing. Since then I’ve spent more time learning about them and even implemented a few in production at my current posting. Quadtrees, like most data structures, are situational though so I don’t always get the joy of playing with them. I also like that the same principles can be taken and applied to an octree for three-dimensional space.
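
            The core of a point quadtree fits in a screenful. A toy Python sketch (my own naming, not from any library): a node stores points until it exceeds its capacity, then splits into four quadrants:

            ```python
            class Quadtree:
                def __init__(self, x, y, half, capacity=4):
                    self.x, self.y, self.half = x, y, half  # center, half-width
                    self.capacity = capacity
                    self.points = []
                    self.children = None

                def _contains(self, px, py):
                    return (self.x - self.half <= px <= self.x + self.half and
                            self.y - self.half <= py <= self.y + self.half)

                def insert(self, px, py):
                    if not self._contains(px, py):
                        return False
                    if self.children is None:
                        if len(self.points) < self.capacity:
                            self.points.append((px, py))
                            return True
                        self._subdivide()
                    return any(c.insert(px, py) for c in self.children)

                def _subdivide(self):
                    h = self.half / 2
                    self.children = [Quadtree(self.x + dx, self.y + dy, h,
                                              self.capacity)
                                     for dx in (-h, h) for dy in (-h, h)]
                    for px, py in self.points:   # push points down
                        any(c.insert(px, py) for c in self.children)
                    self.points = []

                def query(self, qx, qy, r):
                    """Points within the axis-aligned square of half-width r."""
                    out = []
                    if (abs(qx - self.x) > self.half + r or
                            abs(qy - self.y) > self.half + r):
                        return out   # prune: query box misses this node
                    for px, py in self.points:
                        if abs(px - qx) <= r and abs(py - qy) <= r:
                            out.append((px, py))
                    if self.children:
                        for c in self.children:
                            out.extend(c.query(qx, qy, r))
                    return out
            ```

            The octree version is the same code with eight children and a z coordinate.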

            1. 6

              Heh! Mine is the R* tree, which, like the quadtree, is a spatial index, but it uses adaptive regions that may overlap, in contrast to the quadtree’s exhaustive partition. This gives it some stronger guarantees about worst-case performance, at the cost of a bunch of complexity such as rebalancing the tree dynamically.

              Today there’s a nice Wikipedia page about it, but there wasn’t when I first learned (I’m not sure how long ago it was). It was one of my first experiences with implementing an algorithm directly from the papers about it, so I remember it fondly. :)

              1. 3

                R* tree is mine as well for very similar reasons :)

            2. 3

              Slide buffer.

              Let’s face it, we need contiguous chunks of space; otherwise we have to do hideously complex things managing the wrap-arounds or splits…

              So have a ringbuffer, except instead of wrapping when full, you slide the data along.

              Now if you are consuming more or less as fast as you are producing, the slide is often a no-op: just update the head and tail pointers to the front of the buffer.

              1. 1

                This sounds interesting, but I’m having trouble finding a more detailed description/example. Do you have a URL you could share?

                1. 2

                  It’s basically a conceptually very simple non-blocking producer/consumer thread-communication mechanism…

                  typedef struct {
                     uint16_t sizeInBytes;
                     uint16_t back;   // consumer's read index
                     uint16_t front;  // producer's write index
                  
                     bool_t  lock;    // held by the consumer while reading, so the producer can't slide
                     bool_t  eof;     // allow producer to signal to consumer that no more data is coming... ever.
                     uint8_t * data;
                  } slideBuffer_t;
                  

                  I’m moving to gcc 4.9 from 4.2 and lo and behold, I have __atomic builtins, so I’m trying to port the whole thing to use atomics rather than heavier things.

                  It’s “non-blocking” in the sense the producer “owns” the front of the buffer, and can do what he likes with it, and the consumer owns the back.

                  The consumer can consume whatever is between the back and front. If there isn’t anything there, he just sees “no data pending”.

                  If the producer runs out of space at the front, he has to initiate a slide to reclaim the space already consumed at the back.

                  While the consumer is busy consuming the data, he “locks” the data structure until it is consumed, so the producer can’t slide it.

                  A higher layer is responsible for informing the producer and the consumer that the slide buffer’s state may have changed and that it is worth checking whether it can read or write now.
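
                  If it helps, here are the slide mechanics reduced to a single-threaded Python sketch (no lock or eof, just produce/consume/slide):

                  ```python
                  class SlideBuffer:
                      """The live region [back, front) slides to
                      offset 0 instead of wrapping around."""
                      def __init__(self, size):
                          self.data = bytearray(size)
                          self.back = 0    # consumer's edge
                          self.front = 0   # producer's edge

                      def produce(self, chunk):
                          if self.front + len(chunk) > len(self.data):
                              self._slide()
                          if self.front + len(chunk) > len(self.data):
                              return False  # genuinely full
                          end = self.front + len(chunk)
                          self.data[self.front:end] = chunk
                          self.front = end
                          return True

                      def consume(self, n):
                          n = min(n, self.front - self.back)
                          out = bytes(self.data[self.back:self.back + n])
                          self.back += n
                          return out

                      def _slide(self):
                          # cheap case: all consumed, just reset pointers
                          pending = self.front - self.back
                          self.data[0:pending] = self.data[self.back:self.front]
                          self.back, self.front = 0, pending
                  ```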

              2. 1

                I find the Trie to be an awesome data structure, although I kind of like HyperLogLog too.