There was a period of about a decade when tries were one of my low-level data structure secret weapons. Constant lookup time for a given key length with none of the rebucketing issues of a hashtable, and for some reason few people seemed to have ever learned about them so they had the added bonus of being a cool new thing I could show people.
But then CPU caches got faster and faster, and low-level data structure performance started to be dominated by cache locality. “Constant number of instructions” and “constant execution time” became somewhat decoupled.
The last time I reached for tries was when I was on the infrastructure team at a large web company circa 2008. We were using memcached and I’d looked at profiling data and discovered that it was spending a nontrivial amount of time computing key hashes.
Aha, I thought, I know how to change this so no hashing is required, and it’ll be super quick to prototype! So I spent a couple hours replacing the hashtables with tries. The profiler confirmed that my implementation needed significantly fewer instructions to look up and insert values.
But it was 4-5x slower in wall time because it was jumping from cache miss to cache miss walking the trie instead of doing a hash computation completely in the L1 cache and only hitting main memory a few times to do the hashtable access. After that, I never touched tries again.
Exactly a decade ago, I was assigned the task of optimising a web crawler cluster. A Python dictionary holding necessary state was the bottleneck, using up all the machine’s memory. I found a Python PATRICIA trie implementation that implemented the dict interface. It was a simple matter of adding an import and calling the trie constructor instead of dict().
Memory usage was reduced 50-fold with that simple change. It freed up three quarters of the machines that very same day.
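I can’t say which PATRICIA library that was, but the reason the swap is that easy — a trie exposing the full dict interface — can be sketched with a toy trie built on collections.abc.MutableMapping. This is an illustrative, uncompressed trie, not the library in question:

```python
from collections.abc import MutableMapping

class Trie(MutableMapping):
    """Toy character trie exposing the dict interface.

    Not path-compressed like a real PATRICIA trie, but usable as a
    drop-in replacement for dict() with string keys.
    """

    def __init__(self, items=()):
        self.children = {}    # char -> Trie
        self.value = None
        self.present = False  # does a key terminate at this node?
        self.update(items)    # mixin method from MutableMapping

    def _node(self, key, create=False):
        """Walk (or build) the path for key and return the final node."""
        node = self
        for ch in key:
            if create:
                node = node.children.setdefault(ch, Trie())
            else:
                node = node.children[ch]  # raises KeyError on a missing branch
        return node

    def __setitem__(self, key, value):
        node = self._node(key, create=True)
        node.value, node.present = value, True

    def __getitem__(self, key):
        node = self._node(key)
        if not node.present:
            raise KeyError(key)
        return node.value

    def __delitem__(self, key):
        node = self._node(key)  # note: doesn't prune now-empty branches
        if not node.present:
            raise KeyError(key)
        node.value, node.present = None, False

    def __iter__(self, prefix=''):
        if self.present:
            yield prefix
        for ch, child in self.children.items():
            yield from child.__iter__(prefix + ch)

    def __len__(self):
        return sum(1 for _ in self)
```

With MutableMapping supplying `get`, `update`, `keys`, and the rest, replacing `d = dict()` with `d = Trie()` is indeed the whole change at the call site.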
For what it’s worth, tries are still great for compression. For example, Pyroscope uses them to great effect.
Nerd sniped myself into writing a Go version:
Thoughts:
You could make this Unicode aware by changing byte to rune.
You could keep it as a byte map but implement that as an array of 256 pointers to Tries. 2KB per node seems like a lot of overhead, but maybe it’s worth it?
A pretty common technique I have used is to have a node for every half byte and use slices, so at most 16 pointers per node. Then, you can use an integer to represent which slots are used (slot 0 used => bit 0 set, etc), and use bits.OnesCount16 to find out which of the branches have been populated. It’s essentially the technique used in ideal hash trees by Bagwell, just half the size, as it’s a little neater to work with for bytes.
If you insist that words are just 26 English alphabetical letters, then you could use an array of 27 pointers, which is probably pretty cheap.
I think folks typically build a byte trie over the UTF-8 encoding, to keep the number of children small. That’s how Unicode automata generally tend to work.
t.children['\x00'] = nil
That’s a funny hack :-) Why not just use a boolean?
I started with a boolean but then decided that a null character would be invalid anyway, so might as well use it.
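Assuming a plain dict-of-dicts node shape (illustrative, not the article’s exact code), the sentinel-versus-boolean trade-off looks roughly like this:

```python
# Sentinel trick: a '\x00' child marks end-of-word, so a node needs no
# extra flag field -- we assume no real key contains a null character.
END = '\x00'

def insert(node, word):
    """node is a dict-of-dicts trie; mark the word's end with a sentinel child."""
    for ch in word:
        node = node.setdefault(ch, {})
    node[END] = None  # presence of the key is the flag; the value is unused

def contains(node, word):
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node
```

The boolean alternative would give every node an explicit `is_word` field; the sentinel saves that field on every node at the cost of reserving one character value.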
My man!
What does this mean, @isagalaev?
I’m a little tired of the condescending attitude expressed online here and there about type-checking being an obviously superior way of writing code. Especially in toy exercises aimed at readability. I’m not taking seriously anyone who would claim that slapping field(default_factory=dict) on the children field here constitutes any improvement.
Ah, that’s what you meant! I actually read the opposite into it: something in the vein of “don’t start testing me on data structures and typed languages, as my knowledge is superior”. I was really confused about it, because it contrasted with using unannotated Python, and also with the general tone of the article as a whole.
My 2 cents: it’s worth rephrasing as readers might leave with a wrong impression.
Thanks for the article, I found it very interesting!
One line shorter, one indent less, we are being specific about which expression can raise KeyError, and we still do only a single dict lookup?
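The code block that originally accompanied this suggestion didn’t survive here, but the two lookup styles under discussion can be sketched hypothetically — a minimal Node is included just to make the sketch self-contained:

```python
class Node:
    """Minimal trie node for illustration."""
    def __init__(self):
        self.children = {}  # char -> Node

# Style 1: index directly and let the dict raise for a missing branch;
# the indexing is the only expression that can raise KeyError.
def find(node, key):
    try:
        for ch in key:
            node = node.children[ch]
    except KeyError:
        return None
    return node

# Style 2: probe with .get() and return None explicitly at each step.
def find_checked(node, key):
    for ch in key:
        child = node.children.get(ch)
        if child is None:
            return None
        node = child
    return node
```

Both styles do a single dict lookup per character; the difference is whether the missing-branch case flows through an exception or an explicit None check.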
All true!
This bit of descending down the tree used to be in a separate method, and I didn’t like checking for None there and here at the call site, so I changed it to catching KeyError. And it remained this way after that method got inlined.
For a Ruby example of a trie, Hanami’s router engine uses one.
There are bunch of trie routers for Go too. Here’s one in the Chi router package: https://github.com/go-chi/chi/blob/0316d5a1df8598eceb137f5f77945be56810b564/tree.go#L74