It’s my understanding that finger trees actually perform worse than their theoretical complexity on modern CPUs, because they don’t take advantage of cache locality.
IIUC, one reason the Hash Array-Mapped Trie (discovered by Bagwell, used in Clojure and Scala) is so fast is that each inner node is a contiguous array of pointers (either to child nodes or elements), so you can fit eight pointers a time in a cache line, and prefetchers will tend to get it right.
I remember hearing years ago from @headius that HotSpot had specific performance optimizations for singly-linked lists, but at the time I was unable to dig up any evidence, and he couldn’t remember where he’d read/heard it either.
It’s my understanding that finger trees actually perform worse than their theoretical complexity on modern CPUs, because they don’t take advantage of cache locality.
IIUC, one reason the Hash Array-Mapped Trie (discovered by Bagwell, used in Clojure and Scala) is so fast is that each inner node is a contiguous array of pointers (either to child nodes or elements), so you can fit eight pointers a time in a cache line, and prefetchers will tend to get it right.
I remember hearing years ago from @headius that HotSpot had specific performance optimizations for singly-linked lists, but at the time I was unable to dig up any evidence, and he couldn’t remember where he’d read/heard it either.