Looking at the benchmark code, the amount of work being done per iteration is small enough that clock resolution can become an issue. And while they aren't measuring the time to generate the maps, they're leaving a lot of garbage on the process heap between runs, which is also likely to skew the results. It makes more sense to generate the map and keys once so that none of these other effects are measured.
Rewriting this benchmark locally with timer:tc/3 shows something closer to 7.9% overhead (with a null-benchmark baseline showing that the looping itself takes roughly 30% of the total time). That's a wide enough difference from the posted article that I think it's worth writing up a benchee example to get more accurate numbers.
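A sketch of that benchee setup, assuming the map and key list are built once outside the measured functions (the map size and benchmark names here are illustrative, not the exact code I ran):

```elixir
# Build the map and keys once, up front, so map construction and
# the resulting garbage are not part of what benchee measures.
map = Map.new(1..1_000, fn i -> {i, i * i} end)
keys = Enum.to_list(1..1_000)

Benchee.run(%{
  "Map.get" => fn -> Enum.each(keys, fn k -> Map.get(map, k) end) end,
  "Access"  => fn -> Enum.each(keys, fn k -> map[k] end) end
})
```

Each scenario closes over the same prebuilt `map` and `keys`, so the only per-iteration work left is the lookup itself plus the shared looping overhead.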
```
Name           ips        average  deviation         median         99th %
Map.get      11.19       89.37 ms      8.63%       94.00 ms      110.00 ms
Access        9.88      101.24 ms      8.40%       94.00 ms      125.00 ms

Comparison:
Map.get      11.19
Access        9.88 - 1.13x slower +11.87 ms
```
So the overhead seems much more substantial than this post claims, but, as with all benchmarks, it's hard to represent real application code with trivial loops like this. Still, Access is far from free when you're sure your code path is monomorphic.
It would also be worth testing the code with larger maps, since maps with fewer than 33 keys are stored as a sorted list rather than a real "hash map".
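One way to probe that boundary is to run the same comparison at several map sizes straddling the 32-key threshold; a minimal sketch, with the sizes chosen purely for illustration:

```elixir
# Benchmark lookups across the small-map / hash-map boundary.
# Maps with up to 32 keys use the compact sorted representation;
# larger maps switch to a hash-based structure.
for size <- [16, 32, 64, 1_024] do
  map = Map.new(1..size, fn i -> {i, i} end)
  keys = Enum.to_list(1..size)

  Benchee.run(%{
    "Map.get (size #{size})" => fn -> Enum.each(keys, fn k -> Map.get(map, k) end) end,
    "Access (size #{size})"  => fn -> Enum.each(keys, fn k -> map[k] end) end
  })
end
```

If the representation change matters, the relative cost of the two lookup styles could shift noticeably between the 32- and 64-key runs.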
Thanks @strmpnk! I'm going to revisit this in a later blog post. I didn't think about the garbage I was putting on the heap when generating all the different maps. I'm going to take your benchmarking code as a starting point and also compare the performance of Map.get/2 to pattern matching on the map directly.
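For readers unfamiliar with the third option mentioned here, pattern matching on a map looks like this; a minimal sketch with illustrative variable names, not code from the post:

```elixir
# Matching with a pinned key fetches the value during the match
# itself, with no Map.get/2 or Access function call per lookup.
key = :name
map = %{name: "Ada", role: "engineer"}

value =
  case map do
    %{^key => v} -> v
    _ -> nil
  end
```

The `^key` pin matches against the existing value of `key` rather than rebinding it, which is what makes dynamic-key lookups expressible as a pattern.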