The title is correct at the source URL; shouldn’t we fix this one?
Wow, I never realized garbage collectors were so bad at cleaning up many small objects.
There are some amazingly succinct algorithms that generate an inordinate amount of garbage. The right garbage collector or memory subsystem could turn those allocations into bump allocation on call-site-specific memory pools.
I did a quick document similarity hack using Jaccard similarity over n-grams and it generated gigabytes of garbage. The working set was tiny but the trailing standing wave was massive. If I did it again I’d use HyperLogLog or Bloom filters instead.
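For anyone curious what that kind of hack looks like, here is a minimal sketch, not the original code. It assumes character trigrams and plain Python sets; the names `char_ngrams`, `jaccard`, and `similarity` are made up for illustration. Every n-gram slice is a fresh short-lived string, which is roughly where the garbage comes from.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams in text (assumed tokenization)."""
    # Each slice allocates a new short string; most die right after comparison.
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    intersection = len(a & b)
    # Union size by inclusion-exclusion, which avoids building a second temporary set.
    union = len(a) + len(b) - intersection
    return intersection / union


def similarity(doc_a: str, doc_b: str, n: int = 3) -> float:
    """Jaccard similarity of two documents over their character n-gram sets."""
    return jaccard(char_ngrams(doc_a, n), char_ngrams(doc_b, n))
```

Comparing every pair in a corpus this way rebuilds the n-gram sets over and over, so the live data stays small while the allocation rate is huge. A fixed-size sketch such as a Bloom filter or HyperLogLog per document would swap those sets for a constant-size structure, at the cost of an approximate answer.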