This isn’t as impressive as the author makes it sound.
First of all, there isn’t one single std::sort. The standard specifies it must be O(n log n), but doesn’t mandate a specific algorithm. Each compiler or stl vendor writes their own std::sort, so to say, “This is faster than std::sort” doesn’t mean much without saying which implementation.
Second, this “new” algorithm appears to be just a wrapper around radix sort, counting sort, and std::sort, with some logic to choose between them. It might be convenient, but it’s already well known that specialized (even O(n)) sorting routines can be written for specific data types.
Finally, the interface isn’t nearly as flexible as std::sort. std::sort works for any type with operator<, and allows passing in custom comparison functions. This function doesn’t allow custom comparison functions, and appears to only sort values that it knows how to convert to integers.
All three points of those points are addressed in the article or the author’s comments on the same page.
I thought the “Why This Is Faster” section’s descent into instruction level parallelism was fun. I didn’t know about perf’s feature to measure instructions per cycle.
Of course, yeah, the performance of this algorithm vs. American Flag Sort will vary based on CPU…
I think the innovation here is that radix sort can be optimized (using instruction level parallelism, no less) to outperform standard sort starting from 128 items. “Everybody” knew radix sort can be faster for integers, but even for integers switchover point was usually above thousands.
What I like most about this article is the practical approach. The OP uses real code to illustrate their points, not symbols or pseudo-code. This made it very understandable to me.
This isn’t as impressive as the author makes it sound.
First of all, there isn’t one single std::sort. The standard specifies it must be O(n log n), but doesn’t mandate a specific algorithm. Each compiler or stl vendor writes their own std::sort, so to say, “This is faster than std::sort” doesn’t mean much without saying which implementation.
Second, this “new” algorithm appears to be just a wrapper around radix sort, counting sort, and std::sort, with some logic to choose between them. It might be convenient, but it’s already well known that specialized (even O(n)) sorting routines can be written for specific data types.
Finally, the interface isn’t nearly as flexible as std::sort. std::sort works for any type with operator<, and allows passing in custom comparison functions. This function doesn’t allow custom comparison functions, and appears to only sort values that it knows how to convert to integers.
All three points of those points are addressed in the article or the author’s comments on the same page.
I thought the “Why This Is Faster” section’s descent into instruction level parallelism was fun. I didn’t know about perf’s feature to measure instructions per cycle.
Of course, yeah, the performance of this algorithm vs. American Flag Sort will vary based on CPU…
I think the innovation here is that radix sort can be optimized (using instruction level parallelism, no less) to outperform standard sort starting from 128 items. “Everybody” knew radix sort can be faster for integers, but even for integers switchover point was usually above thousands.
What I like most about this article is the practical approach. The OP uses real code to illustrate their points, not symbols or pseudo-code. This made it very understandable to me.