A bit late to the party, but I feel like some of the conclusions here are part of the reason Joyent Manta came about. The example where standard Unix tools on a single machine, without any multithreading, were significantly faster than running the task across a large cluster, even with a moderately large dataset, highlighted to me why single-threaded performance is still important; throwing money at making clusters larger is just helping boil the oceans with very little real gain.
I haven’t rewatched the video, but I think this is the one I’m referring to: https://www.youtube.com/watch?v=79fvDDPaIoY
Previous discussion: https://lobste.rs/s/z5fcjz/scalability_at_what_cost_2015
Ah, thanks, I missed it.
https://github.com/frankmcsherry/blog/blob/master/posts/2015-01-15.md
This paper never gets old but it’s been covered before.
Also this: https://blog.acolyer.org/2015/06/05/scalability-but-at-what-cost/