I wrote this on Saturday morning, primarily to highlight some common threading pitfalls and cool stuff that can be done with ‘perf’.
I’ve written a bunch of stuff over the last few years at http://brooker.co.za/blog/, and have received great feedback and criticism from others. For this one, though, my email has been a fairly constant stream of vitriol since yesterday. I even got accused of being a “shill for Node.js”. It’s all easy enough to ignore, but I found it interesting how touchy this subject is.
It’s remarkable how terrible people can be at times.
Thank you for writing this. I didn’t know about perf!
I think extrapolating from “my program got slower with threads (and there was an easy fix)” to “most programs get slower with threads” is quite the leap.
I think the point is more: “it is easy to get multi-threading wrong and hurt performance in ways you may not expect if you’re unfamiliar with how multi-threading works.”
Multi-threaded programs can, and very often do, run much more slowly than the equivalent single-threaded program.
The point that I was trying to make is that Amdahl’s law gives us the wrong intuition about the performance of multi-threaded programs. The worst case of Amdahl’s law is a wash: the multi-threaded code runs in the same time as the equivalent single-threaded code. Unfortunately, that doesn’t match reality. In the real world, poorly written (or poorly optimized) multi-threaded code runs slower than the equivalent single-threaded program.
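To make the "worst case is a wash" point concrete, here is a minimal sketch of the textbook form of Amdahl's law, speedup = 1 / ((1 - p) + p / n). The function and parameter names (`amdahl_speedup`, `parallel_fraction`, `workers`) are illustrative choices, not anything from the original post. Note that the formula has no term for contention or context-switch cost, which is why it can never predict a result below 1.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Predicted speedup under Amdahl's law.

    parallel_fraction: share of the runtime that can be parallelized (0.0 to 1.0).
    workers: number of threads or cores applied to the parallel part (>= 1).
    Both names are hypothetical, chosen for this illustration.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Even with nothing parallelizable, the model predicts a speedup of
    # exactly 1: same time as single-threaded, never slower.
    for p in (0.0, 0.5, 0.9, 1.0):
        print(f"p={p:.1f} on 16 workers: {amdahl_speedup(p, 16):.2f}x")
```

Because the model bottoms out at 1x, any measured slowdown has to come from costs it leaves out, such as lock contention and context switches.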
That doesn’t mean that threads are bad, just that they aren’t a magic ointment that makes all programs faster. If there is contention, especially if it’s contention that requires a context switch, they can make code go slower. Sometimes shockingly so.
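As a rough illustration of contention, the sketch below increments a counter once in a plain loop and once from several threads that all fight over a single lock. This is not the program from the original post; the names (`count_single`, `count_contended`, `timed`) are made up for the example. One loud caveat: in CPython the GIL already prevents CPU-bound threads from running in parallel, so the threaded version could not win here in any case. What the demo shows is only the added cost of lock traffic, not the context-switch behaviour discussed above.

```python
import threading
import time


def count_single(n):
    """Increment a counter n times on one thread, with no locking."""
    total = 0
    for _ in range(n):
        total += 1
    return total


def count_contended(n, workers):
    """Increment a shared counter from several threads under one lock.

    Each thread does n // workers increments; every increment takes the
    same lock, so the threads spend their time contending for it.
    """
    total = 0
    lock = threading.Lock()

    def work(iterations):
        nonlocal total
        for _ in range(iterations):
            with lock:
                total += 1

    per_thread = n // workers
    threads = [threading.Thread(target=work, args=(per_thread,))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 200_000
    _, single = timed(count_single, n)
    _, contended = timed(count_contended, n, 4)
    print(f"single-threaded: {single:.4f}s")
    print(f"4 threads, one lock: {contended:.4f}s")
```

Timings will vary by machine and interpreter version, so treat the numbers as a prompt for profiling rather than a benchmark.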
The second thing I was trying to talk about is how modern Linux has some very cool tools for tracking down these kinds of performance problems. perf (or perf-events) is extremely powerful, and combines a lot of what you get from profilers and strace into one package with much less runtime overhead. In addition, its ability to do system-wide profiling is very handy for cross-process interactions. Many other operating systems have equivalent tools, and some have better ones.
“In the real world, poorly written (or poorly optimized) multi-threaded code runs slower than the equivalent single-threaded program.”
I’ve done a lot of concurrent programming over the last four years, and this has almost never been my experience working with Erlang, Clojure, and java.util.concurrent, but YMMV I guess. I tend to see sublinear scaling owing to unexpected locks in the stdlib (hi Integer.parseInt), or known synchronization points like interning and queue contention, but I don’t think I’ve ever hit a real-world computational problem where I haven’t been able to get a ~4-10x speedup out of a 16-core box by slapping a threadpool on it. Usually takes a profiler to get to that near-linear domain though.
It was harder to get useful behavior out of multithreading in the bad old C/C++ days where there was heavy reliance on out-of-process locks. People know how to do things better than lock-spaghetti now.