I wrote this on Saturday morning, primarily to highlight some common threading pitfalls and cool stuff that can be done with ‘perf’.
I’ve written a bunch of stuff over the last few years at http://brooker.co.za/blog/, and have received great feedback and criticism from others. For this one, though, my email has been a fairly constant stream of vitriol since yesterday. I even got accused of being a “shill for Node.js”. It’s all easy enough to ignore, but I found it interesting how touchy this subject is.
I think extrapolating from “my program got slower with threads (and there was an easy fix)” to “most programs get slower with threads” is quite the leap.
Multi-threaded programs can, and very often do, run much more slowly than the equivalent single-threaded program.
The point that I was trying to make is that Amdahl’s law gives us the wrong intuition about the performance of multi-threaded programs. The worst case of Amdahl’s law is a wash: the multi-threaded code runs in the same time as the equivalent single-threaded code. Unfortunately, that doesn’t match reality. In the real world, poorly written (or poorly optimized) multi-threaded code runs slower than the equivalent single-threaded program.
That doesn’t mean that threads are bad, just that they aren’t a magic ointment that makes all programs faster. If there is contention, especially if it’s contention that requires a context switch, they can make code go slower. Sometimes shockingly so.
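To make that concrete, here is a minimal sketch of the pathological shape (an illustration, not the code from the post): several threads doing all of their “parallel” work under one shared lock, so the work is fully serialized and the threads spend their time queueing, parking, and being woken up again.

    // Illustrative sketch only: all the "parallel" work happens inside one
    // lock, so the threads serialize on it and this can easily run slower
    // than a plain single-threaded loop doing the same increments.
    public class ContendedCounter {
        private static long counter = 0;
        private static final Object lock = new Object();

        public static void main(String[] args) throws InterruptedException {
            final int nThreads = 4;
            final long iters = 10_000_000L;
            Thread[] threads = new Thread[nThreads];
            for (int t = 0; t < nThreads; t++) {
                threads[t] = new Thread(() -> {
                    for (long i = 0; i < iters; i++) {
                        synchronized (lock) { // every thread fights for this
                            counter++;
                        }
                    }
                });
                threads[t].start();
            }
            for (Thread th : threads) {
                th.join();
            }
            System.out.println("counter = " + counter);
        }
    }

The single-threaded baseline is the same loop without the threads and without the lock; comparing the two wall-clock times is exactly the kind of surprise I was writing about.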
The second thing I was trying to talk about is how modern Linux has some very cool tools for tracking down these kinds of performance problems. perf (or perf-events) is extremely powerful, and combines a lot of what you get from profilers and strace into one package with much less runtime overhead. In addition, its ability to do system-wide profiling is very handy for cross-process interactions. Many other operating systems have equivalent, and some better, tools.
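For anyone who hasn’t used it, a few common starting points look like this (./myprog is just a placeholder for whatever you’re measuring):

    # Counters for one run, including scheduler activity
    perf stat -e task-clock,context-switches,cpu-migrations ./myprog

    # Sampled CPU profile with call graphs, then an interactive report
    perf record -g ./myprog
    perf report

    # Live, system-wide profile across all processes
    perf top

On something like the contended counter above, the context-switch count from perf stat is usually the first hint that the threads are spending their time waiting rather than working.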
In the real world, poorly written (or poorly optimized) multi-threaded code runs slower than the equivalent single-threaded program.
I’ve done a lot of concurrent programming over the last four years, and this has almost never been my experience working with Erlang, Clojure, and java.util.concurrent, but YMMV I guess. I tend to see sublinear scaling owing to unexpected locks in the stdlib (hi Integer.parseInt), or known synchronization points like interning and queue contention, but I don’t think I’ve ever hit a real-world computational problem where I haven’t been able to get a ~4-10x speedup out of a 16-core box by slapping a threadpool on it. Usually takes a profiler to get to that near-linear domain though.
It was harder to get useful behavior out of multithreading in the bad old C/C++ days where there was heavy reliance on out-of-process locks. People know how to do things better than lock-spaghetti now.
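For contrast with the contended-counter sketch further up the thread, the shape of problem I mean looks roughly like this (an illustrative sketch, not anyone’s production code): give each worker its own slice of the data and share nothing mutable, so the pool has nothing to fight over.

    // Illustrative sketch only: independent chunks submitted to a fixed pool,
    // with no shared mutable state. This is the shape that tends to scale
    // close to linearly, until memory bandwidth or some stray synchronization
    // point gets in the way.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ChunkedSum {
        public static void main(String[] args) throws Exception {
            long[] data = new long[16_000_000];
            for (int i = 0; i < data.length; i++) {
                data[i] = i % 7;
            }

            int nThreads = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(nThreads);
            int chunk = data.length / nThreads;

            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < nThreads; t++) {
                final int lo = t * chunk;
                final int hi = (t == nThreads - 1) ? data.length : lo + chunk;
                // Each task reads only its own slice: no locks, no contention.
                parts.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = lo; i < hi; i++) {
                        sum += data[i];
                    }
                    return sum;
                }));
            }

            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            pool.shutdown();
            System.out.println("total = " + total);
        }
    }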
I think the point is more: “it is easy to get multi-threading wrong and hurt performance in ways you may not expect if you’re unfamiliar with how multi-threading works.”
It’s remarkable how terrible people can be at times.
Thank you for writing this. I didn’t know about perf!