I had a similar experience with strace… CPU load on some of our machines seemed very, very high for the type of traffic they were handling, so we ran strace on the process. We found out that something was issuing a ton of write() calls of one byte each.
Reexamining our code, we learned that we were streaming a response to the client with iter_content from requests, whose chunk_size defaults to 1, so it yields one byte at a time. Bumping this to stream a page of memory at a time cut the box’s CPU load in half.
Pretty cool, but glad I don’t have to use it too often.
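For anyone hitting the same thing, here is a rough sketch of the kind of change described above. It is not our actual code: relay_body and sink are made-up names, and it assumes the response is a requests.Response fetched with stream=True and that sink is something unbuffered (like a socket file) where each write() ends up as roughly one syscall.

```python
import mmap

# One page of memory; typically 4096 bytes on x86-64 Linux.
PAGE_SIZE = mmap.PAGESIZE


def relay_body(response, sink, chunk_size=PAGE_SIZE):
    """Copy a streamed response body to `sink`, returning the number of writes.

    `response` is assumed to expose iter_content(chunk_size=...), as
    requests.Response does. `relay_body` and `sink` are hypothetical names
    used only for this illustration.
    """
    writes = 0
    # With chunk_size=1 (the requests default) this loop runs once per byte.
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:  # skip empty chunks
            sink.write(chunk)
            writes += 1
    return writes
```

With chunk_size=1 a 1 MB body means about a million write() calls; at one page per chunk it is a few hundred. Re-running strace with -c on the process is a quick way to check that the write() count actually dropped.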
In the past 12 years of my professional Linuxing, strace and tcpdump have been pretty much the best tools to look at a misbehaving program. They almost always result in a clearer look at a symptom, and very often help me come up with an action for either better data or an immediate fix. To say they make me smarter is an understatement. They allow me to operate computers at all.
My favorite previous uses of strace include: