1. 27
  2. 6

    I had a similar experience with strace… CPU load on some of our machines seemed very, very high for the type of traffic they were handling, so we ran strace on the process. We found out that something was issuing a ton of write() calls of one byte each.

    Reexamining our code, we learned that we were streaming a response to the client with requests' Response.iter_content, whose chunk_size defaults to 1, so it yields one byte at a time. Bumping this to stream a page of memory at a time cut the box's CPU load in half.
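
    For anyone who wants to see the difference, here is a minimal sketch of the idea, not our actual code. `relay`, `FakeResponse`, and the 4096-byte `PAGE_SIZE` are names and values I made up for illustration; the only real API involved is `iter_content(chunk_size=...)` on a requests response. Each chunk turns into its own write() syscall only if `out` is unbuffered, so treat the write count as an upper bound.

```python
import io

PAGE_SIZE = 4096  # assumed page size; mmap.PAGESIZE reports the real one


def relay(response, out, chunk_size=PAGE_SIZE):
    """Copy a streamed response body to `out`; return how many writes it took."""
    writes = 0
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:  # skip empty chunks
            out.write(chunk)
            writes += 1
    return writes


class FakeResponse:
    """Hypothetical stand-in for requests.Response so the sketch runs offline."""

    def __init__(self, body):
        self.body = body

    def iter_content(self, chunk_size=1):
        # Mirrors the real default of chunk_size=1.
        for start in range(0, len(self.body), chunk_size):
            yield self.body[start:start + chunk_size]


if __name__ == "__main__":
    body = b"x" * 10000
    print(relay(FakeResponse(body), io.BytesIO(), chunk_size=1))  # one write per byte
    print(relay(FakeResponse(body), io.BytesIO()))                # one write per page
```

    With a real response you would pass the result of `requests.get(url, stream=True)` in place of `FakeResponse`.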

    Pretty cool, but glad I don’t have to use it too often.

    1. 2

      In the past 12 years of my professional Linuxing, strace and tcpdump have been pretty much the best tools to look at a misbehaving program. They almost always result in a clearer look at a symptom, and very often help me come up with an action for either better data or an immediate fix. To say they make me smarter is an understatement. They allow me to operate computers at all.

      My favorite previous uses of strace include:

      • Finding out which file a program can’t write to if it has no useful error messages. (Yes, this is a thing, sadly.)
      • Discovering why web requests that clock in at 10ms take seconds to yield a result to the browser. (Turns out if a single-threaded server accept()s all available incoming connections, any fast requests that get queued behind one slow request have to wait for it to finish. This one also involved a fair amount of tcpdumping (-:)
      • Fixing a bug in a build system where creating an output file on an NFS directory caused that file to have a creation date in the future as observed from the machine creating the file.
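
      For the first case, the trick is just to look for file-opening calls that return -1. A rough sketch of how one might filter a saved trace (for example one written with `strace -f -o trace.txt <command>`) is below. The function name, the regexes, and the sample lines are my own illustration: the sample is hand-written in the format strace prints, not captured output, and the sketch assumes the failure shows up on open/openat/creat.

```python
import re

# Matches a syscall line that returned -1, e.g.
#   openat(AT_FDCWD, "/some/path", O_WRONLY) = -1 EACCES (Permission denied)
FAILED_CALL = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'      # optional pid prefix when tracing with -f
    r'(?P<call>\w+)\((?P<args>.*)\)'       # syscall name and raw argument text
    r'\s+=\s+-1\s+(?P<errno>E[A-Z0-9]+)'   # failed return value and errno name
)
QUOTED = re.compile(r'"([^"]*)"')


def failed_file_calls(lines, calls=("open", "openat", "creat")):
    """Return (syscall, path, errno) for each failed file-opening call."""
    failures = []
    for line in lines:
        match = FAILED_CALL.match(line.strip())
        if not match or match.group("call") not in calls:
            continue
        path = QUOTED.search(match.group("args"))
        failures.append(
            (match.group("call"), path.group(1) if path else None, match.group("errno"))
        )
    return failures


# Hand-written sample lines in strace's output format (illustrative only).
SAMPLE = [
    'openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
    'openat(AT_FDCWD, "/var/log/app/out.log", O_WRONLY|O_CREAT|O_APPEND, 0666)'
    ' = -1 EACCES (Permission denied)',
    'write(2, "failed\\n", 7) = 7',
]

if __name__ == "__main__":
    for call, path, errno_name in failed_file_calls(SAMPLE):
        print(call, path, errno_name)
```

      In practice a plain grep for "= -1" over the trace gets you most of the way; the parsing only helps when the trace is long and noisy.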