1. 24
  1.  

  2. 3

    This is impressive!

    When I saw the title I was half expecting the author to be kellabyte. :)

    One thing I wonder about, because I didn’t see it mentioned, is using PGO (profile-guided optimization) on the user-space code. Marc mentions the -O3 and -march compiler flags, but I didn’t notice PGO being mentioned.
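
    For the record, here’s roughly what PGO looks like with GCC — a sketch only, since the file names, the -march value, and the load-generation step are assumptions and not from the article (Clang uses different flags):

      # 1. Build an instrumented binary with the same optimization flags plus profiling
      gcc -O3 -march=native -fprofile-generate -o server server.c

      # 2. Drive the instrumented binary with representative traffic (e.g. wrk),
      #    then stop it so the .gcda profile data gets written out.

      # 3. Rebuild using the collected profile
      gcc -O3 -march=native -fprofile-use -fprofile-correction -o server server.c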

    Another thing I’d wonder about is checking that memory allocations always land on the same NUMA node as the CPU core running a given thread. Likely a wild goose chase, because I believe Linux tries to get this right for you automatically, but it could be worth looking at?
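
    If anyone wants to rule that out quickly, the usual numactl/numastat tooling makes it a short check — the process name in the pgrep call below is hypothetical, and on smaller single-socket instances there’s often only one node anyway:

      # How many NUMA nodes does this instance actually expose?
      numactl --hardware

      # Per-node memory breakdown for the running server process
      numastat -p $(pgrep -f server)

      # For comparison, pin both CPU and memory to node 0
      numactl --cpunodebind=0 --membind=0 ./server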

    It’s neat watching how, as the throughput goes up and up, the bar for what constitutes an optimisation worth trying creeps down from microseconds towards nanoseconds. For example, it would be rather silly to try something like disabling dhclient to get rid of its raw socket if I’m running some API service that only gets a few dozen req/s anyway.

    1. 2

      “The next optimization is both significant and controversial: disabling speculative execution mitigations in the Linux kernel. Now, before you run and get your torches and pitchforks, first take a deep breath and slowly count to ten. Performance is the name of the game in this experiment, and as it turns out these mitigations have a big performance impact when you are trying to make millions of syscalls per second.”

      Here is one highly optimized word: No.

      1. 2

        If it’s an EC2 instance running a single app server, the risk is minimal as he explains.

        1. 2

          Honest question: if you were running this on a dedicated server, wouldn’t turning off those speculative execution mitigations be a good thing? In the author’s case, since he’s on AWS, it may not be entirely OK, but on my own actual hardware? I thought it would be fine.
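
          For what it’s worth, on your own hardware it mostly comes down to kernel boot parameters. A sketch for a Debian/Ubuntu-style GRUB setup on a recent kernel — the article may use different, more granular parameters:

            # See which mitigations are currently active
            grep . /sys/devices/system/cpu/vulnerabilities/*

            # Add mitigations=off to GRUB_CMDLINE_LINUX in /etc/default/grub,
            # then regenerate the GRUB config and reboot
            sudo update-grub
            sudo reboot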

        2. 2

          “For the implementation, I used a simple API server built with libreactor, an event-driven application framework written in C”

          Sounded interesting up till this part. Kind of an unrealistic scenario for me. Hardly anything will be so in need of speed that people will forgo a) what they already use and b) what makes sense. A C web framework I’ve never heard of? Hard pass.

          1. 1

            Most of the optimizations he does are likely doable for other servers, and many of those servers had comparable performance in the initial pre-optimized comparison.