1. 7
    1. 6

      I’ve written more than my fair share of IP stacks, usually for high-speed packet capture.

      On one of these, every now and again, the stats would show that we were capturing at a line rate of terabits per second (or more), on devices with 100Mb Ethernet.

      I tore my hair out trying to reproduce the problem. It turned out that every now and again a corrupted IP packet would come through with an impossibly small header length field. We’d drop the packet as corrupt and bump the “invalid packets” counter, but we also updated the “mbps” counter to show that we had still processed that amount of data (i.e. we handled it just fine and weren’t overloaded; it was the packet that was wrong). The catch was that we added the packet length computed from the corrupt header to the stats…
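One way to avoid this class of bug, sketched below, is to feed the throughput counter only from the number of bytes actually captured, and never from a length derived from header fields that have not been validated yet. This is a minimal illustration, not the code from the stack described above: `CaptureStats`, `account_ipv4`, and the specific validity checks are all hypothetical names and choices.

```python
import struct
from dataclasses import dataclass

MIN_IHL_WORDS = 5  # smallest legal IPv4 header: 5 x 32-bit words = 20 bytes


@dataclass
class CaptureStats:
    """Hypothetical counters, standing in for the real stats block."""
    bytes_seen: int = 0
    invalid_packets: int = 0


def account_ipv4(stats: CaptureStats, packet: bytes) -> bool:
    """Update counters for one captured IPv4 packet; return True if it looks valid."""
    # Count what actually arrived on the wire. This value comes from the
    # capture path, so a corrupt header cannot inflate it.
    stats.bytes_seen += len(packet)

    if len(packet) < MIN_IHL_WORDS * 4:
        stats.invalid_packets += 1
        return False

    ihl_words = packet[0] & 0x0F                       # header length in 32-bit words
    total_length = struct.unpack_from("!H", packet, 2)[0]
    header_len = ihl_words * 4

    # Reject impossible header lengths and inconsistent total lengths.
    if (ihl_words < MIN_IHL_WORDS
            or total_length < header_len
            or total_length > len(packet)):
        stats.invalid_packets += 1
        return False
    return True
```

With this shape, a corrupt packet still counts towards throughput (it was handled), but only by its real captured size.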

      1. 4

        My favourite network bug (which, fortunately, I didn’t have to debug) was on our CHERI MIPS prototype. The CPU allowed loads to execute in speculation because loads never have side effects. It turns out that they do if they’re loading from a memory-mapped device FIFO. Sometimes, in speculation, the CPU would load 32 bits of data from the FIFO, consuming a word that the real read then never saw. The network stack would then detect the checksum mismatch and drop the packet. The sender would then adapt to the higher packet loss and slow down. The user just saw very slow network traffic. Once this was understood, the CPU bug was fixed (only loads of cached memory are allowed in speculation). This was made extra fun by branch predictor aliasing, which meant that the load would often happen in speculation on a totally unrelated bit of code (and not the same bit).

        Apparently the Xbox 360 had a special uncached load instruction with a similar bug and the only way to avoid problems was to not have that instruction anywhere in executable memory.
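The speculative FIFO read described above can be modelled in a few lines. This is a toy simulation under stated assumptions, not the CHERI MIPS hardware or its network stack: `DeviceFifo`, `receive_ok`, and the additive checksum are all hypothetical and chosen only to show why a discarded load from a FIFO still corrupts the packet.

```python
from collections import deque


class DeviceFifo:
    """Toy memory-mapped FIFO: every load pops a word, so reads are destructive."""

    def __init__(self, words):
        self._words = deque(words)

    def load(self):
        return self._words.popleft()


def checksum(words):
    # Simple additive checksum, for illustration only.
    return sum(words) & 0xFFFFFFFF


def receive_ok(fifo, n_words, speculative_load_at=None):
    """Read n_words of payload plus a trailing checksum word; return True if it matches.

    If speculative_load_at is set, simulate one squashed speculative load just
    before that payload word: the CPU throws the result away, but the FIFO has
    already handed the word over.
    """
    payload = []
    for i in range(n_words):
        if i == speculative_load_at:
            fifo.load()  # result discarded, word lost
        payload.append(fifo.load())
    return checksum(payload) == fifo.load()
```

Running the same packet through `receive_ok` with and without the simulated speculative load shows the checksum passing in one case and failing in the other, which is what the network stack then reports as packet loss.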

        1. 3