1. 4
  1. 4

    Comparing the TCP/IP stack to something that throws out congestion control, reordering, and flow control is somewhat strange. Maybe I missed something, but you may as well just be using UDP (or UDP-Lite).

    More interesting would be comparing this to:

    I am more interested in whether the “juice is worth the squeeze” compared to other, less intrusive and less specialist methods that are actually accessible to others.

    I have not seen many articles, if any, that cover this. Do most people, on finding their solution is slow, just blame the kernel rather than looking at and experimenting with the other readily available, baked-in options?

    1. 3

      Next, we repeat our performance runs using netperf to measure the latency. We measure the 50th, 90th and 99th percentile of the latency varying the send and receive buffer (message) sizes. In our results, we use the median latency to clearly show the trendlines.

      This seems highly suspicious to me. Tail latencies tend to dominate total system latency. Why isn't the 99.99th percentile shown here instead? Or even the maximum?

      By showing the 50th percentile, they are effectively leaving out all the bad results. Half of the results were worse than the ones shown – what are they hiding?

      There may be good reasons they’re showing just P50, but I’d want to see the full distribution to be able to judge for myself whether that is a warranted choice.

      1. 1

        ^ this. Distribution statistics should only be used when it would be prohibitively expensive to store all of the data.

        Show all of the data! Don’t just show summary statistics! You can overlay summary statistics, but don’t throw away data!
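
A minimal sketch of the point made in this thread, assuming synthetic data rather than anything from the article: the latency samples below are invented (98% fast, 2% slow), and `synthetic_latencies`, `percentile`, and `prob_hit_tail` are hypothetical helper names. It keeps every raw sample, derives the percentiles from them afterwards, and shows how a median can look healthy while P99 and the maximum do not. The fan-out estimate assumes backend latencies are independent, which real systems often violate.

```python
import math
import random


def percentile(sorted_samples, p):
    """Nearest-rank percentile (p in 0..100) of an ascending-sorted list."""
    if not sorted_samples:
        raise ValueError("no samples")
    # Nearest-rank: smallest value with at least p% of samples at or below it.
    rank = max(1, math.ceil(p / 100 * len(sorted_samples)))
    return sorted_samples[rank - 1]


def synthetic_latencies(n, seed=0):
    """Invented latencies in microseconds: ~98% fast path, ~2% slow path."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        if rng.random() < 0.02:
            samples.append(rng.uniform(500.0, 2000.0))  # slow outliers
        else:
            samples.append(rng.uniform(40.0, 60.0))  # typical case
    return samples


def prob_hit_tail(fanout, p=99.0):
    """Chance that at least one of `fanout` independent calls exceeds the
    p-th percentile latency (independence is an assumption)."""
    return 1.0 - (p / 100.0) ** fanout


# Keep the raw samples; summary statistics are computed on top of them.
samples = sorted(synthetic_latencies(100_000))
summary = {p: percentile(samples, p) for p in (50, 90, 99, 99.99)}
summary["max"] = samples[-1]

for name, value in summary.items():
    print(f"{name}: {value:.1f} us")
print(f"fan-out 100, P(at least one call slower than P99): {prob_hit_tail(100):.3f}")
```

Because the raw `samples` list is retained, the full distribution can still be plotted or re-summarized later; the percentiles are only an overlay on top of it.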