1. 45
    1. 15

      The paper is also available on arxiv: https://arxiv.org/abs/2310.09423

      1. 13

        Kernels and hypervisors have fast paths for offloading TCP, but not for UDP, since historically this hasn’t been important.

        1. 4

          It’s interesting to see in the wild how much of a difference it makes, too!

          1. 2

            Years back, hypervisors (looking specifically at you, VMWare) would silently discard UDP traffic anytime there was any load on the machine. The rest of the time, when it actually deigned to deliver the packets, it was apparently all done in software, and not that efficiently.

            This was for high-scale, UDP-based clustering software (Tangosol) used by most of the big websites and banks. At the time, we actually had to build an alternative clustering implementation over TCP/IP because of this.

          2. 11

            Abstract of the paper:

            QUIC is expected to be a game-changer in improving web application performance. In this paper, we conduct a systematic examination of QUIC’s performance over high-speed networks. We find that over fast Internet, the UDP+QUIC+HTTP/3 stack suffers a data rate reduction of up to 45.2% compared to the TCP+TLS+HTTP/2 counterpart. Moreover, the performance gap between QUIC and HTTP/2 grows as the underlying bandwidth increases. We observe this issue on lightweight data transfer clients and major web browsers (Chrome, Edge, Firefox, Opera), on different hosts (desktop, mobile), and over diverse networks (wired broadband, cellular). It affects not only file transfers, but also various applications such as video streaming (up to 9.8% video bitrate reduction) and web browsing. Through rigorous packet trace analysis and kernel- and user-space profiling, we identify the root cause to be high receiver-side processing overhead, in particular, excessive data packets and QUIC’s user-space ACKs. We make concrete recommendations for mitigating the observed performance issues.

            Relevant bits from the introduction:

            We identify two main root causes of QUIC’s poor receiver-side performance.

            1. When downloading the same file, the in-kernel UDP stack issues many more packet reads (netif_receive_skb) than TCP, leading to significantly higher CPU usage. This is because none of the QUIC implementations we examine uses UDP generic receive offload (GRO), where the link layer module combines multiple received UDP datagrams into a mega datagram before passing it to the transport layer. This is in sharp contrast to the wide deployment of TCP segmentation offload and the recent advocacy of UDP send-side offload (GSO).

            2. In the user space, QUIC incurs a higher overhead when processing received packets and generating responses. This overhead can be attributed to multiple factors: the excessive packets passed from the kernel (Issue 1), the user-space nature of QUIC ACKs, and the lack of certain optimizations such as delayed ACK in QUIC.

            We make several recommendations for mitigating the above impact, including deploying UDP GRO on the receiver side, making generic offloading solutions (GSO and GRO) more QUIC-friendly, improving relevant QUIC logic on the receiver side, and using multiple CPU cores to receive data for QUIC.
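
            For concreteness, here's a minimal sketch (mine, not the paper's) of what "deploying UDP GRO on the receiver side" looks like at the socket level on Linux. The SOL_UDP/UDP_GRO constants come from the kernel UAPI headers and aren't exposed by Python's socket module, and the port number is arbitrary:

            ```python
            import socket
            import struct

            # Linux UAPI constants (from <linux/udp.h> on recent kernels); the socket
            # module doesn't expose them, so the values are assumed here.
            SOL_UDP = 17
            UDP_GRO = 104

            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            sock.bind(("0.0.0.0", 4433))  # arbitrary illustrative port

            # Ask the kernel to coalesce consecutive datagrams of the same flow into
            # one "mega datagram" before handing them to user space.
            sock.setsockopt(SOL_UDP, UDP_GRO, 1)

            while True:
                # A single recvmsg() may now carry many coalesced QUIC packets; the
                # original segment size arrives as ancillary data (SOL_UDP / UDP_GRO).
                data, ancdata, flags, addr = sock.recvmsg(65535, socket.CMSG_SPACE(4))
                seg_size = len(data)  # no cmsg means the buffer is a single datagram
                for level, ctype, cdata in ancdata:
                    if level == SOL_UDP and ctype == UDP_GRO:
                        seg_size = struct.unpack("@i", cdata[:4])[0]
                # Split the coalesced buffer back into individual QUIC packets.
                packets = [data[i:i + seg_size] for i in range(0, len(data), seg_size)]
            ```

            The QUIC library still splits packets in user space, but it pays far fewer syscalls per megabyte received, which is the receiver-side saving the authors are pointing at.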

            From a distance, for a non-expert, it looks like the issues reported are more related to the software stack than to the protocol itself.

            1. 10

              From a distance, for a non-expert, it looks like the issues reported are more related to the software stack than to the protocol itself.

              A lot of critiques of QUIC look a bit like this. You’ll find other criticisms that say QUIC doesn’t work with load balancers, when QUIC was explicitly designed to work with load balancers but needs them to be modified. This one is a bit more helpful because it contains suggestions for fixes.

              Some SmartNICs can already offload most of QUIC. The userspace implementations are nice for prototyping and rapid deployment and, for most cases, absolutely fine. The remaining cases, I suspect, will eventually also avoid the kernel and have SmartNICs talking directly to userspace.

            2. 3

              They define “Fast Internet” as “>500 Mbps”. So good fiber, not something exotic.

              1. 3

                It would now be interesting to see a comparison against TCP+TLS+HTTP/1.1 under these circumstances…

                1. 4

                  According to the paper’s conclusion, you should see results similar to HTTP/2, given that most of QUIC’s reported issues are too many kernel<->userland crossings and not using the kernel’s available UDP optimizations, whereas TCP stacks are already quite fast and mostly contained in the kernel.
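
                  To make the "too many crossings" point concrete, here's a minimal send-side sketch (mine, not from the paper, assuming a Linux kernel that supports the UDP_SEGMENT socket option): with GSO the sender hands the kernel one large buffer per syscall and lets it cut that into wire-sized datagrams, instead of paying one send per packet. The peer address and segment size are illustrative:

                  ```python
                  import socket

                  # Linux UAPI constants (from <linux/udp.h>); not in Python's socket module.
                  SOL_UDP = 17
                  UDP_SEGMENT = 103  # send-side generic segmentation offload (GSO)

                  SEG_SIZE = 1200  # a typical QUIC packet size, purely illustrative

                  sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
                  sock.connect(("192.0.2.1", 4433))  # placeholder peer (TEST-NET-1)

                  # Let the kernel split each send() into SEG_SIZE-byte UDP datagrams,
                  # so a burst of 50 packets costs one user->kernel crossing, not 50.
                  sock.setsockopt(SOL_UDP, UDP_SEGMENT, SEG_SIZE)

                  burst = b"\x00" * (SEG_SIZE * 50)  # stand-in for 50 pre-built QUIC packets
                  sock.send(burst)                   # one syscall, many datagrams on the wire
                  ```

                  (A single GSO send is still capped at roughly 64 KB, i.e. a few dozen packets, but that's already a large reduction in syscalls compared to one sendto() per packet.)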

                  1. 1

                    That makes sense, thank you!

                2. 2

                  I really like such studies. I’ve often seen people switch protocols expecting them to be faster, because that’s what they were made for, only for them to slow the application down in the real world. HTTP/1 can be faster than HTTP/2, and that’s actually not that uncommon. And often enough that’s something that could have been tested quite easily. People see these examples with hundreds of requests, but that rarely happens in reality.

                  It’s even worse with demo sites: you disable HTTP/2 and the supposedly HTTP/2-powered example still loads faster.

                  Another example is HTTP/1.1 pipelining giving the same benefits as HTTP/2 for certain use cases.

                  Anyway, it’s great to see someone look at real-life scenarios rather than just going for either theory or artificial testing.

                  1. 2

                    I’d like to see a similarly thorough comparison of TCP+TLS+HTTP/2 and TCP+TLS+HTTP/1.1. I’ve personally experienced a pretty drastic performance reduction with HTTP/2 compared to 1.1, similar in scale to what this paper reports. It was bad enough that I had to patch what I was using to stop it upgrading connections to HTTP/2.