1. 53
  1.  

  2. 14

    While it might be true that 1500 Bytes is now the de facto MTU standard on the Internet (minus whatever overhead you throw at it), everything’s not lost. The problem is not that we don’t have the link layer capabilities to offer larger MTUs, the problem is that the transport protocol has to be AWARE of it. One mechanism for finding out whether what size MTU is supported by a path over the Internet is an Algorithm called DPLPMTUD. It is currently being standardized by the IETF and is more or less complete https://tools.ietf.org/html/draft-ietf-tsvwg-datagram-plpmtud-14. There are even plans for QUIC to implement this algorithm, so if we’ll end up with a transport that is widely deployed and also supports detection of MTUs > 1500 we’ll actually might have a chance to change the link layer defaults. Fun fact: All of the 4G networking gear actually supports jumbo frames, most of the providers just haven’t enabled the support for it since they are not aware of the issue.

    1. 6

      Wow, it might even work.

      I can hardly believe it… but if speedtest.net were able to send jumbo frames and most users’ browsers support receiving it, it might get deployed by ISPs as they look for benchmark karma. Amazing. I thought 1500 was as invariant as π.

      1. 5

        I was maintainer for an AS at a previous job and set up a few BGP peers with jumbo frames (4470). I would have made this available on the customer links as well, except none of them would have been able to receive the frames. They were all configured for 1500, as is the default in any OS then or today. Many of their NICs couldn’t handle 4470 either, though I suppose that has improved now.

        Even if a customer had configured their NIC to handle jumbo frames, they would have had problems with the other equipment on their local network. How do you change the MTU of your smartphone, your media box or your printer? If you set the MTU on your Ethernet interface to 4470 then your network stack is going to think it can send such large frames to any node on the same link. Path MTU discovery doesn’t fix this because there is no router in between that can send ICMP packets back to you, only L2 switches.

        It is easy to test. Try to ping your gateway with ping -s 4000 192.168.0.1 (or whatever your gateway is). Then change your MTU with something like ip link set eth0 mtu 4470 and see if you can still ping your gateway. Remember to run ip link set eth0 mtu 1500 afterwards (or reboot).

        I don’t think that DPLPMTUD will fix this situation and let everyone have jumbo frames. As a former network administrator reading the following paragraph, they are basically saying that jumbo frames would break my network in subtle and hard to diagnose ways:

           A PL that does not acknowledge data reception (e.g., UDP and UDP-
           Lite) is unable itself to detect when the packets that it sends are
           discarded because their size is greater than the actual PMTU.  These
           PLs need to rely on an application protocol to detect this loss.
        

        So you’re going to have people complain that their browser is working, but nothing else. I wouldn’t enable jumbo frames if DPLPMTUD was everything that was promised as a fix. That said, it looks like DPLPMTUD will be good for the Internet as a whole, but it does not really help the argument for jumbo frames.

        And I don’t know if it has changed recently, but the main argument for jumbo frames at the time was actually that they would lead to fewer interrupts per second. There is some overhead per processed packet, but this has mostly been fixed in hardware now. The big routers use custom hardware that handles routing at wire speed and even consumer network cards have UDP and TCP segmentation offloading, and the drivers are not limited to one packet per interrupt. So it’s not that much of a problem anymore.

        Would have been cool though and I really wanted to use it, just like I wanted to get us on the Mbone. But at least we got IPv6. Sorta. :)

        1. 3

          If your system is set up with an mtu of 1500, then you’re already going to have to perform link mtu discovery to talk with anyone using PPPoE. Like, for example, my home DSL service.

          Back when I tried running an email server on there, I actually did run into trouble with this, because some bank’s firewall blocked ICMP packets, so… I thought you’d like to know, neither of us used “jumbo” datagrams, but we still had MTU trouble, because their mail server tried to send 1500 octet packets and couldn’t detect that the DSL link couldn’t carry them. The connection timed out every time.

          If your application can’t track a window of viable datagram sizes, then your application is simply wrong.

          1. 2

            If your system is set up with an mtu of 1500, then you’re already going to have to perform link mtu discovery to talk with anyone using PPPoE. Like, for example, my home DSL service.

            It’s even worse: in the current situation[1], your system’s MTU won’t matter at all. Most of the network operators are straight-up MSS-clamping your TCP packets downstream, effectively discarding your system’s MTU.

            I’m very excited by this draft! Not only it will fix the UDP situation we currently have, but will also make tunneling connections way more easy. That said, it also means that if we want to benefit from that, the network administrators will need to quit mss-clamping. I suspect this to take quite some time :(

            [1] PMTU won’t work in many cases. Currently, you need ICMP to perform a PMTU discovery, which is sadly filtered out by some poorly-configured endpoints. Try to ping netflix.com for instance ;)

            1. 2

              If your system is set up with an mtu of 1500, then you’re already going to have to perform link mtu discovery to talk with anyone using PPPoE. Like, for example, my home DSL service.

              Very true, one can’t assume an MTU of 1500 on the Internet. I disagree that it’s on the application to handle it:

              If your application can’t track a window of viable datagram sizes, then your application is simply wrong.

              The network stack is responsible for PMTUD, not the application. One can’t expect every application to track the datagram size on a TCP connection. Applications that use BSD sockets simply don’t do that, they send() and recv() and let the network stack figure out the datagram size. There’s nothing wrong with that. For UDP the situation is a little different, but IP can actually fragment large UDP datagrams and PTMUD works there too (unless, again, broken by bad configurations, hence DPLPMTUD).

              1. 3

                I disagree that it’s on the application to handle it

                Sure, fine. It’s the transport layer’s job to handle it. Just as long as it’s detected at the endpoints.

                For UDP the situation is a little different, but IP can actually fragment large UDP datagrams and PTMUD works there too

                It doesn’t seem like anyone likes IP fragmentation.

                • If you’re doing a teleconferencing app, or something similarly latency-sensitive, then you cannot afford the overhead of reconstructing fragmented packets; your whole purpose in using UDP was to avoid overhead.

                • If you’re building your own reliable transport layer, like uTP or QUIC, then you already have a sliding size window facility; IP fragmentation is just a redundant mechanism that adds overhead.

                • Even DNS, which seems like it ought to be a perfect use case for DNS with packet fragmentation, doesn’t seem to work well with it in practice, and it’s being phased out in favour of just running it over TCP whenever the payload is too big. Something about it acting as a DDoS amplification mechanism, and super-unreliable on top of that.

                If you’re using TCP, or any of its clones, of course this ought to be handled by the underlying stack. They promised reliable delivery with some overhead, they should deliver on it. I kind of assumed that the “subtle breakage” that @weinholt was talking about was specifically for applications that used raw packets (like the given example of ping).

                1. 1

                  You list good reasons to avoid IP fragmentation with UDP and in practice people don’t use or advocate IP fragmentation for UDP. Broken PMTUD affects everyone… ever had an SSH session that works fine until you try to list a large directory? Chances are the packets were small enough to fit in the MTU until you listed that directory. As breakages go, that one’s not too hard to figure out. The nice thing about the suggested MTU discovery method is that it will not rely on other types of packets than those already used by the application, so it should be immune to the kind of operator who filters everything he does not understand. But it does mean some applications will need to help the network layer prevent breakage, so IMHO it doesn’t make jumbo frames more likely to become a thing. It’s also a band-aid on an otherwise broken configuration, so I think we’ll see more broken configurations in the future, with less arguments to use on the operators who can now point to how everything is “working”.

        2. 3

          This is another fine example of how the complexity of modern systems is reaching a point where very few if any can truly understand everything that’s happening in a system all the way down.

          Makes me miss the old days of 8 bit micro-computers when you could learn everything there was to learn about a machine from a thin paperback book :)

          1. 2

            At this point I think someone’s reading every comment I post and marking it as spam out of spite.

            Clearly I’m not ENTIRELY a horrible spammer, because the community has rewarded my participation fairly extensively in other ways, so I’m gonna ignore it as someone mis-using the tools.

            If you’re the person dinging me as a spammer and would like to talk about some aspect of my comments you’d like to see change, please drop me a private message and we’ll talk about it.

            1. 1

              If you don’t bother, you’ll never learn.