1. 39
  1. 8

    To me, this looks like a great example of the cascading complexity that results from a mismatch between limits imposed by an API (in this case a protocol) and actual hardware limits. If TCP had a 4 byte or 8 byte port range, this would presumably not be an issue. There would be no need to share ports.

    How can we design protocols and APIs that can adapt to increasing hardware capabilities? Perhaps we should always use variable-length integers? There presumably will always be some absurdly large limit that we can safely impose, e.g. the number of atoms in the observable universe (roughly 2^265, i.e. a bit over 256 bits), so we don’t necessarily need to allow any arbitrary integer.
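
    Off the cuff, “always use variable-length integers” could look like protobuf-style LEB128 varints. A minimal sketch (the function names are mine, not from any spec):

    ```python
    # LEB128-style varint: 7 payload bits per byte, least-significant group
    # first, with the high bit set on every byte except the last.

    def encode_varint(n: int) -> bytes:
        """Encode a non-negative integer as a varint."""
        out = bytearray()
        while True:
            group = n & 0x7F
            n >>= 7
            if n:
                out.append(group | 0x80)  # continuation bit: more groups follow
            else:
                out.append(group)
                return bytes(out)

    def decode_varint(data: bytes) -> int:
        """Decode a varint back into an integer."""
        n = 0
        for shift, byte in enumerate(data):
            n |= (byte & 0x7F) << (7 * shift)
            if not byte & 0x80:
                return n
        raise ValueError("truncated varint")

    # today's 16-bit ports cost at most 3 bytes on the wire, while the
    # "atoms in the universe" limit still fits in under 40 bytes
    assert decode_varint(encode_varint(65535)) == 65535
    assert len(encode_varint(2**265)) == 38
    ```

    So the common case stays small and the absurd upper bound costs nothing until you actually use it.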

    1. 5

      We had one protocol that allowed more flexibility and it didn’t end well: IPv6. All the “flexible” bits are pretty much deprecated, because it’s impossible to implement a fast router when you don’t know where the fields (like TCP ports) are.

      I wouldn’t draw the conclusion that 16 bits for a port is too limited.

      IMO the conclusion is that the BSD sockets API kinda requires the 2-tuple to be locked at bind() time, while the user doesn’t want that. For a connected socket we expect the 4-tuple to be locked. It would be nice to have an API that can express that. For TCP we have IP_BIND_ADDRESS_NO_PORT; for UDP we don’t have anything, plus there is the overshadowing issue.
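
      For reference, IP_BIND_ADDRESS_NO_PORT on Linux behaves roughly like this (my sketch, not from the article): bind() locks only the source address, and the port is allocated at connect() time, when the kernel knows the full 4-tuple.

      ```python
      import socket

      # exposed as socket.IP_BIND_ADDRESS_NO_PORT on newer Pythons;
      # the fallback value 24 comes from <linux/in.h>
      IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)

      # throwaway listener so connect() has somewhere to go
      srv = socket.socket()
      srv.bind(("127.0.0.1", 0))
      srv.listen(1)

      cli = socket.socket()
      cli.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
      cli.bind(("127.0.0.1", 0))
      pre = cli.getsockname()        # port is still 0: nothing allocated yet
      cli.connect(srv.getsockname())
      post = cli.getsockname()       # a real ephemeral port, chosen per 4-tuple
      cli.close(); srv.close()
      ```

      That is the missing UDP story: there is no equivalent opt-in for unconnected sockets.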

      1. 3

        With IPv6 you don’t even need ports, you could just use the last N bytes of the address as the port :D

        More practically it might actually make sense to “expand” the ephemeral port range into the address, i.e. just use lots of different source addresses under the subnet your machine has.

        1. 3

          This is one of the reasons why it’s good to have a /64. One of the privacy options for IPv6 recommends that you keep a small number of stable IPv6 addresses for incoming connections and periodically cycle the one that you use for outbound connections. With a /64 and SLAAC you can do this more or less by picking a new 64-bit random number. In the most extreme configuration, you pick a new IPv6 address for every outbound connection. This means that a server shouldn’t be able to distinguish between two connections from different machines in the subnet and two from the same machine. This doesn’t help much for home networks (where everyone on the /64 is likely to be in the same family, at least), but is good for networks with more users.
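
          The address-minting step is basically just this (a sketch; the prefix is the IPv6 documentation prefix, not a real allocation):

          ```python
          # Mint a fresh source address per outbound connection by pairing a
          # routed /64 with a random 64-bit interface identifier.
          import ipaddress
          import secrets

          subnet = ipaddress.IPv6Network("2001:db8::/64")  # documentation prefix

          def random_source(net: ipaddress.IPv6Network) -> ipaddress.IPv6Address:
              """Pick a uniformly random address inside net."""
              suffix = secrets.randbits(128 - net.prefixlen)
              return ipaddress.IPv6Address(int(net.network_address) | suffix)

          addr = random_source(subnet)
          assert addr in subnet  # one of 2**64 candidates in the /64
          ```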

          I believe this is very rarely done in practice because the higher-level protocols (for example, HTTP) provide tracking information (cookies, browser fingerprinting, and so on) that’s so much more useful than IP address that tracking based on IP is of fairly negligible value.

          1. 2

            This is done for DNS resolvers already, it got popular after the security issues around lack of space for entropy in the ID field. You route a /64 to the resolver and tell it to prefer IPv6 and to use that whole /64 as the outgoing source.

            See, eg, Unbound’s outgoing-interface setting.
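
            Something along these lines, if I remember the config right (the prefix is a placeholder; sourcing from otherwise-unassigned addresses in the block needs the freebind option on Linux):

            ```
            server:
                prefer-ip6: yes
                # the whole /64 is routed to this host; Unbound draws random
                # source addresses out of it for outgoing queries
                outgoing-interface: 2001:db8::/64
                ip-freebind: yes
            ```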

        2. 7

          This is an interesting article!

          It seems like the “overshadowing” problem would be much better solved with a kernel patch. I doubt that many people desire that behavior, so a new sockopt to opt out of it would be quite useful.

          1. 3

            Take a look at the associated code: https://github.com/cloudflare/cloudflare-blog/tree/master/2022-02-connectx

            There is quite some value in the tests. These kinds of things are notoriously hard to reason about, hard to get all the corner cases right, and hard to test.

            killtw.py is a bit buggy though; there is one corner case around TIME-WAIT sockets that I haven’t been able to figure out yet.