A TCP/IP connection is identified by a four element tuple: {source IP, source port, destination IP, destination port}.
IP is an OSI layer 3 (network) protocol, and IP connections are (indeed) identified by this 4-tuple.
TCP is an OSI layer 4 (transport) protocol, and TCP connections are (technically) identified by a 5-tuple: the 4-tuple from the IP, plus a protocol field, which is always gonna be TCP for, er, TCP.
That’s because you can, theoretically, mux different transport-layer protocols over a single network-layer identifier. For example, you can run a UDP/IP server and a TCP/IP server both on localhost:1234. This isn’t a huge deal in practice, because most server software will bind to a SOCK_STREAM socket by default, and most client software will assume a bare host:port should use SOCK_STREAM as well.
The example code demonstrates this:
```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Let the source address be 192.168.1.21:1234
s.bind(("192.168.1.21", 1234))
s.connect(("www.google.com", 80))
```
The socket is constructed with AF_INET i.e. IP, and SOCK_STREAM i.e. TCP. So the protocol part is kind of baked-in. Calling bind or connect uses the host and port provided to the function, and takes the protocol from the existing socket.
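If anyone wants to poke at the same-port thing locally, here's a minimal sketch. The helper name `bind_tcp_and_udp` is mine, not something from the article. It binds one TCP socket and one UDP socket to the same host and port; port 0 just asks the OS for any free TCP port, and the retry loop is only there in case that number happens to be taken on the UDP side.

```python
import socket


def bind_tcp_and_udp(host="127.0.0.1", port=0, attempts=5):
    """Bind a TCP and a UDP socket to the same (host, port) pair.

    Hypothetical helper for illustration. With port=0 the OS picks a free
    TCP port and we then try to claim the same number for UDP.
    """
    for _ in range(attempts):
        tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            tcp.bind((host, port))
            tcp.listen()
            chosen = tcp.getsockname()[1]
            # Same address, same port number, different protocol:
            # TCP and UDP ports live in separate namespaces.
            udp.bind((host, chosen))
            return tcp, udp
        except OSError:
            tcp.close()
            udp.close()
    raise OSError("could not find a port that is free for both TCP and UDP")


if __name__ == "__main__":
    tcp_sock, udp_sock = bind_tcp_and_udp()
    print("TCP bound to", tcp_sock.getsockname())
    print("UDP bound to", udp_sock.getsockname())
    tcp_sock.close()
    udp_sock.close()
```

Note that `bind` never takes a protocol argument: it comes from how each socket was constructed, which is the "baked-in" part.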
—
If you want to establish more than 64k (ephemeral port range) connections to a single destination, you need to use all the tricks:
Oof! Sockets and connections aren’t free, especially when they traverse the network (cf. local Unix domain sockets). Bind-before-connect can be useful in very specific use cases, but if your application is making more than a handful of physical (socket) connections to a single destination, it’s usually a bug or design error.
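For anyone wondering where the roughly-64k ceiling comes from: with one source IP and one destination ip:port, the source port is the only part of the tuple left to vary. Here's a small sketch that shows this on loopback. `open_connections` is a made-up helper, and it runs its own throwaway listener so nothing leaves the machine.

```python
import socket


def open_connections(n, host="127.0.0.1"):
    """Open n TCP connections to a local listener and return their 4-tuples.

    Hypothetical helper for illustration. Each returned tuple is
    (source IP, source port, destination IP, destination port).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(n)        # the backlog holds the connections; no accept() needed
    dest = server.getsockname()

    clients = []
    try:
        for _ in range(n):
            c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            c.connect(dest)
            clients.append(c)
        return [c.getsockname() + c.getpeername() for c in clients]
    finally:
        for c in clients:
            c.close()
        server.close()


if __name__ == "__main__":
    for four_tuple in open_connections(3):
        print(four_tuple)
```

Every tuple it prints should share the same source IP, destination IP and destination port, and differ only in the source port the OS picked.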
I think you have the layers mixed up a bit conceptually. IP addresses are an IP concept. Ports are a concept that is shared by the two common transport protocols of TCP and UDP.
I hope I don’t have the concepts mixed up, I’m (supposed to be) a network engineer! 🙃 I also agree with the things you said. Can you point out what you think I got wrong?
Conceptually though the 4-tuple/5-tuple abstraction is still very useful in practice, especially given that the OSI model is generally… well, it’s a model too, and if I’m remembering my history correctly, it was actually built in parallel to the IPv4 stack that won. The 7-layer model was meant to be used to build a protocol stack that actually implemented all 7 layers, but TCP/IP “won” and OSI lived on in eternity as a model and not as an implementation.
Reasonable people may disagree, but I don’t think the OSI layer model and TCP/IP are, like, mutually exclusive to each other. There’s some squidgy-ness in the details: IP is pretty clearly an OSI layer 3 thing, and TCP is layer 4 and maybe 5 depending on how you look at it. So there’s no bijective mapping from the one to the other. But that’s fine: all models are wrong, some models are useful, and I think the OSI model continues to be useful :)
I don’t think he mixed anything up. TCP connections are identified by the 5-tuple. The three fields that are part of IP are the source and destination (IP) addresses and the protocol. The source and destination ports are part of the TCP or UDP header. As he says, this means that you can use the same port for both UDP and TCP, because they are separate namespaces (though, again, as he says, a lot of software somewhat conflates them). This conflation is even more likely with things like HTTP, which now runs over either TLS+TCP or QUIC+UDP and so wants to use the same port on the server for both TCP and UDP.
The trick, I think, is that the 5-tuple is an OS abstraction that leverages the fact that TCP and UDP both have same-sized port fields that behave similarly. At the IP level, there’s nothing preventing me from inventing a new IP packet type (say EUDPX) that has 128-bit port numbers inside a fully encrypted IP payload; most routers and firewalls would just drop it on the floor because they have no idea what it is, but on a local LAN segment it would probably work just fine. The OS wouldn’t know anything about the port numbers, but it would know what the IP addresses were.
but if your application is making more than a handful of physical (socket) connections to a single destination, it’s usually a bug or design error.
just as with all things, there are exceptions / mitigating circumstances. for example, on one of our projects, where we are providing fixed-wireless internet access, we want to load-test a large number (1024) of connections on a base-station / eNodeB (for those familiar with mobile networks). as you can imagine, from a practical p.o.v., it is not possible to perform this kind of simulation in the real world with real devices etc.
so, we have a simple application running on an x86 machine making 1024 connections to the real base-node, with each connection pretending to be a remote-node and running its own control-plane state machine etc. etc.
Oh sure! Load tests are a different beast. It can be surprising, and entertaining, to learn all of the different things that can become bottlenecks. Available ports, file descriptors, the network stack itself, so many things go wrong before you get anywhere near saturating your CPU or exhausting your memory 🙃
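A rough pre-flight check for two of those bottlenecks might look like the sketch below. Assumptions: a Unix-like OS (the `resource` module doesn't exist on Windows) and, for the port range, the Linux-specific `/proc/sys/net/ipv4/ip_local_port_range` file. `connection_headroom` is a hypothetical name, and the check is deliberately optimistic since it ignores file descriptors and ports already in use.

```python
import resource
from pathlib import Path

# Linux-specific location of the ephemeral port range (assumption).
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def connection_headroom(wanted):
    """Report whether `wanted` outbound connections fit the basic limits.

    Hypothetical helper: compares against the soft file-descriptor limit
    and, where available, the size of the ephemeral port range.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    fds_ok = soft == resource.RLIM_INFINITY or wanted <= soft

    port_range = None
    ports_ok = True  # unknown on non-Linux systems, so don't flag it
    if PORT_RANGE_FILE.exists():
        low, high = map(int, PORT_RANGE_FILE.read_text().split())
        port_range = (low, high)
        ports_ok = wanted <= high - low + 1

    return {
        "fd_soft_limit": soft,
        "fd_hard_limit": hard,
        "fds_ok": fds_ok,
        "port_range": port_range,
        "ports_ok": ports_ok,
    }


if __name__ == "__main__":
    print(connection_headroom(1024))
```

For something like the 1024-connection test above, the file-descriptor soft limit is often the first thing to look at, since on many systems the default is close to that number.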
I think you have the layers mixed up a bit conceptually. IP addresses are an IP concept. Ports are a concept that is shared by the two common transport protocols of TCP and UDP.
I hope I don’t have the concepts mixed up, I’m (supposed to be) a network engineer! 🙃 I also agree with the things you said. Can you point out what you think I got wrong?
https://en.wikipedia.org/wiki/Internet_Protocol_version_4#/media/File:IPv4_Packet-en.svg
At the IP layer, there are source and destination addresses, but ports don’t show up until you get to the TCP or UDP layer:
https://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_segment_structure
https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
Other protocols, e.g. ICMP (ping) don’t have source or destination ports at all: https://en.wikipedia.org/wiki/Internet_Control_Message_Protocol#Datagram_structure
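To make the header layouts in those links concrete, here's a minimal sketch that pulls the 5-tuple out of a raw IPv4 packet. The helper names (`five_tuple`, `fake_ipv4_packet`) are made up for illustration, the fake packet has no valid checksum or real payload, and IP options beyond the header length field are not interpreted. The point is only where each field lives: addresses and protocol in the IP header, ports in the first four bytes of the TCP or UDP header, and no ports at all for something like ICMP.

```python
import socket
import struct


def five_tuple(packet: bytes):
    """Return (protocol, src IP, src port, dst IP, dst port) for an IPv4 packet.

    Ports are None when the payload is not TCP or UDP (e.g. ICMP).
    """
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (packet[0] & 0x0F) * 4        # IHL is counted in 32-bit words
    protocol = packet[9]                       # protocol field of the IP header
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])

    src_port = dst_port = None
    if protocol in (socket.IPPROTO_TCP, socket.IPPROTO_UDP):
        # TCP and UDP both start with 16-bit source and destination ports.
        src_port, dst_port = struct.unpack("!HH", packet[header_len:header_len + 4])
    return (protocol, src_ip, src_port, dst_ip, dst_port)


def fake_ipv4_packet(protocol, src_ip, dst_ip, payload=b""):
    """Build a bare 20-byte IPv4 header plus payload (checksum left as 0)."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(payload), 0, 0, 64, protocol, 0,
        socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
    )
    return header + payload


if __name__ == "__main__":
    ports = struct.pack("!HH", 1234, 80)
    print(five_tuple(fake_ipv4_packet(socket.IPPROTO_TCP, "192.168.1.21", "10.0.0.1", ports)))
    print(five_tuple(fake_ipv4_packet(socket.IPPROTO_ICMP, "192.168.1.21", "10.0.0.1")))
```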
You’re right, my previous comment was conflating concepts in a way that was confusing.
I think the only thing I really wanted to point out was the different-protocols-same-port bit.
Conceptually though the 4-tuple/5-tuple abstraction is still very useful in practice, especially given that the OSI model is generally… well, it’s a model too, and if I’m remembering my history correctly, it was actually built in parallel to the IPv4 stack that won. The 7-layer model was meant to be used to build a protocol stack that actually implemented all 7 layers, but TCP/IP “won” and OSI lived on in eternity as a model and not as an implementation.
Reasonable people may disagree, but I don’t think the OSI layer model and TCP/IP are, like, mutually exclusive to each other. There’s some squidgy-ness in the details: IP is pretty clearly an OSI layer 3 thing, and TCP is layer 4 and maybe 5 depending on how you look at it. So there’s no bijective mapping from the one to the other. But that’s fine: all models are wrong, some models are useful, and I think the OSI model continues to be useful :)
I don’t know how I missed this reply until today but I absolutely agree. It gets even trickier when you introduce TLS, but yeah, still a useful model!