One thing I would tweak is that dropped packets don’t just happen because “the network is down” or “something went wrong”. A certain percentage of dropped packets is a perfectly normal response to fluctuations in bandwidth demand along the route. (ECN makes this a little bit less true than it used to be, but dropped packets are still a normal part of the control loop.)
And that makes meltdown even more tragic: first, because nothing exceptional has to happen to trigger it; and second, because if the cause is network congestion, all of those unnecessary inner retransmits can only make the congestion worse.
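To make the "drops are part of the control loop" point concrete, here's a toy sketch of the AIMD idea (not any real TCP variant; the constants are made up): each round trip without a drop grows the congestion window additively, and each detected drop cuts it multiplicatively. The drop is the feedback signal, not an error.

```python
# Toy AIMD congestion window: a drop is the normal feedback signal
# the sender uses to find available bandwidth, not an error condition.
# (Constants and the step model are illustrative.)

def aimd_step(cwnd, drop_detected, add=1.0, mult=0.5):
    """One RTT of additive-increase / multiplicative-decrease."""
    return cwnd * mult if drop_detected else cwnd + add

cwnd = 10.0
for drop in [False, False, True, False]:
    cwnd = aimd_step(cwnd, drop)
# cwnd: 10 -> 11 -> 12 -> 6 -> 7
print(cwnd)  # 7.0
```

The sawtooth this produces is exactly the steady state: the sender probes upward until the network drops something, backs off, and repeats.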
This may be a good opportunity to note udp2raw: it takes UDP packets and tunnels them over a fake TCP connection. This connection appears as TCP to firewalls, but doesn’t actually order packets internally, avoiding the issue entirely.
edit: And there it is, right in the article! Teach me to reply before reading it completely. Well, I can at least confirm it works well.
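For reference, a typical faketcp invocation pair looks roughly like this (flags recalled from the udp2raw README; the addresses, ports, and key are placeholders, so check `--help` before relying on them):

```shell
# Server: listen on a TCP-looking port 4096 and forward the
# decapsulated UDP to a local VPN listener on 127.0.0.1:7777.
# -a auto-adds the iptables rule that stops the kernel from
# sending RSTs on the raw "connection".
./udp2raw_amd64 -s -l 0.0.0.0:4096 -r 127.0.0.1:7777 -k "passwd" --raw-mode faketcp -a

# Client: expose a local UDP port 3333 that tunnels to the
# server's fake-TCP endpoint.
./udp2raw_amd64 -c -l 0.0.0.0:3333 -r 44.55.66.77:4096 -k "passwd" --raw-mode faketcp -a
```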
Haha no worries! One thing I did forget to mention in my post is Cloak, which is almost like a “udp2tls” whereby it emulates a full TLS session with known browser fingerprints. Quite effective against sophisticated (read: state actor) censorship techniques.
Are there any QUIC VPN solutions? It seems like QUIC would be a great fit for tunnelling multiple TCP flows without introducing additional head of line blocking.
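The head-of-line-blocking argument can be seen with a toy model (purely illustrative, not the QUIC wire format): in-order delivery stops at the first gap, so a single shared sequence space stalls every flow behind one lost packet, while per-stream sequence spaces only stall the stream that actually lost something.

```python
# Toy model: one shared ordered stream (TCP-style tunnel) vs
# independent per-flow streams (QUIC-style multiplexing).

def deliverable(received_seqs):
    """How many packets can be delivered in order, given the set of
    sequence numbers received so far (delivery stops at the first gap)."""
    n = 0
    while n in received_seqs:
        n += 1
    return n

# Six packets from two logical flows; the packet with seq 1 is lost.
tcp_like = {0, 2, 3, 4, 5}           # one shared sequence space
print(deliverable(tcp_like))         # 1 -- everything behind the gap stalls

# Same loss, but each flow has its own sequence space.
quic_like = {0: {0, 2}, 1: {0, 1, 2}}
print([deliverable(s) for s in quic_like.values()])  # [1, 3] -- flow 1 unaffected
```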
I believe iCloud Relay uses QUIC? Although I think by being based on QUIC it’d have to act as a Layer-4 VPN/proxy (TCP and UDP only), but I haven’t looked into it that closely.
I once read that Wi-Fi also does retransmissions (layer 2 retransmissions [1]) and that these retransmissions also don’t mesh well with TCP retransmissions due to the exact same problem. Does anyone here know more about this? It’s hard to find literature on this subject, despite it being at the heart of a lot of modern technology.
I haven’t looked in detail, but my understanding was that faster speeds have caused a transition from retransmit to forward error correction as the dominant mechanism for handling dropped frames in newer link-level protocols. The round-trip time doesn’t drop as speeds increase (at least, not by much), but every doubling in speed doubles the amount of data that you need to buffer while you wait for a retransmit, which can be catastrophic at high speeds.

You can burn a little bit of bandwidth to include more error correction so that any n frames you receive let you recover a dropped one. If you pick the right error-correction rate, you never need to retransmit. If you have 0.5% packet loss and provide sufficient error correction to recover one packet in every 200, then your worst-case latency from packet loss is 200 packets. If you need to retransmit, it’s however many packets you can send in one RTT, which can be a lot more on a high-bandwidth or high-latency network.

I vaguely remember being told that modern protocols use retransmit notifications to bump up the amount of error-correction information they add, so that high packet loss gradually degrades available bandwidth but does not cause latency spikes.
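As a toy sketch of that FEC idea (illustrative only — real link layers use much stronger codes than simple parity): one XOR parity frame per group lets the receiver rebuild any single lost frame in that group without waiting a round trip for a retransmit.

```python
# Toy single-loss FEC: one XOR parity frame per group of equal-size
# frames lets the receiver rebuild any ONE missing frame, because
# XOR-ing the parity with the surviving frames yields the lost one.

def xor_frames(frames):
    out = bytearray(len(frames[0]))
    for f in frames:
        for i, b in enumerate(f):
            out[i] ^= b
    return bytes(out)

def make_parity(group):
    """Sender side: compute the parity frame for a group."""
    return xor_frames(group)

def recover(group_with_gap, parity):
    """Receiver side: rebuild the single missing frame (None)."""
    present = [f for f in group_with_gap if f is not None]
    return xor_frames(present + [parity])

frames = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
parity = make_parity(frames)

# Simulate losing frame 2 in transit.
received = [frames[0], frames[1], None, frames[3]]
print(recover(received, parity))  # b'cccc'
```

The bandwidth/latency trade-off in the comment falls out directly: the parity overhead here is 1/n, and the worst-case recovery delay is the rest of the group — no RTT involved.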
[1] https://extremeportal.force.com/ExtrArticleDetail?an=000095772