1. 12
  1. 2

    So what does this do that iptables doesn’t do? Redirecting traffic to different ports is a pretty old capability; it seems like this should just be a friendly API on top of iptables.
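    For context, the iptables approach the question has in mind rewrites the packet’s destination. The toy model below is a hypothetical sketch, not iptables code. It shows a chain of DNAT-style rules evaluated in order, where the first match rewrites (address, port). Real iptables can also match port ranges (for example `--dport 1000:2000`), which this sketch omits for brevity.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network
from typing import List, Tuple


@dataclass(frozen=True)
class NatRule:
    """Hypothetical DNAT-style rule: match (proto, dst prefix, dst port), rewrite to (addr, port)."""
    proto: str
    dst_net: IPv4Network
    dst_port: int
    to_addr: IPv4Address
    to_port: int


def apply_nat(rules: List[NatRule], proto: str,
              dst: IPv4Address, port: int) -> Tuple[IPv4Address, int]:
    """Walk the rules in order, like a chain; the first match rewrites the destination.

    Without a match, the packet keeps its original destination.
    """
    for rule in rules:
        if rule.proto == proto and dst in rule.dst_net and port == rule.dst_port:
            return rule.to_addr, rule.to_port
    return dst, port


rules = [NatRule("tcp", IPv4Network("192.0.2.0/24"), 80, IPv4Address("127.0.0.1"), 8080)]
print(apply_nat(rules, "tcp", IPv4Address("192.0.2.10"), 80))
```

    The key property is that after the rewrite, the listening process only ever sees 127.0.0.1:8080. The address the client dialed is gone from the packet.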

    1. 5

      The repository seems to be linked to CloudFlare’s article “Production ready eBPF, or how we fixed the BSD socket API” from a few days ago. (I missed the article, so I’ll have to read it.)

      BTW, not long ago they also released a related article, “How to stop running out of ephemeral ports and start to love long-lived connections”.

      I think together they describe how CloudFlare uses the TCP/IP and UDP/IP stacks for both incoming (internet->server) and outgoing (server->internet) flows.

      1. 5

        Despite it not being a lot of code, what is now tubular has been in the works for quite a while! But indeed it doesn’t predate any of the iptables capabilities you may be referring to. The truth is, Cloudflare has been using those iptables features (though not Destination NAT AFAIK) to deliver new connections to listening sockets. But there are a lot of pain points when trying to achieve what’s needed for the variety of services that run on the servers.

        If you’d like to know more about the justification for the bpf_sk_lookup hook, and specifically why it’s more suitable than alternatives and previous attempts, the talk on Programmable Socket Lookup presented at Linux Plumbers Conference explains it in more detail.
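        To contrast with destination rewriting, here is a toy Python model of prefix/port-based socket dispatch in the spirit of a socket-lookup hook. It is a hypothetical sketch, not tubular’s API and not the kernel’s. The names `SocketLookupTable`, `bind`, `register` and `lookup`, the use of port 0 as a wildcard, and the precedence rule (longest prefix first, then exact port over wildcard) are all assumptions. In the real kernel, an sk_lookup BPF program picks a socket (typically from a sockmap) and hands it over with `bpf_sk_assign()`. The packet’s destination is never modified.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv4Network
from typing import Dict, Optional, Tuple


@dataclass
class SocketLookupTable:
    """Hypothetical toy model of socket dispatch by (proto, prefix, port).

    bindings: (proto, prefix, port) -> label; port 0 acts as a wildcard (assumption).
    sockets:  label -> opaque socket handle (stands in for a sockmap entry).
    """
    bindings: Dict[Tuple[str, IPv4Network, int], str] = field(default_factory=dict)
    sockets: Dict[str, object] = field(default_factory=dict)

    def bind(self, proto: str, prefix: str, port: int, label: str) -> None:
        self.bindings[(proto, IPv4Network(prefix), port)] = label

    def register(self, label: str, sock: object) -> None:
        self.sockets[label] = sock

    def lookup(self, proto: str, dst: str, port: int) -> Optional[object]:
        """Return the socket for the most specific matching binding, or None.

        Longest prefix wins; on a tie, an exact port beats the wildcard.
        The destination address and port are only read, never rewritten.
        """
        addr = IPv4Address(dst)
        best_label: Optional[str] = None
        best_key = (-1, -1)
        for (b_proto, net, b_port), label in self.bindings.items():
            if b_proto != proto or addr not in net or b_port not in (0, port):
                continue
            key = (net.prefixlen, 1 if b_port == port else 0)
            if key > best_key:
                best_label, best_key = label, key
        if best_label is None:
            return None  # fall through to the regular socket lookup
        return self.sockets.get(best_label)


t = SocketLookupTable()
t.bind("tcp", "192.0.2.0/24", 0, "web")   # every port on a whole /24
t.bind("tcp", "192.0.2.0/24", 22, "ssh")  # more specific: one port
t.register("web", "web-socket")
t.register("ssh", "ssh-socket")
print(t.lookup("tcp", "192.0.2.7", 443), t.lookup("tcp", "192.0.2.7", 22))
```

        One binding covers a whole prefix and every port. The loop is linear only for readability. A real implementation would more likely use something like an LPM trie map.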

        1. 2

          Thanks, that talk looks quite interesting.

          1. 1

            I watched the talk and looked at the patches, but I’m not convinced, and it seemed like several netdev posters weren’t either. When did they get convinced that a TPROXY approach would not work?

            This seems like an abuse of BPF. There weren’t any fundamental issues with the TPROXY approach, nothing that couldn’t be solved. Making something fully programmable is good, but here that programmability just replicates the existing APIs, which is bad: slower and less introspectable.

          2. 4

            OK, so I’ve read the mentioned article, “Production ready eBPF, or how we fixed the BSD socket API”, and at first it does seem to be some sort of local DNAT.

            However, I don’t think it’s equivalent, for at least the following reasons:

            • scale – it seems CloudFlare wants to have many small services listening on many IP/port ranges; with iptables, one has to create a rule for each such mapping, which quickly gets out of hand;
            • transparency – with iptables DNAT, the local process can’t easily get the original destination IP; the standard sockets API only returns the IP/port on which the socket was bound;
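            To illustrate the transparency point, consider what happens without any rewriting. A listener bound to the wildcard address reports the address the client actually dialed when you call `getsockname()` on the accepted connection. Under DNAT it would instead report the translated address, unless you query conntrack (for example via Linux’s `SO_ORIGINAL_DST` socket option). This minimal sketch assumes a loopback interface and uses only the standard `socket` module. The helper name `dialed_address` is hypothetical.

```python
import socket


def dialed_address(dial_ip: str = "127.0.0.1") -> tuple:
    """Accept one TCP connection on a wildcard-bound listener and return the
    local (address, port) that getsockname() reports for the accepted socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("0.0.0.0", 0))  # wildcard address, ephemeral port
        listener.listen(1)
        port = listener.getsockname()[1]
        with socket.create_connection((dial_ip, port)):
            conn, _peer = listener.accept()
            with conn:
                # For an accepted socket, getsockname() gives the destination
                # the client connected to, not the wildcard address we bound.
                return conn.getsockname()


print(dialed_address())
```

            On Linux, where the whole 127.0.0.0/8 range is routed to loopback, `dialed_address("127.0.0.2")` should likewise report 127.0.0.2 even though the listener was bound to 0.0.0.0.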