    Can someone explain like I’m five what eBPF is here?

      You can think of it like scripting for the Linux kernel. It allows one to write small programs that are loaded into the kernel at runtime and executed in kernel space. This is huge because previously the only way to do such a thing was to either write a kernel module or recompile a custom kernel. eBPF provides similar functionality, but the programs can be loaded and unloaded on a live kernel.

      This works by writing the program in some language (typically a subset of C, though the linked repository above allows using Rust), which is compiled via LLVM to BPF bytecode. A userspace program then loads this bytecode into the kernel, which runs safety and verification checks against it; if all checks pass, the bytecode is JIT-compiled and run natively.

      These programs can be loaded at different injection points (usually called hooks or attach points) depending on the desired use case, and each injection point gives visibility into different kernel data structures. For example, some eBPF programs are loaded very low in the network stack (as low as inside the NIC driver, or even on the NIC hardware itself, using a category of eBPF programs called XDP [eXpress Data Path]) to process, account for, or mangle network packets, while others may be triggered by a particular kernel function or syscall and get live access to that function's arguments.

      In practice, using eBPF involves a few steps:

      1. Write a small eBPF program that will run in kernel space. This “module” usually communicates with the outside world (i.e. userspace) by reading/writing “maps” (key/value data structures provided by the kernel, most commonly hash maps or arrays).
      2. Write a userspace program that will load your eBPF program into the kernel.
      3. Write a userspace program that communicates with the eBPF program by reading/writing the same “map”, and either displays the results to the user or does processing that is too expensive to do in the kernel directly.
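
      To make the map hand-off between steps 1 and 3 more concrete, here is a toy, userspace-only model in C. It uses no real BPF API: a fixed-size array stands in for a kernel hash map keyed by PID, the hypothetical `on_write_event()` plays the eBPF program, and the hypothetical `bytes_for_pid()` plays the userspace reader. A real program would instead declare the map in its eBPF source and access it through the kernel's map helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy, userspace-only model of the eBPF "map" pattern. No kernel or BPF
 * API is involved: a fixed-size array stands in for a hash map keyed by
 * PID (real BPF maps also have a fixed max_entries). */
#define MAX_ENTRIES 64

struct pid_entry {
    uint32_t pid;   /* key */
    uint64_t bytes; /* value: total bytes written by this PID */
    int used;       /* 1 if the slot holds an entry */
};

static struct pid_entry pid_bytes_map[MAX_ENTRIES];

/* Return the entry for pid, creating it if needed; NULL if the map is full. */
static struct pid_entry *map_lookup_or_init(uint32_t pid) {
    struct pid_entry *free_slot = NULL;
    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        if (pid_bytes_map[i].used && pid_bytes_map[i].pid == pid)
            return &pid_bytes_map[i];
        if (!pid_bytes_map[i].used && free_slot == NULL)
            free_slot = &pid_bytes_map[i];
    }
    if (free_slot != NULL) {
        free_slot->pid = pid;
        free_slot->bytes = 0;
        free_slot->used = 1;
    }
    return free_slot;
}

/* "Kernel side" (step 1): would run once per traced write event.
 * Real programs usually update counters atomically, since the same
 * event can fire on several CPUs at once. */
static void on_write_event(uint32_t pid, uint64_t nbytes) {
    struct pid_entry *e = map_lookup_or_init(pid);
    if (e != NULL) /* map full: the sample is dropped */
        e->bytes += nbytes;
}

/* "Userspace side" (step 3): read a value back out of the shared map. */
static uint64_t bytes_for_pid(uint32_t pid) {
    for (size_t i = 0; i < MAX_ENTRIES; i++)
        if (pid_bytes_map[i].used && pid_bytes_map[i].pid == pid)
            return pid_bytes_map[i].bytes;
    return 0;
}
```

      For example, calling `on_write_event(100, 4096)` and then `on_write_event(100, 1024)` leaves `bytes_for_pid(100)` at 5120.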

      Steps 2 and 3 can be, and often are, combined into the same program. Additionally, step 2 will sometimes also compile the source for the eBPF program right before loading it into the kernel. However, with BTF (BPF Type Format) and the CO-RE (Compile Once, Run Everywhere) approach built on it, it’s becoming possible to compile the eBPF program once and simply load the prebuilt BPF bytecode into the kernel as a binary blob, with kernel struct field offsets adjusted at load time to match the running kernel.

      There are two additional terms you’ll run across when looking into eBPF: BCC and bpftrace. It took me a while to wrap my head around what these are and how they fit into the eBPF picture.

      BCC - The BPF Compiler Collection is essentially a library and set of APIs for embedding BPF programs (written in C) in things like Python scripts, with the necessary plumbing provided to load/unload these programs and communicate via the maps.

      bpftrace - A tool that provides a somewhat C-like DSL for writing eBPF programs, and is also the program that compiles, loads, and unloads them. bpftrace makes it possible to write very short, even one-line, scripts focused on tracing/accounting kernel structures and functions.

      Both BCC and bpftrace provide a bunch of utilities that showcase what they can do, and also provide some quick and interesting insights into the kernel.

        What are the five-year-olds in your life like?! ;-) But seriously that was a very helpful explanation, thank you.

        I guess what remains unclear to me is what eBPF is good for. Is it just, like… everything you would use a kernel module for, but easier? I saw you mention tracing what the Linux kernel is doing, and maybe messing with internet packets. Is there a killer app for eBPF?

          Sorry, I’m known for being long-winded ;-)

          Big uses I’ve seen are:

          • Tracing/Observability (seeing when an event occurs in the kernel, and the context around that event). E.g. you want to see each time bytes are written to a block device, and trace the latencies or which PIDs initiated those writes. The great part is seeing not only the events as they happen in the kernel, but also the exact arguments and context passed to those events.
          • Security. Some of the injection points not only allow reading data from events, but are also actively in the path of the data and thus can modify or even block events. For example, you can watch exec/execve/fork events and drop those that come from untrusted sources.
          • Network tracing/filtering/mangling. The network category is almost a whole thing unto itself. Common cases include accounting (e.g. seeing how many bytes are sent/received, categorized by PID, source, port, socket, or anything else you want), filtering (like iptables, but at a lower level, before the packet even enters the kernel’s network stack), and mangling such as data plane routing (rewriting packet information and re-routing to other NICs or machines, again below the kernel’s network stack). A key advantage here is that the kernel does a lot of processing/accounting on every packet, so if you can make your decision before the kernel’s network stack gets involved, you skip that work and can see massive performance wins. XDP in particular is interesting because the BPF program runs directly in the NIC driver (or, on supported NICs, on the NIC hardware itself), so for packets it drops or redirects, the kernel never even allocates a socket buffer. Avoiding millions of allocations can make a big difference in performance.
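
          A minimal sketch of XDP-style filtering logic, assuming a simple source-address blocklist (`blocked_sources` and `filter_packet()` are hypothetical names). It is plain userspace C so it can be compiled and tested without a kernel: a real XDP program would receive a `struct xdp_md`, take the packet bounds from its `data`/`data_end` fields, and return `XDP_DROP` or `XDP_PASS` instead of the local `enum verdict`. The length check before each read mirrors what the BPF verifier demands of packet access.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the decision an XDP filter makes. A real XDP
 * program gets a struct xdp_md and returns XDP_DROP / XDP_PASS; a byte
 * buffer and this local enum replace them so the logic runs anywhere. */
enum verdict { VERDICT_PASS, VERDICT_DROP };

#define ETH_HDR_LEN 14      /* dst MAC + src MAC + EtherType */
#define IPV4_MIN_HDR_LEN 20 /* IPv4 header without options */
#define ETHERTYPE_IPV4 0x0800

/* Hypothetical blocklist in host byte order; a real filter would keep
 * this in a BPF map that userspace can update at runtime. */
static const uint32_t blocked_sources[] = {
    0xC0A80001u, /* 192.168.0.1 */
};

static enum verdict filter_packet(const uint8_t *data, size_t len) {
    /* The verifier rejects packet reads it cannot prove are in bounds,
     * so every access is preceded by a length check. */
    if (len < ETH_HDR_LEN)
        return VERDICT_PASS;

    uint16_t ethertype = (uint16_t)((data[12] << 8) | data[13]);
    if (ethertype != ETHERTYPE_IPV4)
        return VERDICT_PASS; /* only IPv4 is inspected in this sketch */

    if (len < ETH_HDR_LEN + IPV4_MIN_HDR_LEN)
        return VERDICT_PASS;

    /* IPv4 source address: header bytes 12..15, network (big-endian) order. */
    const uint8_t *ip = data + ETH_HDR_LEN;
    uint32_t saddr = ((uint32_t)ip[12] << 24) | ((uint32_t)ip[13] << 16) |
                     ((uint32_t)ip[14] << 8) | (uint32_t)ip[15];

    for (size_t i = 0; i < sizeof blocked_sources / sizeof blocked_sources[0]; i++)
        if (saddr == blocked_sources[i])
            return VERDICT_DROP;
    return VERDICT_PASS;
}
```

          Keeping the blocklist in a map rather than hard-coding it is what lets a userspace program change the filter without reloading the XDP program.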

          Keep in mind you’re not limited to a single BPF program; you can have many working in concert, all writing to various maps, and a userspace program taking in all this data and processing/presenting it in a way that is useful to the user. So while a single tracepoint might seem silly, multiple related tracepoints and all the context associated with them can paint a big, clear picture.
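
          As a sketch of two programs working in concert, the toy model below mimics the general shape of block I/O latency tools such as BCC's biolatency: one hook records a start timestamp per request in a shared map, and a second hook on completion computes the latency and increments a power-of-two histogram bucket. Everything here is a hypothetical stand-in for illustration (`on_issue()`, `on_complete()`, the request IDs, and the caller-supplied timestamps); plain arrays replace the BPF maps.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of two cooperating probes sharing maps. Plain arrays stand
 * in for BPF maps; request IDs and timestamps come from the caller
 * instead of the kernel. */
#define MAX_INFLIGHT 128
#define NUM_BUCKETS 32

struct start_slot {
    uint64_t req_id;
    uint64_t ts_ns;
    int used;
};

static struct start_slot start_map[MAX_INFLIGHT]; /* "map" 1: request -> issue time */
static uint64_t latency_hist[NUM_BUCKETS];        /* "map" 2: log2 latency buckets (us) */

/* floor(log2(v)), i.e. index of the highest set bit; 0 for v == 0 or 1,
 * capped at the last bucket. */
static unsigned log2_bucket(uint64_t v) {
    unsigned b = 0;
    while (v > 1 && b < NUM_BUCKETS - 1) {
        v >>= 1;
        b++;
    }
    return b;
}

/* Probe 1: a request was issued; remember when. */
static void on_issue(uint64_t req_id, uint64_t now_ns) {
    for (size_t i = 0; i < MAX_INFLIGHT; i++) {
        if (!start_map[i].used) {
            start_map[i] = (struct start_slot){ req_id, now_ns, 1 };
            return;
        }
    }
    /* table full: this request simply goes untracked */
}

/* Probe 2: the request completed; record its latency in microseconds. */
static void on_complete(uint64_t req_id, uint64_t now_ns) {
    for (size_t i = 0; i < MAX_INFLIGHT; i++) {
        if (start_map[i].used && start_map[i].req_id == req_id) {
            uint64_t delta_us = (now_ns - start_map[i].ts_ns) / 1000;
            latency_hist[log2_bucket(delta_us)]++;
            start_map[i].used = 0; /* like deleting the map entry */
            return;
        }
    }
    /* no start time recorded (e.g. issued before tracing began): ignore */
}
```

          Neither probe alone says much; only the shared start map lets the completion probe turn two timestamps into a latency, which is the "many programs, one picture" idea above.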

          It’s difficult to list all the use cases of eBPF because it’s such a wide-open topic. It’s kind of like asking what one can do with Python.

          I don’t know that there is a killer app for eBPF (yet). It’s still very new and only now starting to take off into what I would call the beginnings of the mainstream (enthusiasts). Cilium is a major player, providing policy-based container network monitoring/routing/security (to the best of my knowledge; I’m not super familiar with Cilium beyond their generic eBPF/XDP documentation, which is incredible and borderline must-read material for anyone trying to get into eBPF). There are other sandboxing tools built on eBPF that are pretty great, but still in the proof-of-concept phase. Likewise, there are tons of low-level tools that use eBPF behind the scenes, but I don’t think those count as a killer app.

            DTrace-style tracing, a better firewall (BPF originally stood for Berkeley Packet Filter), performance (less kernel/user-space context switching), …