I like this post, but part 2 is a little odd. It speculates about the specifics of the root cause and goes into detail on things that aren’t needed to understand the outage (e.g. buffer overflows). It probably would have been better to link to another explanation of those things (for the curious reader).
Not to mention that the explanation of buffer overflows and corresponding diagram is completely wrong:
“A buffer overflow happens when a program writes data to a buffer, but goes beyond the memory allocated for the program. In doing so, it overwrites the adjacent locations of other programs. By taking advantage of a buffer overflow, it’s possible to alter the runtime of another program.”
Processes are isolated from each other by the OS; they can’t overwrite each other’s memory. The buffer overflow writes into other memory allocations within the same process, which may be able to achieve code execution within that process.
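To make the "same process" point concrete, here is a toy model in Python. It is only a sketch: the flat byte array stands in for one process's memory, and the layout (an 8-byte buffer followed by a one-byte flag) and all the names are made up for illustration. It is not how any real allocator lays things out.

```python
# Toy model: one process's memory as a flat byte array holding two
# adjacent allocations. Layout and names are illustrative assumptions.
BUF_START, BUF_SIZE = 0, 8            # an 8-byte input buffer
FLAG_OFFSET = BUF_START + BUF_SIZE    # a 1-byte "is_admin" flag right after it


def new_process_memory() -> bytearray:
    """Return zeroed memory: 8 buffer bytes followed by the flag byte."""
    return bytearray(BUF_SIZE + 1)


def unchecked_copy(memory: bytearray, data: bytes) -> None:
    """Copy data into the buffer without checking it against BUF_SIZE."""
    for i, byte in enumerate(data):
        memory[BUF_START + i] = byte


def checked_copy(memory: bytearray, data: bytes) -> None:
    """Copy data into the buffer, refusing input that does not fit."""
    if len(data) > BUF_SIZE:
        raise ValueError("input larger than buffer")
    memory[BUF_START:BUF_START + len(data)] = data


mem = new_process_memory()
unchecked_copy(mem, b"A" * BUF_SIZE + b"\x01")  # 9 bytes into an 8-byte buffer
overflowed_flag = mem[FLAG_OFFSET]  # the neighbouring allocation was overwritten
```

The ninth byte lands in the flag that sits next to the buffer, inside the same memory. Nothing belonging to another process is touched.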
Apropos of nothing I read a blog post the other day that talked about how the TTL field in the IP header is how long the DNS entry for that IP address is valid. I think it was written by ChatGPT or something.
Don’t most operating systems have W^X protection, unless explicitly disabled via mprotect?
Yes, but there are exploit techniques like return-to-libc and ROP that involve repurposing code that is already in the process and marked executable. They are in turn mitigated by ASLR making such addresses unpredictable. There are still sometimes ways to bypass these.
Which every JIT compiler needs to ignore, so there is your starting point.
I found it way more informative than the postmortem published by DataDog: https://www.datadoghq.com/blog/2023-03-08-multiregion-infrastructure-connectivity-issue (this one is actually referenced by the above article).
OS update for thousands of VMs applied at the SAME TIME. Geez… what could go wrong, huh?
I was surprised a few years ago when I realized that all Debian-based systems have their cron.daily run at the same time. There is no randomization at all. I wonder if this is visible in mirror logs around the world.
That’s very surprising. The FreeBSD-update tool has a random sleep in its cron mode to avoid all of the machines hitting the mirrors at the same time. I’m surprised the Debian servers don’t complain.
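For anyone wondering what that kind of random sleep looks like, here is a rough Python sketch of two ways to spread scheduled jobs out. The function names and the one-hour window in the usage note are assumptions for illustration, not taken from any particular tool.

```python
import hashlib
import random
from typing import Optional


def random_jitter(max_delay_s: int, rng: Optional[random.Random] = None) -> float:
    """Pick a random delay in [0, max_delay_s) seconds, different on every run."""
    rng = rng or random.Random()
    return rng.random() * max_delay_s


def host_jitter(hostname: str, max_delay_s: int) -> int:
    """Derive a stable per-host delay so each machine keeps its own slot."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % max_delay_s
```

A job would call something like `time.sleep(random_jitter(3600))` before contacting the mirror. The hash-based variant trades randomness for predictability: a given host always runs at the same offset, which can make its logs easier to read.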
Every one of these high profile outages (it seems) is either a weird cloud configuration change, or a dependency upgrade.
Interesting food for thought.
I don’t know what eBPF is, but I feel like it’s in almost every one of these.
It’s a tool to allow unprivileged users to inject gadgets into the kernel to use in their privilege elevation exploit.
It’s a power feature for super magic kernel level packet routing, I think.
It is a virtual machine. It executes bytecode. It’s severely constrained so that programs written in it can’t loop infinitely or crash or do anything particularly bad. It’s used in Linux so that userspace programs can run code in kernel space. Because it’s constrained, in theory this isn’t as dangerous as having the userspace program e.g. hand over some arbitrary machine code to run in kernel space.
The main thing it tends to get used for is filtering. e.g. in networking, you might have a use case like “I want to dump every UDP packet on port 53”. You write an eBPF program which looks at the right bytes in a buffer to distinguish that it’s UDP and the port number is 53, and upload it to the kernel. The kernel executes this program for every incoming packet and sends the packets on which it returns true to the userspace program.
Because the filter program runs in kernel space, you don’t have the overhead of having to either a) context switch to userspace to test each packet or b) send every packet to userspace and filter them there.
Because the filter is a program, it can express complex rules that would be hard to express with some kind of rules engine.
Because the filter is a program, it’s feasible to speed it up by converting it to machine code (after verifying that it is well formed and conforms to all the rules that forbid it from doing bad stuff.)
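As a rough illustration of the "look at the right bytes in a buffer" part, here is the UDP port 53 predicate written as plain Python. This is not eBPF bytecode and does not go through the kernel's verifier. It is only a sketch of the decision such a filter makes, assuming the buffer starts at the IPv4 header (no link-layer header) and that either the source or the destination port may be 53. The helper that builds a test packet is hypothetical.

```python
import struct

IPPROTO_UDP = 17
DNS_PORT = 53


def is_udp_port_53(packet: bytes) -> bool:
    """Return True for an IPv4/UDP packet whose source or destination port is 53."""
    if len(packet) < 20:
        return False
    version = packet[0] >> 4
    ihl = (packet[0] & 0x0F) * 4  # IPv4 header length in bytes
    if version != 4 or ihl < 20 or len(packet) < ihl + 8:
        return False
    if packet[9] != IPPROTO_UDP:  # protocol field of the IPv4 header
        return False
    # Later fragments carry no UDP header, so only offset 0 can be checked.
    frag_offset = struct.unpack_from("!H", packet, 6)[0] & 0x1FFF
    if frag_offset != 0:
        return False
    src_port, dst_port = struct.unpack_from("!HH", packet, ihl)
    return DNS_PORT in (src_port, dst_port)


def build_ipv4_udp(src_port: int, dst_port: int, protocol: int = IPPROTO_UDP) -> bytes:
    """Build a minimal IPv4 header plus an 8-byte transport header (checksums left at zero)."""
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 28, 0, 0, 64, protocol, 0,
        bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]),
    )
    udp_header = struct.pack("!HHHH", src_port, dst_port, 8, 0)
    return ip_header + udp_header
```

In the real thing, the kernel runs the equivalent check for every packet and only hands the matching ones to userspace.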
Note that all of the ‘it’s constrained’ is true only in the absence of speculative side channels in your architecture. You can use eBPF to leak any arbitrary data from the kernel by injecting Spectre gadgets that you then trigger via packet data. You can also use the eBPF JIT to inject convenient gadgets (at a location that you can leak via aforementioned Spectre gadgets) for a code reuse attack if you want to turn a data corruption vulnerability into an arbitrary code execution exploit.
Of all of the things that Linux ships that appear to be primarily designed to benefit attackers, eBPF is probably the one that they like the most.
Yeah it’s scary as all hell. “Userspace can upload programs to run in kernel space” is asking for problems in the same way as “random strangers will send you JavaScript code which you will then immediately execute with access to a huge number of APIs mostly implemented in C++” is. :)
Userspace (root) can upload code into the kernel by loading kernel modules. I wouldn’t mind if eBPF were locked down like this, but it’s used by Docker (because Linux doesn’t really have a notion of a zone / jail) and so needs to be accessible to userspace.
ZFS now lets root run channel programs, which is not very friendly to jails / zones. I wish both approaches would use the traditional UNIX model of exposing these in the filesystem. If ZFS channel programs could only be loaded by root, but were then exposed as /dev/zfs/progs/{name} and the ioctl that ran them took a file descriptor to the channel program to run then it would be possible for root to install an audited set of safe programs and expose them to unprivileged users / jails (for example, a backup program could run a channel program that allowed it to remove snapshots, but only snapshots that contained some specific metadata indicating that they were created by the backup program). Something similar could work with eBPF.
When I talk to you here, I frequently get the impression that you don’t read the posts you’re responding to, or think about the context very much.
Eh, user namespaces and io_uring are up there for sure. But yeah, eBPF is an awesome primitive.