Fortunately, Capsicum support is compiled into FreeBSD kernels by default (though processes must still opt in with cap_enter).
Any reasoning for this nowadays? Or is this not accurate?
It’s definitely on in 14, I think it’s on in 13.1. It’s of very limited security value because it’s almost always trivial to leak an address that lets you bypass ASLR. For almost 10 years, ASLR has been no more than a small speed bump for most attacks, rather than an effective mitigation.
How do you know this? For most published attacks, obviously it’s only a small speed bump. But how do you know how many potential attacks have been thwarted because they needed an address leak but couldn’t find one? Those would usually not be published.
Because automated exploit toolkits have had ASLR bypasses for ages. You don’t look at attacks, you look at vulnerabilities and determine whether they can be exploited with or without ASLR. In very few cases, ASLR prevents exploiting a vulnerability.
Yes, I definitely agree with this.
This is the bit I’m questioning. I’m not saying it isn’t true: I really don’t know. I’m asking how you find evidence for this.
You need a representative set of vulnerabilities, which you can then check to see which ones are prevented by ASLR and which aren’t. Where do you find such a representative set?
Any vulnerability that does not lead to a successful attack is presumably much less likely to be published. This implies that vulnerabilities that are mitigated by ASLR are underreported. How do you deal with this bias in the data?
I also wonder if different types of vulnerability are more or less likely to be mitigated by ASLR. For example, are address leaks much more common for LPE vulns than RCE?
No vulnerability is completely prevented by ASLR: an attacker can always be incredibly lucky and guess the right value, so these vulnerabilities are still reported and have CVEs assigned; they are just sometimes downgraded in severity because the attack is hard. Often the CVEs are later upgraded when someone releases a PoC that combines an ASLR bypass with them, but these days it's uncommon to do the downgrade in the first place because an attacker is assumed to be able to bypass ASLR.
So you’re assuming that all vulnerabilities are reported? That seems… optimistic.
No, I’m assuming that reported vulnerabilities are statistically representative of all vulnerabilities. That may not be true, but it’s the closest approximation that we have.
It’s fair enough to work with the only data that you have available - but that shouldn’t lead to emphatically definite statements such as “For almost 10 years, ASLR has been no more than a small speed bump for most attacks, rather than an effective mitigation.”
It sounds more like it should be “We don’t have any good evidence that ASLR actually helps much”.
Similarly, I’m pretty sure that assumption is quite wrong… but I don’t have any good evidence to prove it :)
I still appreciate how useful SmartOS’ trick of running KVM inside a native zone is. Break out of KVM and end up in … an entirely segmented bit of the computer with still no access to the hypervisor.
This is more or less what the Capsicum support does. A process in capability mode has no access to any global namespaces. It cannot open files, sockets, named pipes, and so on; it can use only file descriptors that it starts with or that it is given by another process over an existing socket. Escaping from the VM lets you access this process, but this process should be limited to accessing only things that you have access to anyway, such as the block device or file that is exposed as a block device to the guest. The VM boundary isn't a real security boundary, it's an abstraction boundary; the security boundary is the Capsicum-isolated process. The threat model for bhyve should assume that an attacker from the VM has full control over this process.
Exactly. This highlights a great advantage of sandboxing. After exploiting a guest-to-host (g2h) vulnerability and escaping the VM, the attacker usually needs to exploit another vulnerability to escalate privileges from the context they land in.
Different hypervisors have different architectures and models. For example, in Hyper-V, most of the functionality exposed to guests isn't in the hypervisor, it's in either one of two contexts:
(1) the VSPs (Virtualization Service Providers) - which run in the host's kernel (vmbus.sys, vmswitch.sys, storvsp.sys etc.)
(2) the emulator, which runs in user-mode (called the Worker Process, which is spawned per guest)
In this model, the vulnerability you have determines which context you end up in (user-mode / kernel-mode). That's why it's in everyone's interest to move functionality from kernel-mode to user-mode, and to introduce more mitigations/sandboxing.
Getting back to Capsicum, the great benefit is that the capability-based model greatly helps to restrict the attack surface exposed to the process. This is effectively attack surface reduction, and it has proven to be very beneficial in many workloads.
Oh nice, I’m not so au fait with FreeBSD and Capsicum. Good to hear!
There are basically two sandboxing approaches on FreeBSD: Jails and Capsicum.
Jails are very similar to Solaris Zones. Jails came first but lacked some important things (such as isolating the SysV IPC namespace); Zones arrived shortly afterwards and provided a complete implementation. I believe jails are now a superset of Zones' functionality because they also allow running a completely separate instance of the network stack per jail, but they're roughly equivalent. They give you isolated UID namespaces, support ZFS delegated administration, and so on. They're good when you want to isolate a complicated program (which may involve multiple processes) without modifying it, and they are used in the containerd shims for FreeBSD containers.
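As a concrete (entirely hypothetical) illustration of the per-jail network stack and ZFS delegation mentioned above, a minimal /etc/jail.conf entry might look something like this; the jail name, paths, and interface are made up:

```conf
# /etc/jail.conf -- hypothetical example
web {
    path = "/jails/web";
    host.hostname = "web.example.org";

    # VNET gives the jail its own instance of the network stack
    vnet;
    vnet.interface = "epair0b";

    # allow the jail to mount ZFS datasets delegated to it
    allow.mount.zfs;

    exec.start = "/bin/sh /etc/rc";
    exec.stop = "/bin/sh /etc/rc.shutdown";
}
```

Delegated administration is then paired with something like `zfs set jailed=on tank/jails/web` followed by `zfs jail web tank/jails/web` on the host.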
Capsicum is a much more principled approach. Processes opt into it by issuing a cap_enter system call. This sets a flag in the process structure and also replaces its system call table with one that is significantly reduced: only system calls from an allow list are permitted. This turns the UNIX model that all accesses to resources outside of a process must go via a file descriptor into a proper capability model. All system calls that authorise accessing a resource without receiving a file descriptor are removed (no open, but openat is still allowed, for example). In addition to the restrictions, file descriptors grow a rich set of permissions. You can create a file descriptor that can be mmaped, but only with read-only permission, or a directory descriptor that allows you to open files for appending but not for arbitrary writing. You can also use this mechanism to authorise a subset of ioctls on a file descriptor that refers to a device node.
Capsicum is far more fine-grained than jails / zones, but requires source-code modification. If you are willing to modify the code, it gives you the guarantees that you actually want from sandboxing: no access to anything unless you explicitly enable access, no privilege elevation without requesting a new descriptor from a more privileged entity.
Thanks for the detailed overview!