This article starts talking about something interesting but it looks as if the author hit publish when it was in an early design stage.
The idea it describes is a type-1 hypervisor that sits below any kernel and exposes functionality for managing VMs to either one privileged guest or multiple guests. This is the model that Xen used from the start, with either Linux or NetBSD typically filling the role of the privileged guest. There was some work to allow a domU to launch VMs by delegating resources assigned to it, but I’m not sure how far this went.
Windows also follows this model. Hyper-V implements a public spec and, in theory, Windows can use any hypervisor that provides the same interface. This is important because Windows actually relies on the hypervisor for some security functionality. The Hyper-V design has a notion of a virtual trust level, effectively giving a set of orthogonal privilege modes within a VM such that a VM can drop privilege for most of the kernel and retain strong isolation guarantees for the rest of it. Things like the credential manager run at a higher VTL so that a compromise in the kernel doesn’t automatically leak kernel-held secrets (though it does expose the APIs for accessing them, so an attacker may still be able to elevate privilege). Various other monitoring components live at this level.
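To make the VTL idea concrete, here is a minimal toy sketch. It is not how Hyper-V or the Windows secure kernel is implemented; `ToyVm`, `CredentialService`, the page number and the HMAC-based `sign` call are all hypothetical names invented for illustration. It only shows the property described above: code at VTL0 can call the service's API but cannot read the page that holds the secret.

```python
import hashlib
import hmac


class AccessDenied(Exception):
    """Raised when code at a lower trust level touches a protected page."""


class ToyVm:
    """Hypothetical model: each page records the lowest VTL allowed to touch it."""

    def __init__(self):
        self.min_vtl = {}  # page number -> minimum VTL required
        self.pages = {}    # page number -> bytes

    def _check(self, vtl, page):
        if vtl < self.min_vtl.get(page, 0):
            raise AccessDenied(f"VTL{vtl} may not access page {page}")

    def write(self, vtl, page, value):
        self._check(vtl, page)
        self.pages[page] = value

    def read(self, vtl, page):
        self._check(vtl, page)
        return self.pages.get(page, b"")

    def protect(self, caller_vtl, page, required_vtl):
        # A caller can only manage pages it can already access, and cannot
        # hand out a protection level above its own.
        self._check(caller_vtl, page)
        if required_vtl > caller_vtl:
            raise AccessDenied("cannot protect above the caller's own level")
        self.min_vtl[page] = required_vtl


class CredentialService:
    """Stands in for a secure component running at VTL1 (an assumption)."""

    KEY_PAGE = 7  # arbitrary page number chosen for the example

    def __init__(self, vm, key):
        self.vm = vm
        vm.write(1, self.KEY_PAGE, key)
        vm.protect(1, self.KEY_PAGE, 1)

    def sign(self, message):
        # The normal kernel (VTL0) may call this, but never sees the key.
        key = self.vm.read(1, self.KEY_PAGE)
        return hmac.new(key, message, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    vm = ToyVm()
    service = CredentialService(vm, b"kernel-held secret")
    service.sign(b"login request")  # the API is still reachable from VTL0
    try:
        vm.read(0, CredentialService.KEY_PAGE)  # a compromised kernel tries the key
    except AccessDenied as err:
        print(err)
```

The `sign` call succeeding for any caller is the "it does expose the APIs" caveat: the attacker cannot exfiltrate the key, but can still ask the service to use it.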
Android with Hafnium has a similar model, where there’s a small hypervisor that is designed to allow components such as the credential store to be isolated from compromises of the Linux kernel.
The big difference between Xen and the other two is that Xen ships a scheduler in the hypervisor. Hyper-V and Hafnium both place their trusted guest in the TCB for availability, even when in modes that isolate it for confidentiality and integrity. This makes sense on Android because if Linux crashes then it doesn’t really matter if any of the other services work. It also makes sense in the Windows model (on the client, at least) for similar reasons.
The Arm Realms model is somewhat different. Arm talks about Realms as if they’re building a hardware confidential computing solution but they’re actually doing something a lot more sensible: providing hardware acceleration for a privilege-separated hypervisor. The RMM is an absolutely minimal hypervisor that provides guarantees about realm (guest) isolation (pages are either private to a realm, or shared and the realm is aware that they’re shared). The RMM is not responsible for allocating memory or scheduling. A hypervisor in EL2 (outside of the Realm World) must find pages to allocate to a realm, pass them to the RMM (at which point it loses any ability to access them), ask the RMM to create a realm with access to those pages, and add VCPUs to that realm, and so on. The EL2 hypervisor is trusted to schedule realms (it also has access to the reset line for the system, so there’s no way of removing it from the TCB for availability). If a realm needs to be paged out or migrated, the RMM is responsible for providing encrypted and integrity-protected copies of the pages to EL2-owned memory, but EL2 is responsible for swapping them out or transferring them to a new machine that can start the realm again. The idea is that the EL2 hypervisor will be feature-rich and large, but only the code in the RMM is able to violate realm-isolation guarantees and the RMM is small enough that it could be formally verified (I don’t know of any plans to do that yet).
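The division of labour described above can be sketched as a toy ownership tracker. This is a simplified model under my own assumptions, not the actual RMM interface: `ToyRmm`, its method names and `host_schedule` are hypothetical, and paging out or migrating realms is left out entirely. It shows only that the host finds and delegates pages, loses access once it does, and still decides what runs.

```python
class IsolationError(Exception):
    """Raised when a caller touches a page it does not own."""


class ToyRmm:
    """Hypothetical stand-in for the RMM: it only tracks who owns each page."""

    HOST = "host"            # owned by the EL2 hypervisor
    DELEGATED = "delegated"  # handed to the RMM, not yet part of a realm

    def __init__(self, num_pages):
        self.owner = {page: self.HOST for page in range(num_pages)}
        self.data = {page: b"" for page in range(num_pages)}
        self.vcpus = {}  # realm id (int) -> number of vCPUs

    def delegate(self, page):
        """Host gives a page away; from here on it can no longer access it."""
        if self.owner[page] != self.HOST:
            raise IsolationError(f"page {page} is not host-owned")
        self.owner[page] = self.DELEGATED
        self.data[page] = b""  # scrub so no host data leaks into a realm

    def create_realm(self, realm_id, pages):
        """Build a realm out of pages the host delegated earlier."""
        if realm_id in self.vcpus:
            raise ValueError(f"realm {realm_id} already exists")
        if any(self.owner[page] != self.DELEGATED for page in pages):
            raise IsolationError("realm pages must be delegated first")
        for page in pages:
            self.owner[page] = realm_id
        self.vcpus[realm_id] = 0

    def add_vcpu(self, realm_id):
        self.vcpus[realm_id] += 1
        return self.vcpus[realm_id]

    def access(self, who, page, value=None):
        """Read a page (or write it when value is given) on behalf of `who`."""
        if who == self.DELEGATED or self.owner[page] != who:
            raise IsolationError(f"{who!r} may not access page {page}")
        if value is not None:
            self.data[page] = value
        return self.data[page]


def host_schedule(rmm, rounds):
    """The host, not the RMM, picks which realm runs next (round-robin here)."""
    runnable = [realm for realm, count in rmm.vcpus.items() if count > 0]
    if not runnable:
        return []
    return [runnable[i % len(runnable)] for i in range(rounds)]


if __name__ == "__main__":
    rmm = ToyRmm(num_pages=4)
    rmm.delegate(1)
    rmm.delegate(2)
    rmm.create_realm(realm_id=1, pages=[1, 2])
    rmm.add_vcpu(1)
    rmm.access(1, 1, b"realm secret")  # the realm uses its private page
    print(host_schedule(rmm, 3))       # the host still chooses what runs
    try:
        rmm.access(ToyRmm.HOST, 1)     # the host tries to read the page back
    except IsolationError as err:
        print(err)
```

Note that all the policy (which pages, which realm, how many vCPUs, what runs when) lives in the caller, which is why the real RMM can stay small while the EL2 hypervisor stays in the TCB for availability only.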
Probably assumed too much familiarity with the area when writing it…
The big difference between Xen and the other two is that Xen ships a scheduler in the hypervisor. Hyper-V and Hafnium both place their trusted guest in the TCB for availability, even when in modes that isolate it for confidentiality and integrity. This makes sense on Android because if Linux crashes then it doesn’t really matter if any of the other services work. It also makes sense in the Windows model (on the client, at least) for similar reasons.
The root domain is still always required to be functional, however, if only for VMM tasks and management.
I omitted Hyper-V shielded VMs from the article, though, because Microsoft didn’t implement them properly from a security perspective, which makes their security guarantees relatively easy to break… :-/
Windows also follows this model. Hyper-V implements a public spec
The TLFS isn’t the most complete thing in the world sadly.
and, in theory, Windows can use any hypervisor that provides the same interface
In practice too. :-) (at least for a certain subset)
The Hyper-V design has a notion of a virtual trust level, effectively giving a set of orthogonal privilege modes within a VM such that a VM can drop privilege for most of the kernel and retain strong isolation guarantees for the rest of it.
VTL1 is an interesting design (with an equally cursed SecureKernel implementation that has its own downsides)… Apple has a hardware-assisted implementation of the concept with PPL (which uses the GXF lateral privilege level ISA extension) - https://blog.svenpeter.dev/posts/m1_sprr_gxf/.
In practice, however, it fills the same role as an enclave in the way that Microsoft currently uses it.
Hafnium
Gunyah on the Qualcomm side is a practical implementation of the design, and is what ships there today. The downside of not having a Realms-style lateral privilege level is, of course, that you lose EL2 access for Linux on those Qualcomm platforms.
On the scheduler point: Hyper-V actually offers both modes (https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/manage/manage-hyper-v-scheduler-types). On client Hyper-V, scheduling is delegated to the root domain. On server Hyper-V, the hypervisor has its own scheduler.
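As far as I recall from the linked page, the scheduler type is selected with `bcdedit /set hypervisorschedulertype` followed by `Classic`, `Core` or `Root`, and takes effect after a reboot; treat the exact option and value names as an assumption to check against that page. A small helper (`scheduler_command` is a made-up name) that builds the command without running it:

```python
# Value names as I remember them from the Microsoft page; verify before use.
SCHEDULER_TYPES = {"classic": "Classic", "core": "Core", "root": "Root"}


def scheduler_command(kind):
    """Return the argv for selecting a Hyper-V scheduler type (not executed)."""
    try:
        value = SCHEDULER_TYPES[kind.lower()]
    except KeyError:
        raise ValueError(f"unknown scheduler type: {kind!r}") from None
    return ["bcdedit", "/set", "hypervisorschedulertype", value]
```

Here `Root` corresponds to the client behaviour described above, where scheduling is delegated to the root domain, while `Classic` and `Core` keep the scheduler in the hypervisor.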