Ah, the ‘crash or infinite loop’ trick for early debugging. So much pain when you need to do that.
It’s somewhat surprising that AWS decided to use the Xen PVH model for a KVM-based implementation, but an interesting choice. In particular, Xen and KVM / Hyper-V / VMware really came from opposite directions. Xen grew out of the Nemesis exokernel project and is really about providing an exokernel operating system that looks as much like bare-metal hardware as makes sense. In contrast, KVM, Hyper-V, and VMware ESXi all grew out of a desire to run unmodified operating systems, and so try to faithfully emulate a PC, then provide ways of poking through that emulation for performance if a guest knows that it’s in a VM. We’re now in the slightly odd position of having made a load of design decisions on the assumption that running an OS in a VM is a niche case and that guest code shouldn’t need to change, in a world where the overwhelming majority of deployments of server operating systems are in a VM. Even Android runs in a VM (with the Hafnium hypervisor), so my guess is that Linux running without a VM is a tiny niche case these days, yet the last 10-20 years of hardware and software decisions were all based around running Linux without it needing to know it was in a VM.
The FIFO interrupt bug is fun. I’ve been writing some code that talks to a 16550 recently and there’s absolutely no point in using the interrupt, because a 20 MHz processor can’t do a useful amount of work in the time it takes the FIFO buffer to empty. On a 2 GHz system, I suspect that the calculation is very different. It’s a shame that they provide an emulated UART rather than the Xen PV console. The Xen PV console interface is a simplified version of the normal Xen ring buffer and is very easy to work with, so you can start writing to it very early in the boot process. I’d much rather deal with it than a 16550.
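Roughly, the guest side of that console looks something like this (a sketch based on the public io/console.h ring layout; the barriers are simplified and the event-channel notification to the backend is left as a comment):

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t XENCONS_RING_IDX;

/* Shared page layout from Xen's public io/console.h. */
struct xencons_interface {
    char in[1024];
    char out[2048];
    XENCONS_RING_IDX in_cons, in_prod;
    XENCONS_RING_IDX out_cons, out_prod;
};

/* Indices are free-running; mask only when addressing the buffer. */
#define MASK_XENCONS_IDX(idx, ring) ((idx) & (sizeof(ring) - 1))

size_t xencons_write(struct xencons_interface *intf,
                     const char *data, size_t len)
{
    XENCONS_RING_IDX cons = intf->out_cons;
    XENCONS_RING_IDX prod = intf->out_prod;
    size_t sent = 0;

    __atomic_thread_fence(__ATOMIC_SEQ_CST);   /* read indices before writing */

    /* Copy as much as fits into the output ring. */
    while (sent < len && (prod - cons) < sizeof(intf->out))
        intf->out[MASK_XENCONS_IDX(prod++, intf->out)] = data[sent++];

    __atomic_thread_fence(__ATOMIC_SEQ_CST);   /* data visible before index */
    intf->out_prod = prod;

    /* ...then kick the console event channel so the backend drains it. */
    return sent;
}
```

The nice part is that there’s no register poking at all: fill the ring, bump the producer index, signal the event channel.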
I really enjoyed the ‘bubblesort is slow’ followed by ‘replacing it with something fast will shave 2ms’. For any use other than Firecracker, a 2ms increase in boot time is probably irrelevant, but when you can boot in 20ms, 2ms is a big overhead.
The page zeroing thing has bitten us in the past too. On a 100 MHz in-order prototype CPU with 1 GiB of RAM, it costs a lot: even at an optimistic 8 bytes stored per cycle, zeroing 1 GiB takes over a second. Rather than doing it lazily, for Firecracker you can probably skip it entirely: the memory will be zeroed by the hypervisor before it’s given to you, and zeroing it again in the guest just triggers CoW faults.
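As a quick illustration of the CoW point (a Linux userspace toy, nothing to do with Firecracker’s actual memory setup): anonymous mappings start out backed by the shared zero page, and writing zeros to them still forces real pages to be allocated.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Resident set size in pages, from /proc/self/statm. */
static long rss_pages(void)
{
    long size, resident;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f || fscanf(f, "%ld %ld", &size, &resident) != 2)
        exit(1);
    fclose(f);
    return resident;
}

int main(void)
{
    size_t len = 256UL << 20; /* 256 MiB of anonymous, already-zero memory */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    printf("RSS before memset: %ld pages\n", rss_pages());
    memset(p, 0, len);  /* writing zeros over zeros still faults in every page */
    printf("RSS after memset:  %ld pages\n", rss_pages());
    return 0;
}
```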
From what I can remember, AWS EC2 started out using Xen to run Linux VMs when it was first launched. This was before SVM and VT-x were commonplace in x86 chips: VT-x was released in 2005 and SVM in 2006, the same year EC2 launched.
As far as I’m aware, they still use a mix of Xen and KVM, which gives them some resilience in case there’s a critical bug in either. I’m not sure that they ever deployed it, but they also had some fun machinery for doing VM migration between Xen and KVM, including compat layers for each hypervisor that emulated the other.
I thought I remembered people saying you really wanted a 16550 to keep your 386 or early 486 running well when running a BBS under OS/2. If I understand your message right, it sounds like the benefit would have been pretty negligible on those machines?
The 16550 has a small buffer that lets you write a few characters before waiting for them to flush. The UART it replaced didn’t, which limited performance a lot on faster systems. The interrupt that fires when the queue has space is useful on a computer that can do a useful amount of work while the buffer drains. With a 16-byte buffer, it takes a bit over 4 ms to flush at 28.8 kbit/s (a common modem speed). A 33 MHz 486 can do almost 150K cycles of work in that time, so even with a penalty of a few thousand cycles for the interrupt it’s going to be much more efficient than polling, especially if you’re driving multiple modems.
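For comparison, the two approaches look roughly like this (standard 16550 register offsets; uart_inb/uart_outb and the tx_queue_* helpers are made-up platform stubs, not from any real driver):

```c
/* Standard 16550 register offsets, relative to the UART's base port. */
#define UART_THR   0     /* transmit holding register (write)   */
#define UART_LSR   5     /* line status register                */
#define LSR_THRE   0x20  /* set when the transmitter is empty   */
#define FIFO_DEPTH 16

/* Hypothetical platform hooks: port I/O and a driver-side output queue. */
extern unsigned char uart_inb(int reg);
extern void uart_outb(int reg, unsigned char value);
extern int tx_queue_empty(void);
extern char tx_queue_pop(void);

/* Polled transmit: the CPU spins until the transmitter reports empty,
 * doing nothing useful while the FIFO drains. */
void uart_putc_polled(char c)
{
    while ((uart_inb(UART_LSR) & LSR_THRE) == 0)
        ;
    uart_outb(UART_THR, c);
}

/* Interrupt-driven transmit: when the FIFO-empty interrupt fires, refill
 * up to 16 bytes and return, leaving the cycles until the next interrupt
 * free for other work. */
void uart_tx_irq_handler(void)
{
    for (int i = 0; i < FIFO_DEPTH && !tx_queue_empty(); i++)
        uart_outb(UART_THR, tx_queue_pop());
}
```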