There’s a lot of wisdom in that blog post. One thing I’d say about the mainstream “engineering time vs. software performance” trade-off is that optimization is a skill that can be trained like any other. If you always defer optimization work, you’ll never get good at it, and the costs will always seem high. Shying away from something makes you a perpetual junior in that domain.
Corollary: you need a dedicated machine
Sad, but true. And I’d even go as far as saying you need bare metal access if you want to take performance seriously. Running on a hypervisor you don’t control (e.g. KVM-based cloud servers) will still be suboptimal. You want as much direct hardware access as possible, so you can reliably tune your boot config, device & power setup.
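To make “power setup” concrete, this is roughly the kind of pre-benchmark check I mean. A minimal, Linux-only sketch; the sysfs paths (cpufreq, intel_pstate, smt) are assumptions that depend on your kernel and hardware:

```python
#!/usr/bin/env python3
"""Quick sanity check of CPU power settings before benchmarking.

Sketch only: assumes Linux with the cpufreq/intel_pstate/smt sysfs
interfaces, which vary by kernel and hardware.
"""
from pathlib import Path

def read(path):
    p = Path(path)
    return p.read_text().strip() if p.exists() else None

# Scaling governor per core: "performance" avoids frequency ramp-up noise.
governors = {
    cpu.name: read(cpu / "cpufreq/scaling_governor")
    for cpu in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*"))
}
print("governors:", set(governors.values()))

# Turbo state (intel_pstate only): "1" means turbo is disabled,
# trading peak speed for more repeatable results.
print("no_turbo:", read("/sys/devices/system/cpu/intel_pstate/no_turbo"))

# SMT control, if exposed: "off" removes sibling-thread interference.
print("smt:", read("/sys/devices/system/cpu/smt/control"))
```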
Running on a hypervisor you don’t control (e.g. KVM-based cloud servers) will still be suboptimal
It’s much worse for cloud VMs. When you rent a cloud VM, the vendor promises a certain CPU, but there’s something they don’t tell you: quite often they lie. The CPU they claim isn’t what you get; it’s the worst you’ll get. Newer CPUs have lower power consumption per core for the same or better performance, but most cloud users aren’t willing to pay more for faster performance (when your granularity is per vCPU plus 1-2 GiB of RAM, a 20% speedup per core is hard to sell) and stick with the older SKUs. So the vendor has the hypervisor report an older CPU via CPUID and disables any features for your VM that are present only on the newer CPU. Most users don’t care: if they get a faster CPU for the same money, they’re not going to complain. But if you’re benchmarking, and launching the VM sometimes results in it running 20-40% faster, that’s very surprising.
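One mitigation when benchmarking is to record what the guest actually reports for each run, so the 20-40% outliers can at least be grouped by host generation. A rough, Linux-only sketch (if the vendor masks CPUID, this just records the masked model, of course):

```python
#!/usr/bin/env python3
"""Record which CPU a freshly launched VM actually reports.

Sketch only: reads /proc/cpuinfo (Linux, x86-ish fields) so each
benchmark run can be tagged with the host it really landed on.
"""
import re

def cpu_summary(path="/proc/cpuinfo"):
    text = open(path).read()
    model = re.search(r"^model name\s*:\s*(.+)$", text, re.M)
    flags = re.search(r"^flags\s*:\s*(.+)$", text, re.M)
    flagset = set(flags.group(1).split()) if flags else set()
    return {
        "model_name": model.group(1) if model else "unknown",
        # A couple of feature bits that differ across recent x86
        # generations; their presence hints at the underlying SKU.
        "has_avx2": "avx2" in flagset,
        "has_avx512f": "avx512f" in flagset,
    }

if __name__ == "__main__":
    print(cpu_summary())
```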
Is there a good tool for randomly adding latency to disk accesses? There are lots of tools that do this for networks, so you can test as if you were on a high-latency, high-jitter, or low-bandwidth link, but I’ve not seen anything that does the same for disk access.
Windows inserts huge random delays on disk accesses while Windows Defender looks at files, rolls a die, and decides whether it thinks the file is malware today (last year, for a couple of days, it decided all of the MS Office executables were malware; I’m not sure if that was more or less accurate than normal). That made Thunderbird completely unusable with mbox and gave it random pauses with Maildir. I wouldn’t want to inflict Windows on developers for normal use, though.
I haven’t used any of these myself, but https://jepsen.io/filesystem#prior-art lists some existing tools in that area; unreliablefs in particular looks like it can inject latency.
Linux has dm-delay and dm-flakey. Not really what you asked for, but they’re a similar shape at least. I don’t really recommend them as such; they’re awkward to use and quite limited, but I have used them to solve real problems in my OpenZFS work.
I’ve thought about a more generic version that you could load transforms into, but I haven’t had any time to explore it. And I’m now thinking about what a hardware version would be like, hmm.
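For anyone who wants to try dm-delay anyway, this is roughly the dance it needs. A sketch that assumes root plus the dmsetup and blockdev tools, with a made-up device path; note that dm-delay only adds a fixed per-I/O delay, not a random one:

```python
#!/usr/bin/env python3
"""Wrap a block device in dm-delay to add a fixed per-I/O latency.

Sketch only: requires root, Linux device-mapper, and the dmsetup and
blockdev tools. Device path and delays below are example values.
"""
import subprocess

def create_delayed(device, name="delayed", read_ms=20, write_ms=20):
    # Size of the device in 512-byte sectors.
    sectors = subprocess.run(
        ["blockdev", "--getsz", device],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    # dm-delay table: <start> <length> delay <dev> <offset> <read delay ms>
    #                 [<dev> <offset> <write delay ms>]
    table = f"0 {sectors} delay {device} 0 {read_ms} {device} 0 {write_ms}"
    subprocess.run(["dmsetup", "create", name, "--table", table], check=True)
    return f"/dev/mapper/{name}"

if __name__ == "__main__":
    # Hypothetical target; mkfs/mount the mapper device afterwards as usual.
    print(create_delayed("/dev/vdb", read_ms=50, write_ms=50))
```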
Interesting, those are both at the block layer. I was thinking it would be quite easy to provide a nullfs-like FUSE filesystem that would introduce delays, but a CUSE block device that you can run arbitrary filesystems on top of might also be interesting for things like simulating spinning rust on an SSD.
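The FUSE version could indeed be pretty small. Here’s a rough sketch using the third-party fusepy package; that choice and the uniform delay distribution are my assumptions, and it only covers a handful of operations on existing files rather than real nullfs semantics:

```python
#!/usr/bin/env python3
"""A nullfs-style FUSE passthrough that injects random latency per operation.

Sketch only: uses the third-party fusepy package (pip install fusepy),
implements just enough operations to browse, read, and overwrite
existing files, and sleeps a random amount before each one.
"""
import os
import random
import sys
import time

from fuse import FUSE, Operations  # fusepy


class SlowFS(Operations):
    def __init__(self, root, max_delay_s=0.05):
        self.root = root
        self.max_delay_s = max_delay_s

    def _real(self, path):
        return os.path.join(self.root, path.lstrip("/"))

    def _delay(self):
        # Uniform jitter; swap in a heavier-tailed distribution to taste.
        time.sleep(random.uniform(0, self.max_delay_s))

    # -- metadata --
    def getattr(self, path, fh=None):
        self._delay()
        st = os.lstat(self._real(path))
        return {k: getattr(st, k) for k in (
            "st_mode", "st_nlink", "st_uid", "st_gid",
            "st_size", "st_atime", "st_mtime", "st_ctime")}

    def readdir(self, path, fh):
        self._delay()
        return [".", ".."] + os.listdir(self._real(path))

    # -- data --
    def open(self, path, flags):
        self._delay()
        return os.open(self._real(path), flags)

    def read(self, path, size, offset, fh):
        self._delay()
        os.lseek(fh, offset, os.SEEK_SET)
        return os.read(fh, size)

    def write(self, path, data, offset, fh):
        self._delay()
        os.lseek(fh, offset, os.SEEK_SET)
        return os.write(fh, data)

    def release(self, path, fh):
        os.close(fh)
        return 0


if __name__ == "__main__":
    # Usage: python slowfs.py <real_dir> <mountpoint>
    FUSE(SlowFS(sys.argv[1]), sys.argv[2], foreground=True, nothreads=True)
```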