I skimmed the paper, but this matches closely with my own intuition on the performance implications of NVMe: random access is no longer as expensive as it once was. LSMs work really well for spinning-rust hard drives because their read and write patterns are mostly linear in large blocks. Flash, meanwhile, has a different set of problems, mostly around wear leveling and wearing out blocks (this is dramatically better today than it once was). It led me down the path of implementing a persistent hashmap based on extendible hashing. This paper has a more reasonable approach to a similar idea.
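Roughly, the idea looks like this. Below is a minimal in-memory sketch of extendible hashing (directory doubling plus bucket splits), not the persistent on-disk version I was playing with; all names and the bucket capacity are illustrative:

```python
# Sketch of extendible hashing: a directory of pointers into buckets,
# indexed by the low `global_depth` bits of the key's hash. When a
# bucket overflows, it splits; the directory doubles only when needed.
class Bucket:
    def __init__(self, depth, capacity=4):
        self.depth = depth          # local depth of this bucket
        self.capacity = capacity
        self.items = {}

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.dir = [Bucket(1), Bucket(1)]

    def _index(self, key):
        # Low `global_depth` bits of the hash pick the directory slot.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.dir[self._index(key)].items.get(key)

    def put(self, key, value):
        b = self.dir[self._index(key)]
        if key in b.items or len(b.items) < b.capacity:
            b.items[key] = value
            return
        # Bucket is full: split it, doubling the directory if its
        # local depth already equals the global depth.
        if b.depth == self.global_depth:
            self.dir = self.dir + self.dir
            self.global_depth += 1
        b.depth += 1
        sibling = Bucket(b.depth, b.capacity)
        mask = 1 << (b.depth - 1)
        # Directory slots whose new distinguishing bit is set now
        # point at the sibling bucket.
        for i, slot in enumerate(self.dir):
            if slot is b and i & mask:
                self.dir[i] = sibling
        # Redistribute the old bucket's items by that same bit.
        for k in list(b.items):
            if hash(k) & mask:
                sibling.items[k] = b.items.pop(k)
        self.put(key, value)  # retry; may trigger another split
```

The appeal for flash is that a lookup is one hash plus one directory dereference rather than a multi-level tree walk, and splits rewrite only the affected bucket.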
I’m not that familiar with NVMe. Don’t all SSDs have fairly cheap random access, since there’s no mechanical head to move around? Would an approach like TreeLine’s work well on consumer devices like laptops and phones?
NVMe stands for Non-Volatile Memory Express and is really just a communication standard for high-bandwidth storage devices over PCIe. It's typically faster than SSDs attached over SATA interfaces.
Don’t all SSDs have fairly cheap random access, since there’s no mechanical head to move around?
Yes, much cheaper than mechanical disks, but fully random access is still slower than linear access, even on high-performance NVMe drives (I'm not entirely sure of the reasons; my guess is the controller can more reliably predict where it needs to start fetching blocks from). Phones usually don't use NVMe, but they do use flash over some other interface such as UFS, which should have similar performance characteristics. Laptops, definitely: most new mid-tier laptops and up today will have an NVMe drive as their primary storage device, though not all.

An approach like TreeLine's could be a good high-performance option for systems like that, but it's also not really what that type of system aims for. Usually when you're talking about storage engines and comparing against LSM trees, you're talking about very high-performance key-value systems (or more complex databases built on top of them) for large-scale deployments. Most consumer systems won't have much need for the IO speeds a project like this would offer, though it could result in a performance gain in theory. There might also be other tradeoffs, such as increased RAM requirements, that make a system like this infeasible for consumer devices. Again, I haven't read the whole paper, so maybe not.
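You can get a feel for the sequential-vs-random gap yourself with a crude micro-benchmark like the sketch below. Caveats: it uses buffered reads through the page cache, so on a small file the cache dominates; for a real measurement you'd want O_DIRECT (or a tool like fio) and a file much larger than RAM. The sizes here are illustrative.

```python
import os
import random
import tempfile
import time

BLOCK = 4096      # 4 KiB reads, a typical page size
NBLOCKS = 1024    # deliberately tiny; scale up for a real test

# Create a scratch file to read back.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(BLOCK * NBLOCKS))
    path = f.name

fd = os.open(path, os.O_RDONLY)

def read_pattern(offsets):
    """Read one block at each offset; return (bytes_read, seconds)."""
    start = time.perf_counter()
    total = 0
    for off in offsets:
        total += len(os.pread(fd, BLOCK, off))
    return total, time.perf_counter() - start

seq = [i * BLOCK for i in range(NBLOCKS)]
rnd = seq[:]
random.shuffle(rnd)

seq_bytes, seq_t = read_pattern(seq)
rnd_bytes, rnd_t = read_pattern(rnd)

os.close(fd)
os.unlink(path)

print(f"sequential: {seq_t:.4f}s  random: {rnd_t:.4f}s")
```

On a cold-cache NVMe drive the sequential pattern usually still wins, just by a much smaller factor than on spinning disks; exact numbers depend entirely on the hardware.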