The article shows how efficient LevelDB is compared to other stores. Even on a cheap mobile device it outperforms SQLite and Microsoft’s very own ESENT. The article also includes source code so you can try out the demo yourself.
A benchmark of an LSM tree-based storage engine that doesn’t exercise compactions is probably not telling you the whole story; 1000 records does not seem very realistic. For the flash storage on phones, write amplification is another consideration. For some ideas on effectively benchmarking storage engines, it’s worth looking at lmdb’s benchmarks: http://symas.com/mdb/#bench
Awesome, I would love to dig in more. I also tried with more entries (5K - 10K is a good, realistic figure for mobile apps), forcing the log file to go above the size limit and kicking off compaction. The results were still not bad at all. I have worked with LMDB in the past, and yes, nothing compares to the speed of LMDB, but when compared to SQLite or any other B-tree implementation, LevelDB takes the cake almost every time.
Question: What is write amplification, and what should I know about it as a developer?
Write amplification is how much actually ends up being written over time when you want to write something. If you store 5 MB of application data, but over time 15 MB gets written to disk to satisfy this, your write amplification is 3. With LSM-tree-backed DBs like leveldb, rocksdb, hbase, cassandra, etc., your data is periodically rewritten, which increases the IO needed to store that data over its lifetime. Many DBs have a write-ahead log where data is placed before ending up in a read-optimized persistent structure (sequential writes are faster than random writes, so this is a shortcut you can take to get something on disk quickly), but you end up writing the data to multiple structures. If you’re not IO bound, you can use this to speed up fsyncs to the log while asynchronously populating the persistent structure, without losing persistence guarantees.
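To make the arithmetic concrete, here's a rough sketch in Python. The 5 MB and 15 MB figures are the ones from the example above; the split into a WAL write, a flush, and one compaction rewrite is purely my assumption for illustration, not a measurement of any real engine.

```python
MB = 1024 * 1024

def write_amplification(logical_bytes, physical_bytes):
    """Bytes physically written divided by bytes the application asked to store."""
    if logical_bytes <= 0:
        raise ValueError("logical_bytes must be positive")
    return physical_bytes / logical_bytes

logical = 5 * MB  # what the application wanted to store

# Hypothetical breakdown of where the physical writes come from (assumed numbers):
physical_writes = {
    "write-ahead log": 5 * MB,     # data lands in the log first
    "flush to sorted file": 5 * MB,  # then in the read-optimized structure
    "one compaction rewrite": 5 * MB,  # then it gets rewritten once more
}

physical = sum(physical_writes.values())
print(write_amplification(logical, physical))  # 3.0
```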
So, it’s not always bad to amplify your writes, but certain systems are pretty egregious about it, and will end up significantly increasing the cost of ownership for safely and efficiently storing data.
As mxp points out, it can also refer to hardware characteristics where the underlying media will rewrite data over time.
Another interesting characteristic of storage systems is read amplification. How many reads need to be performed in order to satisfy a single request? LSM trees will have a higher read amplification than B+ trees because you may traverse several more structures to find the interesting data.
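A toy way to see this, assuming an LSM tree is modeled as nothing more than a list of key-value maps ordered newest to oldest (real engines typically cut the cost down with Bloom filters and caches, which this sketch ignores):

```python
def lsm_get(runs, key):
    """Search runs from newest to oldest; return (value, number_of_runs_probed)."""
    probed = 0
    for run in runs:
        probed += 1
        if key in run:
            return run[key], probed
    return None, probed

# Hypothetical data: index 0 is the newest structure, the last is the oldest.
runs = [
    {"c": 3},            # newest (think: in-memory table)
    {"b": 2, "c": 30},   # older on-disk run, holds a stale "c"
    {"a": 1, "b": 20},   # oldest on-disk run, holds a stale "b"
]

print(lsm_get(runs, "c"))  # (3, 1)    found in the newest run
print(lsm_get(runs, "a"))  # (1, 3)    had to probe every run
print(lsm_get(runs, "z"))  # (None, 3) a miss also probes every run
```

The number of runs probed is a crude stand-in for read amplification here: the older (or more absent) the key, the more structures get touched.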
There’s also space amplification. This happens when there are redundant copies of your data hanging around.
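As a rough sketch, assuming we count entries instead of bytes and model the store as a list of key-value maps ordered newest to oldest (both are simplifications on my part), space amplification is what's stored divided by what's actually live:

```python
def space_amplification(runs):
    """Entries stored across all runs divided by the number of distinct live keys."""
    stored = sum(len(run) for run in runs)
    live = len(set().union(*runs))
    if live == 0:
        raise ValueError("no live keys")
    return stored / live

def compact(runs):
    """Merge all runs into one, keeping only the newest version of each key."""
    merged = {}
    for run in reversed(runs):  # oldest first, so newer values overwrite older ones
        merged.update(run)
    return [merged]

# Hypothetical data: index 0 is the newest run.
runs = [
    {"c": 3},
    {"b": 2, "c": 30},   # stale "c"
    {"a": 1, "b": 20},   # stale "b"
]

print(space_amplification(runs))           # 5 stored entries / 3 live keys
print(space_amplification(compact(runs)))  # 1.0 once the stale versions are gone
```

Note that the compaction that brings space amplification back down is itself extra writing, so it shows up as write amplification instead.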
Here’s some more info on these http://smalldatum.blogspot.com/2015/11/read-write-space-amplification-b-tree.html
These factors are of particular interest for people managing large-scale and/or high-performance storage systems. Some systems are better at handling high volumes of writes or reads than others, so it’s good to know where your target DB lies on the spectrum, and how its performance can dip over time due to storage management procedures like compacting multiple structures into one and rewriting them while deduplicating unneeded versions of data.
Fascinating! Thank you very much!
Could you elaborate a little more on space as opposed to write amplification?
This post explains it better than I can: http://smalldatum.blogspot.com/2015/11/read-write-space-amplification-pick-2_23.html
Write amplification (WA) is an undesirable phenomenon associated with flash memory and solid-state drives (SSDs) where the actual amount of physical information written is a multiple of the logical amount intended to be written. https://en.wikipedia.org/wiki/Write_amplification explains it pretty well.