The article makes some pretty good points, but it falls flat when the author says “SQL: because math”. There are plenty of good reasons to like SQL, but treating the presence of math as some kind of hocus pocus that automatically validates a solution is naive.
Math can be a blessing or a curse. In SQL it’s both: the math gives SQL its flexibility while also saddling programmers with a model that is an unfortunate mismatch for many programming tasks. The whole cursed field of ORM stems from this.
Some day we’ll have something better than SQL. At this stage it’s hard to say what it will look like; it may have a mathematical basis or it may not. But it’ll solve the problems we actually have, rather than being justified by “because math”.
One interesting book we have in the office that covers this stuff at length is “Algorithms for Memory Hierarchies: Advanced Lectures.”
Other interesting points while considering architecture:
Sequential disk access may be faster than random memory access on your hardware, depending on access patterns.
You may be able to complete roughly 20 intra-datacenter round trips in the time it takes a rotational disk to seek.
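For a rough sanity check of the figure of about 20 round trips per seek, here is a back-of-envelope sketch. The two latencies are commonly quoted ballpark values (about 0.5 ms for a round trip inside a datacenter, about 10 ms for a disk seek); they are assumptions for illustration, not measurements of any particular hardware, and the function name is made up for this example.

```python
# Ballpark latencies, assumed for illustration only.
# Measure your own network and disks before relying on them.
DATACENTER_ROUND_TRIP_S = 0.5e-3  # ~0.5 ms round trip within one datacenter
DISK_SEEK_S = 10e-3               # ~10 ms seek on a rotational disk


def round_trips_per_seek(seek_s=DISK_SEEK_S, rtt_s=DATACENTER_ROUND_TRIP_S):
    """Return how many network round trips fit into one disk seek."""
    return seek_s / rtt_s


if __name__ == "__main__":
    print(f"~{round_trips_per_seek():.0f} round trips per seek")
```

Plugging in your own measured numbers is the point; the ratio moves a lot between, say, a congested network and a fast one, or between a slow laptop drive and a 15k RPM enterprise disk.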
MySQL never went out of style with many companies that have some of the most demanding workloads. The author mentions that many NoSQL stores involve high latency compared to traditional SQL systems. But I think it’s pretty hard to build any sort of single access layer, such as the one he seems to advocate, that can legitimately address the demands of both low-latency and high-throughput workloads at various read/write ratios. Some organizations seem to be moving toward doing as much processing in streaming systems as possible, which may reduce the need to support high-throughput systems in many cases. But the LSM tree exists for a number of reasons, and the use-case spectrum from B+ tree to Kafka is pretty wide.
That’s true, but “depending on access patterns” is an extremely sharp edge (at least with magnetic drives). If you send the next read to the drive just in time for the head to be in the right place, you can get very low latencies and high throughput. If you miss that window by even a few microseconds, you’re waiting about 8 ms for the platter to spin all the way around again. If another workload on the same spindle (e.g. another query) causes the scheduler to move the head, you’re waiting tens of ms at least. If there’s another IO in the queue with the FUA bit set, you may also be waiting tens of ms, depending on the drive manufacturer.
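The 8 ms figure is consistent with a 7200 RPM drive, which is my assumption here since no spindle speed is stated. A minimal sketch of the arithmetic (the function names are made up for this example):

```python
def full_rotation_ms(rpm):
    """Time for one full platter revolution, in milliseconds."""
    return 60_000 / rpm  # 60,000 ms per minute divided by revolutions per minute


def avg_rotational_latency_ms(rpm):
    """Average wait for a random sector: half a revolution."""
    return full_rotation_ms(rpm) / 2


if __name__ == "__main__":
    # Common spindle speeds; 7200 RPM is the assumed case behind "8 ms".
    for rpm in (5400, 7200, 15000):
        print(
            f"{rpm} RPM: full rotation {full_rotation_ms(rpm):.2f} ms, "
            f"average wait {avg_rotational_latency_ms(rpm):.2f} ms"
        )
```

A read that just misses its sector pays close to the full rotation rather than the average, which is why a few microseconds of delay can turn into milliseconds of latency.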
This kind of thing makes benchmarking databases, especially databases on magnetic disks, very hard to do accurately.