This talk at SREcon Europe (https://www.usenix.org/conference/srecon15europe/program/presentation/hoffman) was really interesting. Ericsson have been working on replacing the electrical buses between components with an optical backplane, and disaggregating everything.
This is really neat, and challenges many of my assumptions about designing software systems. RDMA over Ethernet (linked from the article) seems like an intriguing thing to study next (what happens when an RDMA packet is dropped?).
The author’s point about disaggregation being most economical for bandwidth users at the scale of Netflix or YouTube is also compelling. Offering consumers “effectively infinite disk” out of the box seems like it could benefit from integration all the way from the operating system to the datacenter (maybe Microsoft or Apple would do this?). If YouTube or Netflix wanted to provide this as a third-party service, what protocol would make the most sense? Presenting it as a faster Dropbox might be fine; you’d have to work around some of the operating system’s assumptions about capacity, but that’s probably not hard.
This article is good and I don’t have much to add to most of it, but two comments on small parts:
The round-trip [network] latency is so much lower than the seek time of a disk that we can disaggregate storage and distribute it anywhere in the datacenter without noticeable performance degradation, giving applications the appearance of having infinite disk space without any appreciable change in performance. This fact was behind the rise of distributed filesystems within the datacenter ten years ago.
I’d date some of this to the ’90s. Disaggregated storage didn’t catch on widely then (though IBM’s AIX introduced support for a global filesystem, GPFS, in 1998), but simpler network-attached storage schemes like NFS did see widespread deployment, relying on the same observation: the network adds a small enough penalty to storage access that serving disk over the network is viable.
One ’90s proposal built on a similar observation, which I found kind of clever but which never saw significant deployment, starts from the fact that remote RAM is faster than local disk. Disaggregated RAM across a workstation cluster, though slower than local RAM, might therefore be a viable layer of the virtual memory hierarchy. For example, you could swap to the network instead of swapping to disk (1995 paper; 2003 Linux module).
The most common counter argument to disaggregated disk, both inside and outside of the datacenter, is bandwidth costs. […] The situation outside the datacenter hasn’t evolved quite as quickly, but even so, I’m paying $60/month for 100Mb, and if the trend of the last two decades continues, we should see another 50x increase in bandwidth per dollar over the next decade. It’s not clear if the cost structure makes cloud-provided disaggregated disk for consumers viable today, but the current trends of implacably decreasing bandwidth cost mean that it’s inevitable within the next five years.
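Taken at face value, a 50x increase over a decade compounds to roughly 48% more bandwidth per dollar each year, or about 7x over the five-year window mentioned. A minimal sketch of that arithmetic, assuming smooth compound growth (a simplifying assumption, not something the quoted text states); the function names are hypothetical:

```python
# Sketch: what "50x more bandwidth per dollar over a decade" implies per year
# and over five years. Assumes smooth compound growth, for illustration only.

def annual_growth_factor(total_factor: float, years: float) -> float:
    """Per-year multiplier that compounds to `total_factor` over `years`."""
    return total_factor ** (1.0 / years)

def growth_over(total_factor: float, years: float, horizon: float) -> float:
    """Multiplier accumulated after `horizon` years at that same compound rate."""
    return annual_growth_factor(total_factor, years) ** horizon

per_year = annual_growth_factor(50.0, 10.0)
five_year = growth_over(50.0, 10.0, 5.0)
print(f"per year: {per_year:.3f}x (~{(per_year - 1) * 100:.0f}%/year)")  # per year: 1.479x (~48%/year)
print(f"after five years: {five_year:.2f}x")  # after five years: 7.07x
```

In other words, "inevitable within five years" leans on roughly a 7x improvement in bandwidth per dollar, not the full 50x.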
I’m less optimistic about this, though my uncertainty is high. For consumer pricing in a number of countries, you can’t look only at the peak bandwidth of the connection (50 Mbps or 100 Mbps or whatever); you also have to look at data-usage pricing. For example, Comcast in much of the U.S. caps usage in the normal pricing tier at 300 GB/month. That means you can burst to 100 Mbps, but can’t average more than ~1 Mbps. Data caps (or per-GB pricing) are also common in Canada and Australia.
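As a rough check on the ~1 Mbps figure: assuming decimal units (1 GB = 10^9 bytes) and a 30-day month, a 300 GB cap works out to about 0.93 Mbps averaged over the month, and a line running flat out at 100 Mbps would use it up in under seven hours. A minimal sketch of the arithmetic (the function names are illustrative only):

```python
# Sketch: convert a monthly data cap into the average bit rate it allows.
# Assumptions, for illustration only: decimal units (1 GB = 10**9 bytes,
# 1 Mbps = 10**6 bit/s) and a 30-day month.

SECONDS_PER_DAY = 24 * 60 * 60

def average_mbps_under_cap(cap_gb: float, days: float = 30.0) -> float:
    """Average rate in Mbps that uses exactly `cap_gb` over `days` days."""
    bits = cap_gb * 10**9 * 8
    seconds = days * SECONDS_PER_DAY
    return bits / seconds / 10**6

def hours_at_peak(cap_gb: float, peak_mbps: float) -> float:
    """Hours of continuous transfer at `peak_mbps` before the cap is reached."""
    bits = cap_gb * 10**9 * 8
    return bits / (peak_mbps * 10**6) / 3600

print(f"{average_mbps_under_cap(300):.2f} Mbps average")  # 0.93 Mbps average
print(f"{hours_at_peak(300, 100):.1f} hours at 100 Mbps")  # 6.7 hours at 100 Mbps
```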
On SSD disaggregation in the datacenter, the one thing I keep thinking is that the existing model just isn’t broken for some common database-ish loads. A flash array will give you hundreds of thousands of IOPS. On the consumer side, you’ll apparently soon get 270K random read ops/s out of a $200 M.2 stick. If you’re using hundreds of thousands of IOPS, you can probably use a chunk of CPU and RAM as well.
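For a sense of scale, here is a rough conversion of the 270K figure into throughput. The 4 KiB read size is an assumption (the block size behind the figure isn’t given); under it, 270K reads/s is about 1.1 GB/s, or roughly 8.8 Gbit/s, which would be most of a 10 Gbit/s link if those reads had to cross a network instead of the local bus. The function name is hypothetical:

```python
# Sketch: turn a random-read IOPS figure into throughput.
# The 4 KiB I/O size is an assumption for illustration only.

def iops_to_throughput(iops: float, io_size_bytes: int) -> tuple[float, float]:
    """Return (gigabytes per second, gigabits per second), decimal units."""
    bytes_per_s = iops * io_size_bytes
    return bytes_per_s / 1e9, bytes_per_s * 8 / 1e9

gb_s, gbit_s = iops_to_throughput(270_000, 4096)
print(f"{gb_s:.2f} GB/s = {gbit_s:.1f} Gbit/s")  # 1.11 GB/s = 8.8 Gbit/s
```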
And as a transport, the system bus is low-latency, low-jitter, reliable, and so on, even more so than the networks he describes (which sound better than the networks I get to use, haha!). Huge enterprises have to run distributed databases anyway, so lots of database nodes that each include their own disk seems like a reasonable model.
On the other hand, clearly tons of applications are using networked SSDs right now (Amazon’s gp2 disks, Google’s SSD persistent disks, etc.), so the market is saying they have a niche. And there is always value in being able to call up a resource elastically if you can execute it well enough.
I’m just saying that the old model still seems to have a fair amount of juice in it in high-end database-ish use cases.
Love the Luu-grams for the research, dedication to empirical grounding, and breadth; keep ’em coming.