1. 11

  2. 20

    Is Spinning Disk Going Extinct?

    15K RPM drives (and probably laptop hard drives) are getting killed off ruthlessly by flash. Nobody will mourn 15K drives but I confess that the end of laptop hard drives will make me sad and nostalgic.

    Spinning rust has a dim future outside the data centre, but when $/GB matters, it reigns supreme, and that’s why in 2017 the state of the art still involves literal tubs of hard drives: https://code.facebook.com/posts/1869788206569924/introducing-bryce-canyon-our-next-generation-storage-platform/

    Google presented a paper at FAST ’16 about fundamentally redesigning hard drives for the case where they are operated exclusively as part of a very large collection of disks (where individual errors are not as big a deal as in other applications), in order to reduce $/GB even further and increase IOPS: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44830.pdf

    Possible changes mentioned in the paper include:

    • New (non-compatible) physical form factor[s] in order to freely change the dimensions of the heads and platters

    • adding another actuator arm / voice coil with its own set of heads

    • accepting higher error rates and “flexible” (this is a euphemism for “degrades over time”) capacity in exchange for higher areal density, lower cost, and better latencies

    • exposing more lower-level details of the spinning rust to the host, such as host-managed retries and exposing APIs that let the host control when the drive schedules its internal management tasks

    • better profiling data (time spent seeking, time spent waiting for the disk to spin, time spent reading and processing data) for reads/writes

    • Caching improvements, such as ability to mark data as not to be cached (for streaming reads) or using PCIe to use the host’s memory for more cache

    • Read-ahead or read-behind once the head is settled costs nothing (there’s no seek involved!). If the host could annotate its read commands with its optional desires for nearby blocks, the hard drive could do some free read-ahead (if it was possible without delaying other queued commands).

    • better management of queuing – there’s a lot more detail on page 15 of that PDF about queuing/prioritisation/reordering, including the need for the drive’s command scheduler to be hard real-time and be aware of the current positioning of the heads and of the media. Fun stuff! I sorta wish I could be involved in making this sort of thing happen.
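
    To make the head-position-aware queuing idea from the last bullet concrete, here is a minimal sketch of a scheduler that picks whichever queued request the head can reach soonest, counting both the seek and the rotational wait. This is not the paper’s design: the linear seek model, the constants, and the names `Request`, `positioning_time_ms` and `pick_next` are all assumptions made up for illustration.

```python
from dataclasses import dataclass

RPM = 7200                    # assumed spindle speed
ROTATION_MS = 60_000 / RPM    # one full revolution, about 8.33 ms
SEEK_MS_PER_TRACK = 0.001     # hypothetical linear seek cost per track crossed
SETTLE_MS = 0.5               # hypothetical fixed head-settle time after a seek


@dataclass(frozen=True)
class Request:
    name: str
    track: int     # target track number
    angle: float   # start of the target sector, as a fraction of a revolution in [0, 1)


def positioning_time_ms(head_track: int, head_angle: float, req: Request) -> float:
    """Estimated time until the head sits over the first requested sector."""
    distance = abs(req.track - head_track)
    seek_ms = 0.0 if distance == 0 else SETTLE_MS + distance * SEEK_MS_PER_TRACK
    # The platter keeps turning while the arm moves, so work out where it
    # will be once the seek has finished.
    angle_after_seek = (head_angle + seek_ms / ROTATION_MS) % 1.0
    # Fraction of a revolution still to wait for the target sector.
    wait_fraction = (req.angle - angle_after_seek) % 1.0
    return seek_ms + wait_fraction * ROTATION_MS


def pick_next(head_track: int, head_angle: float, queue: list) -> Request:
    """Greedy choice: the request with the shortest estimated positioning time."""
    return min(queue, key=lambda r: positioning_time_ms(head_track, head_angle, r))


if __name__ == "__main__":
    queue = [
        Request("A", track=1000, angle=0.9),   # same track, but nearly a full turn away
        Request("B", track=5000, angle=0.5),
        Request("C", track=3000, angle=0.4),
    ]
    for r in queue:
        print(r.name, round(positioning_time_ms(1000, 0.0, r), 2), "ms")
    print("next:", pick_next(1000, 0.0, queue).name)
```

    With these made-up numbers the request on the current track loses to one 2000 tracks away, because the far one rotates under the head just as the seek finishes. A scheduler that only knows LBAs cannot make that call, which is presumably why the scheduling has to live in (or be tightly coupled to) the drive.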

    tl;dr there is a lot of room for improvement if you’re willing to throw tradition to the wind and focus on the single application (very large scale bulk storage) where spinning rust won’t get killed off by flash in a decade.

    http://www.ewh.ieee.org/r6/scv/mag/MtgSum/Meeting2010_10_Presentation.pdf is a fun set of technically-focused slides about future reading/writing methodologies and is very much worth a read. Also TDMR is literally just MIMO for hard drives.

    1. 4

      Is it literally rust in some sense, or is that a joke? The platters I’ve seen don’t look like rust, although they don’t look like anything else from the everyday world, either.

      1. 10

        The magnetic layer (the part of the sandwich of platter materials/coatings that actually stores the data) used to indeed be iron oxide up to the 1980s but anything after that doesn’t have iron oxides – just a fancy cobalt alloy.

        Nowadays it’s just a facetious term (unless you own vintage hard drives, in which case you actually are spinning literal rust).

    2. 3

      When it comes to the consumer side, spinning disk extinction is a matter of when, not if, and that when is most likely less than 10 years away.

      1. 1

        There are a couple categories of app where spinning disks still make sense.

        One is, your app mostly does big reads/writes. And in particular, if your app does huge streaming reads/writes, you can get GBs/sec of streaming throughput cheaply by RAIDing lots of spindles together.

        • Clickstream-y applications. Databases based on LSM trees (Cassandra, LevelDB, etc.) make writes sequential, and clickstream data tends to be all read through at once to gather stats (instead of picking out random rows at a time).
        • Data warehousing. Similar idea, just with data you ETL’d in–you’re going to scan over whole columns of data, but if you do it at GBs/second, it still works well. There are some interesting specialized apps, like genomics, that tend to work similarly (scan through a bunch of data, matching it against an index in RAM).
        • Serving up large blobs: big photos, videos, whatever.
        • Logs, backups, etc.
        • Services that keep all their randomly-accessed data in RAM. Counterintuitive, but some in-RAM database engines would use spinning disks, and it worked out since only snapshots and logs live on disk. At least as of some old paper, Google would keep their index in RAM, with servers loading it all from spinning disk on startup.
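
        For anyone wondering why LSM trees turn writes into sequential I/O, here is a toy sketch of the write path: updates land in an in-memory table, and when it fills up the whole thing is written out in one pass as a new sorted, immutable segment file. This is nowhere near what Cassandra or LevelDB actually do (no write-ahead log, no compaction, no bloom filters), and `TinyLSM` and everything in it are names invented for this example. It assumes string keys and values with no tabs or newlines.

```python
import os
import tempfile


class TinyLSM:
    """Toy LSM store: an in-memory memtable plus immutable sorted segment files."""

    def __init__(self, directory, memtable_limit=4):
        self.directory = directory
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.segments = []  # segment file paths, oldest first

    def put(self, key, value):
        # Writes only touch RAM until the memtable is full.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as one new sorted file, front to back."""
        if not self.memtable:
            return
        path = os.path.join(self.directory, f"segment-{len(self.segments):06d}.txt")
        with open(path, "w", encoding="utf-8") as f:
            for key in sorted(self.memtable):
                f.write(f"{key}\t{self.memtable[key]}\n")
        self.segments.append(path)
        self.memtable = {}

    def _read_segment(self, path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t", 1)
                yield key, value

    def get(self, key):
        # Point lookup: newest data wins, so check RAM, then newest segment first.
        if key in self.memtable:
            return self.memtable[key]
        for path in reversed(self.segments):
            for k, v in self._read_segment(path):
                if k == key:
                    return v
        return None

    def scan(self):
        # Clickstream-style "read everything": each segment is read start to end.
        merged = {}
        for path in self.segments:
            merged.update(self._read_segment(path))
        merged.update(self.memtable)
        return sorted(merged.items())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = TinyLSM(d, memtable_limit=2)
        db.put("user:1", "click")
        db.put("user:2", "view")
        db.put("user:1", "purchase")  # overwrite lands in a newer memtable
        print(db.get("user:1"), len(db.segments), db.scan())
```

        Note that nothing ever seeks back into an existing file to update it. Overwrites just go into a newer segment, which is the property that makes the write side friendly to spinning disks. Random point reads are the part that still hurts.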

        The other big use case is, your app stores so mind-bogglingly much data that it’s economically sensible to optimize for HDDs. I imagine that, say, Gmail might be able to hit their latency targets with less programmer work if all our mail content lived on flash. But they’re, you know, Gmail, so figuring out how to get by with spinning disks remains worth it.

        Of course, on the consumer side, they’re still a ridiculous deal for physical backups, and for anyone who deals in media: I knew a video producer who had a bag of portable USB HDDs.

        1. 1

          I managed a small Cassandra cluster (8 nodes, 200k writes/s) for handling clickstreamy things. Theoretically LSM tree databases work really well on spinny disk; in practice those same workloads run really, really well on SSD. It was more cost effective (in terms of needed cluster size for desired performance characteristics) for us to run on SSD.

          1. 1

            Right–not as large as the crazy O(1000x) difference in IOPS, but SSDs give more perf per drive even on mostly sequential workloads. Perf can still be worth the $/GB for some.

        2. 1

          AFAIK there is still a cost/density argument for spinning disks. That will weaken as transistor sizes get smaller, but we’re approaching the limit for how physically small transistors can be, so it may never truly go away.

          1. 5

            There’s hardcore lithography involved in making the read/write heads of hard drives (http://life.lithoguru.com/?p=249) but making the platters doesn’t involve doing actual lithography (which helps hard drives stay cost-effective). Modern hard drives have bit cells a little larger than those of flash (it took until 2015 for flash bit cells to get smaller than those on spinning rust! http://www.digitalpreservation.gov/meetings/documents/storage14/Fontana_Volumetric%20Density%20Trends%20for%20Storage%20Components%20--%20LOC%2009222014.pdf).

            With flash, AFAIK, shrinking things is hard because the number of electrons you can store decreases with size which, combined with the fact that electrons leak out over time, makes the electron-counting business even hairier.