The irony here is that they, an enterprise, have enough redundancy and infrastructure to handle the failure rate of consumer-grade drives, while I, a consumer, just want the damn thing to work and so should probably get an enterprise-grade drive.
It sounded like enterprise drives offered no measurable improvement in reliability.
Close: Enterprise drives offered no measurable improvement in reliability for their sample and their workload.
They didn’t do rigorous statistics; they just said something like “we saw 4.6% failure on the enterprise drives and 4.2% failure on the consumer drives”, with no expression of confidence. Furthermore, the enterprise drives were mostly in systems running their core services, and the consumer drives were not. They had one pod with enterprise SATA (ES) drives and claimed a ‘statistically consistent’ result, but that is a tiny sample: 45 ES drives in that pod, versus ~370 ES drives in total and ~14,700 consumer SATA drives in total.
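To illustrate why the missing confidence figures matter, here is a minimal sketch that puts a 95% Wilson score interval around each quoted rate. It assumes each percentage can be treated as a simple fraction of drives that failed out of the drive counts above (~370 enterprise, ~14,700 consumer); Backblaze’s published numbers are annualized rates computed over drive-days, so this is only a rough approximation. The function name `wilson_interval` is just an illustrative helper.

```python
import math


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half


# Approximate figures quoted above. Assumption: each rate is treated as a
# plain fraction of drives that failed, not an annualized drive-day rate.
enterprise = wilson_interval(0.046, 370)
consumer = wilson_interval(0.042, 14700)

for name, (lo, hi) in [("enterprise", enterprise), ("consumer", consumer)]:
    print(f"{name:>10}: {lo:.1%} to {hi:.1%}")

# Two intervals overlap if each one starts before the other ends.
overlap = consumer[0] < enterprise[1] and enterprise[0] < consumer[1]
print("intervals overlap:", overlap)
```

Under these assumptions the enterprise interval is several times wider than the consumer one and contains it entirely, so a 4.6% vs 4.2% difference says very little either way. The interval width shrinks roughly with 1/sqrt(n), so the 45-drive pod would give an even wider spread.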
Furthermore, 4.2% is still atrocious given their pod workload, or at least their presumed workload, since they haven’t described it in much detail as far as I’ve seen. What I do know is that their model has customers doing bulk uploads (slow sequential writes) to their systems; the data then sits around for ages and maybe gets read occasionally. It’s fairly unlikely that data gets overwritten at any great rate. Nothing wrong with this model at all, but it’s nothing like an active mail, database, or file server, for example.
As caboteria said, this doesn’t really matter for Backblaze - they can deal quite happily with a 4.2% failure rate because their operational and data models work fine for it. So if you’re looking at doing something that has similar data/drive use characteristics to what Backblaze do and can swing the same operational model, you’re fine.
But that’s about as far as their analysis goes, and I wish they’d be more honest about that.