There are a few things here that might be easier with ZFS. Here are some constructive recommendations based on personal experience. (Note that I haven’t used btrfs in a while, so my advice is going to be fairly one-sided toward ZFS.)
Mirroring
You do actually get bitrot repair with ZFS if you use either RAIDz or a mirror. Hardware RAID won’t defend against silent data corruption, which it is almost negligent to ignore when keeping long-term backups on spinning rust. ZFS lets you choose whether to use a mirror or RAIDz and will still repair corrupted data if a good copy exists somewhere. And, even if you use ZFS on a single drive (don’t), it will at least refuse to read the corrupted data.
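For concreteness, here’s a minimal sketch of a mirrored pool plus a periodic scrub (the pool and device names are made up; substitute your drives’ /dev/disk/by-id paths):

    # create a two-way mirror; ZFS checksums every block it writes
    zpool create backup mirror /dev/disk/by-id/ata-DRIVE_A /dev/disk/by-id/ata-DRIVE_B

    # periodically re-read everything, verify checksums, and repair from the good copy
    zpool scrub backup
    zpool status backup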
Synchronizing
For ZFS, I use Sanoid for snapshots, and Syncoid (part of Sanoid) for synchronization. This uses zfs send, is a ton more efficient than rsync, and is easily configured in NixOS. rsync.net and datto.com are two services that can act as zfs send targets over ssh. Personally, I have a remote dedicated backup server that I send to over a VPN to make sure the “1” part of the 3-2-1 rule works.
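As a rough sketch of what that looks like (the dataset names, host, and retention numbers are just examples, not my actual setup):

    # /etc/sanoid/sanoid.conf -- sanoid takes and prunes snapshots on a schedule
    [tank/documents]
        use_template = production

    [template_production]
        hourly = 36
        daily = 30
        monthly = 3
        autosnap = yes
        autoprune = yes

    # syncoid then pushes incremental snapshots to the backup box over ssh
    syncoid tank/documents backup@backuphost:backuppool/documents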
Encryption
I think this is just specific to the filesystems you’re used to…
There’s an issue when you choose to encrypt backups: if data becomes corrupted, it’s hard to recover anything without a copy.
Not the case with ZFS on Linux 0.8. You get all the anti-corruption guarantees ZFS already gives you if you create an encrypted dataset. Also, you can zfs send -w or syncoid --sendoptions=w and get end-to-end encrypted backups where the backup target doesn’t have to know the key to your dataset to receive it. This is what the future is like.
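A minimal sketch with hypothetical pool and host names (the dataset is created encrypted, and the raw send ships it still encrypted):

    # create an encrypted dataset (ZFS on Linux 0.8+)
    zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/backups

    # raw send: the target stores the blocks exactly as they sit encrypted on disk,
    # so it never needs the key
    zfs snapshot tank/backups@2020-06-01
    zfs send -w tank/backups@2020-06-01 | ssh backuphost zfs recv backuppool/backups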
File organization
I’d organize my top-level folders (music/documents/pictures/etc) with datasets in ZFS, which are all exposed as mountpoints on Linux or drives on Windows. Then, I’d add folders under them like normal volumes. You can also have different replication schemes/snapshot schemes/record sizes/compression methods/encryption methods for each to tune them to different workloads.
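Something like this, as a sketch (the dataset names, mountpoint, and property choices are purely illustrative):

    # one dataset per top-level folder, each tuned to its workload
    zfs create -o compression=lz4 tank/documents
    zfs create -o recordsize=1M -o compression=off tank/music
    zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/taxes
    zfs set mountpoint=/home/me/music tank/music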
Take joy in knowing your information is safe probably until the day you die!
I’d argue that’s a [citation needed] unless you’re using a filesystem with built-in checksumming. NTFS is not one of those filesystems. Give ZFS a try, and truly have your information be safe until the day you die :-)
While I agree (in theory, I haven’t used ZFS), what you’re describing does not seem to include the “… for mortals” part.
Most people who own a computer and care enough to have backups should be able to set up 2 hard drives and use BackBlaze. The article even recommends getting a hard drive enclosure rather than fiddling with internal hard drives, to remove as much friction as possible.
Fair! Though, I’ve seen the Raspberry Pi 4 perform quite well with a 64-bit Linux (specifically, NixOS 20.09pre), ZFS, and a dual-drive external USB 3.0 bay. And ZFS on Windows is also a thing now, although caveat emptor with it until it’s more stable. So, maybe not for mortals quite yet, but getting there for people who are willing to get familiar with the Raspberry Pi and the ZFS command line. Which you could argue is still not “mortals” :-)
I just question the lack of data checksums for long-term storage. FS checksumming is actually pretty important for this: if you just store two copies of your data without checksums, you have no idea which is the “right” one if a cosmic ray happens to bitflip one of your drives. So maybe a better suggestion, if you’re stuck on something like NTFS, would be to store an (also mirrored) SHA256SUMS file with your data.
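Something along these lines (the paths are hypothetical), with the SHA256SUMS file stored on both drives:

    # generate checksums for every file under the backup root
    cd /mnt/backup
    find . -type f ! -name SHA256SUMS -print0 | xargs -0 sha256sum > SHA256SUMS

    # later, check which copy is still intact
    sha256sum --quiet -c SHA256SUMS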
My personal feeling is that most people do not care if a single file / photo becomes corrupt or has a glitch in it. For documents, images, audio and more, the program you use to open them will happily fill in a blank, show a weird symbol, or play an odd noise.
In many ways, backups are like password management: There’s the good/right way to do it, and then there’s the “good enough” way for most people, which balances “will actually be used” with “good enough protection”.
Nobody who just barely cares enough about backups is going to set up a Raspberry Pi or do anything beyond what this article recommends. There’s a reason a lot of people use a NAS for backups: you plug in a bunch of disks (often included in the purchase), set it up through a nice web UI, and it’s up and running.
I haven’t used ZFS with it but the rpi 4 is indeed a seriously impressive server given its size and cost!
I haven’t gone the external USB route, and instead use NFS-mounted filesystems from my NAS over gigabit ethernet. I get great performance with that. I should try an external USB drive and compare with some benchmarking.
I like the NFS from NAS option because it means that everything is instantly backed up so the pi is truly disposable.
If you try ZFS with the Raspberry Pi 4, try to get a 64-bit aarch64 OS that has a good ZFS package (like NixOS). 64-bit architecture is a good idea for ZFS anyway, since block pointers, checksums, etc are all 64 bits wide. In my experience this setup has worked well enough that I’ve recommended a Raspberry Pi 4 and a USB 3 hard drive dock to others as a low-cost way to try ZFS with real drives. (Definitely don’t use the Raspberry Pi 3 for this, it has USB 2 shared with the ethernet controller, which will run you into all sorts of latency problems if you try to use it as a NAS).
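The pool creation itself is nothing special; a hypothetical example with two drives in a USB dock (substitute your own /dev/disk/by-id paths):

    # ashift=12 assumes 4K-sector drives, a safe default for modern disks
    zpool create -o ashift=12 backup mirror \
        /dev/disk/by-id/usb-Dock_Slot_1 /dev/disk/by-id/usb-Dock_Slot_2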
Raspbian, last I checked, only boots the ARM cores in 32-bit mode, and who knows what sort of ZFS modules they even provide. Compiling ZFS from source on a raspberry pi doesn’t sound that enjoyable, and NixOS’ binary cache seems to always have it.
I’m sure NixOS is amazing (everything I hear about it is), but I also wonder how Ubuntu 20.04 stacks up. I’m more familiar with its administrative interfaces and the like. I know the ISO for the rpi is 64-bit.
Oh, good to know. Will be interested in how that goes.
Isn’t this getting easier with tools like FreeNAS or the bundled ZFS support in recent Ubuntu versions like 20.04?
You make a good point though. When I set up my home NAS ~3 years ago I chose Synology despite it being closed source, because it had the appliance characteristics I wanted and was extensible enough software-wise that I could be sure it would continue to meet my needs so long as it didn’t run out of disk :)
(Ducks awaiting the rotten tomatoes for having chosen a closed source solution :)
With CBC, if an early block in a file gets corrupted, it won’t be possible to recover the rest.
Not that this is a huge concern, if you have mirroring and checksumming set up properly.
Encryption in ZFS is actually applied at the record (nominally 128k data block) level. GCM is recommended in practice, but, even if you use CBC mode, you’re likely to only corrupt a single record that’s mirrored elsewhere. It’s also overwhelmingly probable that the data checksums will catch it before it tries to decrypt, in all cases.
This scheme does not get me what I want most, especially for my relatives: the ability to recover the version of a file they accidentally deleted, overwrote or modified half a year ago. Backblaze only gets you 30 days. The same with OneDrive, Dropbox, etc.
This is why I use HyperBackup on my Synology NAS.
I’d be super curious if an open source alternative exists.
I recently learned that one can run restic on Synology. Seems very interesting!
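From what I’ve seen, a minimal restic setup pushing a share to Backblaze B2 looks roughly like this (the bucket, key, and path names are made up, and restic supports plenty of other backends such as plain sftp):

    # credentials for a B2 application key
    export B2_ACCOUNT_ID=keyID
    export B2_ACCOUNT_KEY=applicationKey

    restic -r b2:my-backup-bucket:nas init
    restic -r b2:my-backup-bucket:nas backup /volume1/photos
    restic -r b2:my-backup-bucket:nas snapshots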
It’s a Linux box underneath. It’s just that all the Synology branded software is closed source.
I wasn’t aware of restic, I’ll give it a look. Hyperbackup is pretty good, though :)
I don’t have any meaningful backup strategy, but if I did, I’d probably use tarsnap
I agree. I am a big fan of tarsnap. It’s the best turnkey backup solution I know. It’s also really nice that it’s a couple of unix-y tools, so there are dozens of “front ends” for it and it’s super easy to make one for whatever your use case is.
If you want to do big private backups that don’t need the strong guarantees tarsnap gives you, Borg is a tool with a lot of similarities: it has a fairly similar feature set and requires only something that can do SFTP, which hosting companies often provide very cheaply. However, you usually get a lot less redundancy that way.
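A rough sketch of what that looks like, assuming borg is available on the remote end (the host and paths are made up):

    borg init --encryption=repokey ssh://user@storagebox.example/./backups
    borg create --stats ssh://user@storagebox.example/./backups::documents-{now} ~/documents
    borg prune --keep-daily 7 --keep-monthly 6 ssh://user@storagebox.example/./backups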
The thing that I tend to have a hard time with is the whole key storage topic. Finding safe and secure places to keep backups of those keys can be a hard problem.
Timely article - currently looking at a backup solution for my NAS.
I’m a bit bummed about Backblaze, though. My reading is that their $60 annual plan is for Windows/Mac only; I could cobble together something that targets their B2 service but “cobble together Frankenstein’s monster” is exactly what scares me.
Anyone know of a similar reliable flat-fee plan targeting Linux that lets me not care about backups? Or am I destined to be building shell scripts to send stuff to Glacier for the rest of my days…
According to https://help.backblaze.com/hc/en-us/articles/217664628-How-does-Backblaze-support-Linux-Users-?mobile_site=true you have a few options :)
[This is my personal opinion and does not reflect the opinions of my employers yada yada yada :)]
Backblaze is pretty great. In addition to supporting Linux as a first class citizen, they’re also very permissive around what’s considered a ‘computer’, so in my case I’m paying $60/year to back up the entirety of my Synology NAS (currently running at around 3TB storage used).
Really appreciate those guys!
I haven’t tried, but it looks like rclone supports backblaze.
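From the docs, the general shape would be something like this (the remote and bucket names are hypothetical). Note that rclone sync mirrors the source rather than keeping history, so deletions propagate too:

    rclone config      # interactively add a B2 remote, e.g. named "b2remote"
    rclone sync /home/me/documents b2remote:my-backup-bucket/documents
    rclone check /home/me/documents b2remote:my-backup-bucket/documents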
Respectfully, your reading is incorrect. My Synology NAS is currently backing itself up to Backblaze, and as others in this thread have mentioned they provide Linux client support.
For their $60 unlimited plan? Or B2?
If the latter I’d have to pass and my questions stand around flat-priced plans with Linux support.
If the former, I’m going to have to figure out how I’ve misread all their documentation so badly ;)
You’re right it’s B2. However I’m storing ~3TB for $10/mo which is still 50% less than I was paying with that other provider you mentioned :)
[Dancing carefully since this is my employer. Opinions are my own etc etc ;)]
I think I’m picking up what you’re putting down, thanks ;) I’ll go run some more numbers!
I use duply which is a front-end to duplicity that simplifies its most common operations[*]. The backend I use is Backblaze B2 and it’s pretty seamless and not Frankenstein monster-esque (at least by my standards).
The fiddling for the B2 part is limited to creating a bucket and some auth credentials and putting the right string in the duply profile file.
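Roughly, the relevant bits of the profile look like this (the bucket, key, and profile names here are placeholders, not my real ones; check the duplicity manpage for the exact b2:// URL format):

    # ~/.duply/nas/conf
    GPG_PW='a-long-passphrase'
    SOURCE='/home/me'
    TARGET='b2://keyID:applicationKey@my-backup-bucket/laptop'

    # then, from cron or the shell:
    duply nas backup
    duply nas status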
$60/year buys you 1TB with B2 (but you would pay more to download it all in the event of data loss).
I haven’t done a full recovery yet (touch wood) but I’ve recovered individual files a few times without any hassle.
I also follow a similar scheme to the article, so my B2 backup is only there as a fallback if my local (NAS) backups fail for some reason.
[*] I did use duplicity, but found duply handles my use case fine without the custom scripting I used to do.
Love this article!
I have the remote bits of this down pat with my NAS: it backs up to Backblaze using Synology HyperBackup, so I get full disaster recovery capability as well as the ability to restore a single file given a particular timeframe.
I need to buy a USB external drive and do the local backup piece for quick recovery potential.
I need to look into what alternatives exist in the freeware world for systems like FreeNAS as I may well go with that approach next time around - as @numinit mentions, ZFS is a pretty compelling option for doing data storage in a rigorous way.