1. 33
    zfs.rent programming zfs.rent

  2. 6

    Sounds interesting. I’m currently using zfsbackup-go, which does a zfs send, splits the output into 100MB chunks, GPG-encrypts and signs them, and then uploads them to Azure blob storage. Incremental backups are great, but there’s no way with this system to combine them on the server: when I want to do a restore, I need to download all of my incremental backups and replay them, and in the interim I need to pay for all of the duplicated storage (though with the current cost of cloud storage, not very much!).
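
    To make that concrete, the shape of the pipeline is roughly the following (a simplified sketch rather than zfsbackup-go’s actual code; the snapshot and key names are made up, and the upload-to-Azure step is left out):

    ```python
    import subprocess

    CHUNK_SIZE = 100 * 1024 * 1024            # 100MB chunks
    SNAPSHOT = "tank/data@2021-01-01"         # hypothetical dataset@snapshot
    RECIPIENT = "backup@example.org"          # hypothetical GPG key

    # Stream the snapshot; an incremental send would add "-i @previous-snap".
    send = subprocess.Popen(["zfs", "send", SNAPSHOT], stdout=subprocess.PIPE)

    part = 0
    while True:
        chunk = send.stdout.read(CHUNK_SIZE)
        if not chunk:
            break
        # Encrypt and sign each chunk; the real tool uploads the result to
        # blob storage instead of writing it to the local directory.
        subprocess.run(
            ["gpg", "--encrypt", "--sign", "--recipient", RECIPIENT,
             "--output", f"backup.part{part:05d}.gpg"],
            input=chunk, check=True)
        part += 1

    send.wait()
    ```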

    I see a few reasons why this service isn’t quite what I want:

    • Each VM gets a single SATA spinning-rust disk. No redundancy unless you buy multiple disks. I’d like my off-site backups to be more reliable than my on-site ones and this is significantly less reliable than my local RAID-Z array, let alone cloud storage (which, even without going to the geographically replicated tiers, is typically spread across multiple media with a load of error correction). I’ve had disks die in co-located servers before.
    • The encryption that they provide is largely security theatre. The disk is encrypted, so I’m protected against a physical attacker walking off with the disk, but the key lives in memory that is directly readable by the hypervisor and by anyone with root on the host system. If their CPUs supported AMD SEV-SNP or Intel TDX and they had a good secure-boot / attestation story, I’d be more inclined to trust this.
    • ZFS encryption has a bunch of vulnerabilities if you assume that an attacker has the ability to tamper with the physical media. It doesn’t provide cryptographic integrity on any of the metadata, including the stored IVs used for block encryption, and so is vulnerable to substitution and replay attacks. This doesn’t matter for the common threat model (someone stealing your disk shouldn’t get your data) but it does matter in the untrusted cloud use case (the cloud provider shouldn’t be able to tamper with my data). Dm-verity addresses this, but is currently read-only. Dm-integrity only provides per-block integrity, so it is still vulnerable to replay. The world needs read-write dm-verity.

    The pricing seems pretty attractive - Azure / AWS storage is around $15/TB/month, so $10/month for up to 8TB is pretty good, though if you actually care about your data then you probably need 2-3 disks in a pool (and even then they’re in the same node and so are liable to see correlated failures), which makes it around $30/month. That makes it about the same price as cloud storage for people with about 2TB of backups (I have a bit less than that), more expensive for people with less, and less expensive for people with more. If you’ve got a 100TB NAS that you want off-site backups for and are happy with their security model then it might be interesting.
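
    For what it’s worth, the break-even under those (rough, region-dependent) numbers is just:

    ```python
    cloud_per_tb_month = 15      # rough Azure / AWS blob price, $/TB/month
    slot_price = 10              # $/month per 8TB disk
    disks = 3                    # small redundant pool, all in one node

    monthly = slot_price * disks                    # $30/month
    break_even_tb = monthly / cloud_per_tb_month    # 2.0 TB
    print(f"cheaper than cloud storage above {break_even_tb:.1f} TB")
    ```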

    At that scale, however, I’d be tempted to just expose the disks via iSCSI from a VM and run ZFS locally (ideally on top of something providing block-level encryption and integrity protection).

    1. 4

      Neat idea! A tangent, but I worked on a commercial project once that was basically ZFS send, but built on top of any other file system (doing the deduplication/compression/encryption at the application layer). That would be a really cool side project for someone.

      1. 5

        I think rsync.net offers this ability too, right? Either way, great to see this.

        1. 1

          Sounds a bit like BorgBackup or Restic?

          1. 1

            oh nice, they support zfs receive?

            1. 2

              Oop, sorry to never reply to this - I think one or both of us misunderstood parent’s comment :P

              I can really only speak for Borg, although my understanding is that Restic has a similar design. Borg acts at the application layer, so it’s filesystem-agnostic; the elevator pitch is, “backups take the space and bandwidth of incremental backups, but each one acts like a full backup”. That is, you can restore, delete, etc. each backup independently.

              Borg does this by having a consistent read/write “repository” at the other end, instead of just giving you some files you can stick anywhere. (In practice I’m sure you could make this work with pretty much anything; you can definitely hack a POSIX filesystem for Borg on top of e.g. object storage, though it might not be pretty.) I’m not sure if this is actually how it works internally, but the way I like to think about a Borg repository is as a collection of reference-counted (compressed and encrypted) file chunks. Whenever you run a new backup, it will reuse chunks when possible, incrementing their reference count. So effectively each backup contains the entire dataset and is a “full backup” for the purposes of restoration, deletion, etc.; it just happens to share most of its data with previous backups, which is what gives it the “space and bandwidth of an incremental backup” property.
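
              A toy version of that mental model, just to make the reference counting concrete (this is not Borg’s real format - the real thing does content-defined chunking, compression, encryption and an index):

              ```python
              import hashlib

              class Repo:
                  """Toy content-addressed chunk store with refcounts."""
                  def __init__(self):
                      self.chunks = {}   # chunk id -> data
                      self.refs = {}     # chunk id -> number of backups using it

                  def put(self, data: bytes) -> str:
                      cid = hashlib.sha256(data).hexdigest()
                      self.chunks.setdefault(cid, data)          # stored at most once
                      self.refs[cid] = self.refs.get(cid, 0) + 1
                      return cid

                  def backup(self, files: dict) -> dict:
                      # A "backup" is just a manifest of chunk ids; unchanged files
                      # cost nothing extra, yet every backup restores independently.
                      return {name: self.put(data) for name, data in files.items()}

                  def delete(self, manifest: dict):
                      for cid in manifest.values():
                          self.refs[cid] -= 1
                          if self.refs[cid] == 0:                # last reference: reclaim
                              del self.chunks[cid], self.refs[cid]
              ```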

              So to answer your original question, no, Borg has no knowledge of ZFS send/receive, or really any other filesystem. But it has some similar magic-feeling abilities - in particular, paying only the cost of incremental changes, but still having the final result (a ZFS snapshot or a Borg backup) act like a full, independent filesystem tree. Assuming you are sending your ZFS snapshots to zfs receive on another zpool and not as files in a different filesystem, you never have the problem of, “oh no, I want to restore from backup but my last full backup was 6 incrementals behind where I need to restore from, so now I need to download the last full one and then apply the 6 incrementals on top of it.” You can just access (or zfs send) the relevant snapshot directly. Same with Borg.
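
              (The ZFS half of that property looks something like the sketch below - the dataset and host names are invented, but the flags are standard OpenZFS: only the changed blocks travel, yet the receiving pool ends up with a complete, directly usable snapshot.)

              ```python
              import subprocess

              def replicate(prev: str, new: str):
                  # Incremental send: only blocks that changed between the
                  # two snapshots go over the wire...
                  send = subprocess.Popen(
                      ["zfs", "send", "-i",
                       f"tank/home@{prev}", f"tank/home@{new}"],
                      stdout=subprocess.PIPE)
                  # ...but backup/home still gains a full snapshot that can be
                  # browsed or sent on directly ("-u" just skips mounting it).
                  subprocess.run(
                      ["ssh", "backup-host", "zfs", "receive", "-u", "backup/home"],
                      stdin=send.stdout, check=True)
                  send.stdout.close()
                  send.wait()
              ```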

        2. 4

          I would love to see this with FreeBSD, because you can boot it off the same ZFS.

          1. 3

            I’ve been wanting something like this for a long time, sign me up!

            1. 3

              That’s pretty much the “slot” hosting business, but I prefer to pay by storage usage. It’s not very cost-effective to keep an 8TB drive with 4GB of RAM dedicated to backing up smaller datasets. zfs send to an S3 backend works better for me.

              Just found https://github.com/presslabs/z3; combined with the 75GB of free S3 storage from Scaleway, it’s pretty much enough to back up root and my home.

              1. 2

                The rent-to-own model is interesting.

                1. 2

                  I assume the DC is in the U.S., based on their rent-to-own shipping comment, but they do not explicitly state its location.

                  Are there plans to expand into other regions in future? It’d be good to hear some more details and plans once it’s out of beta.

                  1. 2

                    “make the long drive to our Sacramento co-location”

                    It’s mentioned in the setup time section. I assume this means Sacramento, CA, which is indeed in the US.

                  2. 1

                    Maybe it’s time to google “zfs” for once in my life. The encryption sounds cool as. (Are encrypted drives on a remote server like this safe? Can you trust that the drives are encrypted securely, or can the server host decrypt them in some evil way?)

                    1. 5

                      ZFS is really cool. There have been a few filesystem + volume manager “in one” systems - and there are some arguments to be made about cross-cutting concerns - like handling SSD discard with an encrypted file system.

                      As for software encryption - a remote server isn’t safe if you don’t trust the provider. They even let you know they’ll run your OS that mounts the disks in a VM - so they could just dump the RAM and read the encryption keys.

                      If your threat model is more “I’d rather not have anyone get the data from the disks after they’re spun down”, it’s OK. (Remote trusted computing is a really high bar to clear, with hardware enclaves that you can communicate with over the network, encrypted RAM, worrying about encrypted CPU cache…).

                      1. 6

                        You don’t have to send the encryption key to the receiving server with ZFS.

                        1. 2

                          Ah, I was mostly responding to the question about mounting encrypted data on a remote server.

                          For just backup, with no “remote” reading, it is indeed possible to send snapshots with no need to decrypt/mount them (but you can, if you want to).
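
                          For reference, that’s a “raw” send (zfs send -w): the blocks travel still encrypted with the local dataset key, so the remote box can receive and store the snapshot without ever holding the key. A sketch, with invented host and dataset names:

                          ```python
                          import subprocess

                          # -w / --raw keeps the blocks encrypted with
                          # the local key; the backup host never sees it.
                          send = subprocess.Popen(
                              ["zfs", "send", "-w", "tank/secret@today"],
                              stdout=subprocess.PIPE)
                          subprocess.run(
                              ["ssh", "rent-box",
                               "zfs", "receive", "-u", "backup/secret"],
                              stdin=send.stdout, check=True)
                          send.stdout.close()
                          send.wait()
                          ```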