The name led me to believe this is something that builds on top of bup, but it looks like that is not the case. Is the name similarity just a coincidence?
Anyway I’ll definitely check it out and will see if it would serve me better than borg
The docs mention in several places that backups are compressed by default and that compression can be turned off per-backup, but I’m failing to find which compression algorithm is used
Does it back up extended attributes too?
One thing that I usually miss in backup solutions is the ability to copy a backup between repositories. For example, I have a machine at home and I want a subset of its data backed up to a machine at work, but I don’t have direct connectivity between the two machines. What I’d like to do is perform a backup at home to an external drive, then carry the drive to work and “replay” the backup onto the target machine. Usually this is not supported directly, and I have to work around it by doing a restore followed by an immediate backup after I carry the drive to work. Would it be possible to do this more elegantly with bupstash?
I’m failing to find which compression algorithm is used
The data chunks are compressed with zstd at the moment.
Does it back up extended attributes too?
There’s a ticket open for this. I haven’t gotten around to implementing them yet, but I definitely will.
As a workaround, you can pipe tar output directly into bupstash like this:
bupstash put --exec name=mydata.tar :: tar -C /dir -cvpf - .
You lose some efficiency: deduplication will not be as good, and bupstash list-contents will not work. However, this supports whatever your system tar supports and respects error codes coming from the system tar.
Ability to copy a backup between repositories
Not yet. I have an open ticket for syncing items efficiently between repositories. My personal use is to have one backup on a local drive and one on a remote server, so I definitely want this feature too.
The name led me to believe this is something that builds on top of bup, but it looks like that is not the case. Is the name similarity just a coincidence?
Anyway I’ll definitely check it out and will see if it would serve me better than borg
The name was just meant to be short for “backup stash”, so it’s just a coincidence. I implemented everything from scratch.
If you give it a try, don’t hesitate to ask about any problems you face.
Great, thank you. I’ll watch this project with great interest
I misread it as “bupkis”.
Interesting, thank you! Some questions:
Out of interest, why de-dupe on plaintext rather than ciphertext?
Have you seen perkeep - https://perkeep.org/? While that isn’t a backup solution, I wonder if there is some overlap in approaches which is of interest.
Can the bupstash data store be abstracted to run against various bulk storage platforms (like perkeep) - gdrive, AWS S3, local disk, etc?
Do you have a comparison against tarsnap? (That’s a paid hosted service with an open source client, but perhaps bupstash.io is headed in a similar direction?)
Bupstash uses random nonces, so the same data is not always encrypted the same way. I’m not quite sure dedup on ciphertext would work, depending on what you have in mind. That said, because dedup happens on HMAC addresses derived from a hidden key, the source data is still obscured to some degree.
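To illustrate the idea of deduplicating on keyed addresses, here is a minimal Python sketch. It is not bupstash’s actual implementation: the names chunk_address and ToyStore are hypothetical, HMAC-SHA256 is an assumed choice of keyed hash, and the encryption step is left out entirely. It only shows that a keyed hash of the plaintext gives the same address for the same chunk under one key, and a different address under another key.

```python
import hashlib
import hmac
import os


def chunk_address(key: bytes, chunk: bytes) -> str:
    """Return a keyed address for a plaintext chunk (assumed: HMAC-SHA256)."""
    return hmac.new(key, chunk, hashlib.sha256).hexdigest()


class ToyStore:
    """Hypothetical in-memory store that deduplicates by keyed address."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.blobs = {}  # address -> stored data

    def put(self, chunk: bytes) -> str:
        address = chunk_address(self.key, chunk)
        # A real tool would encrypt the chunk (with a fresh random nonce)
        # before storing it. Here the plaintext stands in for that blob.
        # The chunk is only stored if its address has not been seen before.
        self.blobs.setdefault(address, chunk)
        return address


if __name__ == "__main__":
    store = ToyStore(os.urandom(32))
    first = store.put(b"hello world")
    second = store.put(b"hello world")   # same chunk, same key
    store.put(b"something else")
    print("same address:", first == second)
    print("stored blobs:", len(store.blobs))

    # Without the key, an observer cannot recompute addresses from guessed data.
    other = ToyStore(os.urandom(32))
    print("address under another key matches:", other.put(b"hello world") == first)
```

Because the address depends only on the key and the plaintext, the lookup can happen before any encryption, so random nonces in the encryption step do not interfere with deduplication.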
Sure have. Similar concepts, but quite different execution and low-level details.
Bupstash has a plugin interface for external storage, and I have one implementation that is not ready for public use yet. The plugin interface isn’t that stable yet. Currently you need to run a server that serves a Unix socket for bupstash to connect to, though this may change.
I plan to write some future posts with benchmarks that I think bupstash will do well in. Tarsnap is not really open source, as you cannot back up to your own disks like you can with bupstash, and they hide the server code. With regard to hosting on bupstash.io, cogs are turning, though my plan is for bupstash to always be free, open source, and stable for as long as possible.
I’d like to hear your take on the pros/cons of each?
Bupstash is CLI-focused and far more minimalist. The focus is mainly on storing data in an encrypted and deduplicated way and getting it out again. I think it’s best for managing backups of machines and data.
Perkeep is about a user interface and understands things like Twitter feeds, image data, and cloud storage. I don’t think Perkeep has any particular optimization for just dumping directories into it like bupstash has. I think Perkeep is best for maintaining and curating a personal database of things like photos. Bupstash could be used for this, but the lack of a UI or advanced browsing capabilities might be an issue.
Author here. Someone asked me for a comparison with existing tools like borg or restic, so I have made some notes here:
https://github.com/andrewchambers/bupstash/issues/26
I hope to do some more scientific and comprehensive benchmarks in the future.
Can bupstash pull files to back up from a remote machine?
This is one of the features I’ve wanted from restic for some time.
Thanks
Not currently, but it’s an interesting suggestion, and I will have to give it some thought before I can say for sure what form it might take.
It does seem like something I would like to add, though. I will make a ticket for this.
Thanks.
To expand on the need for this: it’d be useful for people who wish to back up from (e.g.) a VPS to their home server, without exposing the home server to the internet.
That’s my use-case anyway. It’s cheaper for me to put a big disk in my home server than it is to hire a hosted box.
I know this is a tangent, but it sounds like it might be useful for you.
I have found WireGuard very useful for letting a VPS pull from or push to a server on my home LAN. I have the box on my LAN connect to the WireGuard server running on the VPS, then the VPS can initiate connections to my home server without exposing anything to the internet at large.
I wrote down some detailed notes on using a VPS on a public cloud as a reverse proxy, but the WireGuard part of the setup would be identical.
That’s actually a great idea. Thanks!