1. 25
    1. 26

      You should really include adding push URLs to a repo’s default remote: https://stackoverflow.com/a/14290145/317670

      The best backups are the ones that happen automatically in the future.

      Add one for GitHub, one for sr.ht, and one for a local NAS or ssh server. Now, whenever you push, you have two remote services and a local service storing it. The local NAS can be a single raspberry pi with the default 8GB SD card for most people.
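      A minimal sketch of that setup, assuming a single remote named origin and placeholder URLs for GitHub, sr.ht, and the NAS:

      # replace the single push URL with a list of push URLs (URLs are placeholders)
      git remote set-url --push origin git@github.com:you/project.git
      git remote set-url --push --add origin git@git.sr.ht:~you/project
      git remote set-url --push --add origin you@nas.local:/srv/git/project.git
      # a plain "git push" now sends to all three, while fetches still use the original fetch URL
      git push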

      1. 3

        This is a good solution, but on the one repo I used it with I switched to a different setup. I now use a local gitea to pull from sr.ht and push that same repo to GitHub, with my origin only pushing to sr.ht.

        I’ve been meaning to give this a try, just haven’t found the time yet: https://github.com/cooperspencer/gickup

      2. 3

        That is actually very cool. I had no idea! I will update the article ASAP to include this :)

      3. 1

        I was not aware of this, thanks!

        I’ve been using multiple remotes for my repos and I set up a git alias in my .gitconfig for pushing to all remotes at once:

        pa = !git remote | xargs -L1 git push # Push to all remotes
        

        Maybe it’s time to start using pushurl.

        1. 2

          Why do you need xargs here at all? Just add more remote urls to the same remote name and then git push remote --all. Programmers seem to love to overcomplicate this stuff with layers that aren’t needed.

          git remote set-url --add remote some/url
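          For illustration, with placeholder URLs the relevant part of .git/config then ends up looking like this:

          [remote "origin"]
            url = git@github.com:you/project.git
            url = git@git.sr.ht:~you/project
            fetch = +refs/heads/*:refs/remotes/origin/*

          git push origin then pushes to every listed url, while git fetch keeps using only the first one.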

          Also set up backups; this remote nonsense won’t do jack for not-yet-committed stashes or changes.

          1. 1

            That’s what’s in the link. I should have done a tldr in my comment…

          2. 1

            Hehe, that’s what I meant by

            Maybe it’s time to start using pushurl.

            I’m in the process of starting to use this style instead :)

    2. 8

      I back up my git repositories the same way I back up everything else on my computer: with a robust and complete backup system. People looking for some git-specific solution to this mustn’t have a proper robust backup system in place. To which I say: fix that! Doing something weird with git repos is time wasted when you should be fixing the root issue.

      1. 3

        I don’t think the only source of interest in this is bad backup hygiene.

        The “just use a complete system backup” approach also seems to require either having a canonical system with a checkout of every project you’ve ever done, or the ability to carefully retain specific backups that happen to contain the most-recent copy of some git repo?

        1. 1

          I think you are mixing things up here. If you don’t store your archived projects on your main computer, fine; but you must store them somewhere else. And that somewhere else needs backing up too.

          1. 1

            What if that “somewhere else” is GitHub or sr.ht?

            1. 1

              Then I would definitely recommend backing those up, too. Doing so is more complex than this article covers, though: for GitHub you’ve got issues, PRs, milestones, release metadata etc to capture too.

          2. 1

            I’m not sure how this addresses what I wrote.

            Since I noted that I don’t think the only source is bad backup hygiene, assume the systems involved are backed up.

            Relying on these system backups as git backups seems to require having the ability to carefully retain specific system backups that happen (i.e., it is a coincidence, not the dedicated purpose of the backup) to contain the most-recent copy of some git repo, no?

            They are certainly better than nothing, but the risks and awkwardness of depending on them should make it evident that people looking for some git-specific solution may already have system backups but find them suboptimal in this domain.

      2. 3

        Same, I confess I’m confused about the problem here.

        The authoritative copy of my code is on my laptop, and potentially on the computers of any collaborators. I push up a copy to Github/Gitlab/etc. for collaboration, but that’s not a backup, it’s a workspace.

        My code is in my home dir. I back up my home dir in several ways. What’s the problem?

      3. 1

        That is not always the solution needed. Personally I don’t care about half of the things I have in the laptop because they are replaceable but some others… I’d rather have a few copies.

        1. 1

          Whether you do a complete system backup or cherry-pick paths/files is orthogonal, really. You can do either with a proper generic backup solution. I advise the former, but the latter is still fine for git repositories and anything else.

      4. 1

        This seems useful as a mirror rather than a backup.

    3. 7

      I’ve been hacking on Gickup lately as I try to back up all of the Git repos I touch onto some new home lab infrastructure and get some hands-on experience with Go in a simple project.

      It effectively does this:

      hosts = [GitLab, GitHub, Gitea, GOGS, Bitbucket, etc.]
      destinations = [Local, GitLab, Gitea, GOGS, etc.]
      for host in hosts:
        for dest in destinations:
          for project in host.listProjects():
            gitCloneOrPull(project.gitUrl, dest)
      

      It’s got a simple YAML config. I just added Prometheus stats to its scheduled run mode.

      While it takes some software on top of git, it feels like a better solution for automated backups. This article is giving me some enhancement ideas for more permanent storage, though…

    4. 4

      I feel compelled to mention “hydra hosting”, where you have multiple remotes of equal status: https://seirdy.one/2020/11/18/git-workflow-1.html

      I haven’t tried this yet, in part because if e.g. your issues are not stored in your git repo, in practice the primary repo would be wherever you file your bugs. I’ve been considering storing my issues in git using https://github.com/MichaelMure/git-bug which would make all copies of the git repo more interchangeable. If I did that, hydra hosting starts to sound more attractive. One thing holding me back is that it doesn’t appear to have a SourceHut bridge yet.

      1. 2

        I feel like I’ve been looking for this term ‘hydra hosting’. My someday project has been figuring out how to approach this for files (not just git repos) using git-annex, because it supports multiple ‘remotes’ that can have varying forms of state, compared to a traditional serial mirroring approach. The only way I see it being practical, though, is through more UI-level interfaces for git-annex like https://github.com/andrewringler/git-annex-turtle (macOS) being supported on other operating systems.

    5. 3

      One can also add a remote that lives on the local filesystem, so the listed downsides of the first option aren’t really there. A backup on the local filesystem is of limited use, of course. But it is possible.
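      A quick sketch, with /mnt/backup standing in for wherever the local or mounted backup disk lives:

      # create a bare repository on the backup disk and register it as a remote
      git init --bare /mnt/backup/project.git
      git remote add backup /mnt/backup/project.git
      # push every branch and tag to it
      git push backup --all && git push backup --tags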

      1. 1

        “local filesystem” can include your offline storage hard disk (/flash) that you only plug in when you actually want to do a backup to it.

    6. 3

      If you’re using gitlabs, and/or your main concern is the code repository, you can automate backup creation with cron. The command is:

      sudo gitlab-rake gitlab:backup:create

      This has been in gitlabs since at least 11.x.y. The downside to this backup is that it’s version-locked, meaning that a backup of gitlabs from a machine running gitlabs 14.6.0 will only restore to a fresh machine running gitlabs 14.6.0. I would find it nice if I could test upgrades by backing up from 14.6.0 and restoring to 14.7.10, obviously within reason.
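      To automate it, the cron entry can be as simple as this (the gitlab-rake path assumes an Omnibus install; adjust to your setup):

      # root crontab: create a backup every night at 02:00; CRON=1 keeps the output quiet
      0 2 * * * /opt/gitlab/bin/gitlab-rake gitlab:backup:create CRON=1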

      As everything is stored in a single backup file, a backup strategy can look like this:

      • Deploy a gitlabs machine in AWS.
      • Create an AMI of the gitlabs machine.
      • Arrange for your main working repositories to push to the AWS gitlabs machine on a project-by-project basis.
      • Back up the community edition machine daily, hourly, whatever you need or can afford.
      • Push the single backup file into AWS S3.
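      That last step can be a plain aws s3 cp of the newest backup file; /var/opt/gitlab/backups is the usual Omnibus default location, and the bucket name here is a placeholder:

      # copy the most recent backup tarball to S3
      aws s3 cp "$(ls -t /var/opt/gitlab/backups/*_gitlab_backup.tar | head -n 1)" s3://my-gitlab-backups/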

      Note well:

      • You aren’t creating another general purpose gitlabs machine here. The AWS machine will have a subset of users from your working git instance. This is done to reduce the maintenance required by this solution.
      • By using a repository available 24/7, you can back up from any instance of git that can automate a push to an ssh- or https-hosted repository.

      Your recovery process after an absolute disaster looks like this:

      • Create a new gitlabs machine from the template.
      • Fish the appropriate backup out of S3.
      • Restore it (see the sketch after this list).
      • Reestablish the state of your main working repository with the recovered state from backup.
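      A rough sketch of the restore step, assuming an Omnibus install and a placeholder backup timestamp:

      # stop the services that touch the database, restore, then bring everything back up
      sudo gitlab-ctl stop puma
      sudo gitlab-ctl stop sidekiq
      sudo gitlab-rake gitlab:backup:restore BACKUP=1640000000_2021_12_20_14.6.0
      sudo gitlab-ctl restart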

      All of this is heavily dependent on the cloud but in an absolute disaster, you only need to be able to fish the appropriate backup out of AWS S3 and build a gitlabs community edition machine on the appropriate gitlabs-ce version.

      This strategy was designed to store a backup of your entire gitlabs machine in a different region from your working gitlabs machine in the most bandwidth friendly way possible.

    7. 2

      Very nice overview. We reviewed such options, but in the end we actually went with the reverse: a self-hosted Gitea instance as the primary host, which then auto-pushes anything that should be shared publicly to GitHub.

      The point of picking that approach is that we wanted backups (through full server backup with reporting and cold storage rotation) to be the default - something to opt out of - and broader sharing to be the optional one of the two. If a push for sharing goes down, it gets noticed when needed and the fix is simple & low-risk. If a push for backup goes down, you risk only noticing when you really need the backup to be there. Yes, that is also a failure to validate your backups, but it is still a very high risk to have as default behaviour.

      Everyone’s needs are obviously different and very likely not everyone can fit an approach like that to their access needs, but for us it certainly helps support a good night’s sleep.

    8. 1

      Seems like a not-terrible place to ask:

      For a while (see: SO question) I’ve been curious about “dehydrating” a fully-pushed repo into either a script or some metadata for restoring it (as in, with the same remotes, maybe branches). Has anyone seen existing tooling for this?

      (Not urgent, so it needn’t be done/polished. I took a swing at some Shell scripting for it, though I put it on hold when I realized the logic for when to actually use it or not wasn’t quite as automatable as I’d hoped. I’ll need to do some more thinking/designing there before it’s actionable.)
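      For what it’s worth, a rough sketch of the idea (restore.sh and the details here are hypothetical, not the script I mentioned above):

      # write a script that re-creates the clone with the same remotes
      {
        printf 'git clone %s repo && cd repo\n' "$(git remote get-url origin)"
        git remote | grep -vx origin | while read -r name; do
          printf 'git remote add %s %s\n' "$name" "$(git remote get-url "$name")"
        done
      } > restore.sh

      Branches could be captured the same way with git for-each-ref; the harder part is still the logic for deciding when a repo is actually safe to dehydrate.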

      1. 2

        Just use this perl script: https://myrepos.branchable.com/

        It works on more than “only” git, too. Then what you want is “just” a config file.
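        To make that concrete, a minimal .mrconfig could look something like this (paths and URLs are placeholders):

        [src/project-a]
        checkout = git clone 'git@git.sr.ht:~you/project-a' 'project-a'

        [src/project-b]
        checkout = git clone 'https://github.com/you/project-b.git' 'project-b'

        mr register adds the current repo to that file, and mr update then walks every registered repo.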

    9. 1

      It feels a bit silly to ding “Pushing to an additional remote” for not being offline. Remotes can be local. It’s trivial to create a second local copy and push/pull into it.

    10. 1

      What I do is have a Gitea server and use this script to mirror the repos there: https://github.com/jaedle/mirror-to-gitea . This specific script is only useful for GitHub->Gitea, but it would not be hard to write similar scripts for other APIs.

      In this way, you have an alternative place to check the repos from the web browser, but also, since they are mirrors, they stay up to date with the original repository.