Not sure that these are the right tags to put on, but none really seemed to fit. BEWARE OF THE POSSIBLE IMPENDING DOOM seemed like the only tag that would really work, but we don’t have that one. Oh well.
[Comment removed by author]
I thought you were going to say: “Here, have a tiny program for doing just that” :)
I don’t think this is as big of a problem as the article makes it out to be.
From a technical perspective it’s trivial to mirror all of the open source code on GitHub.
The hardest part would be getting the hardware to host it, but I’m not sure even that part would be too difficult.
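To make "trivial" a bit more concrete, here is a rough sketch of what a mirroring script could look like. It is only an illustration under assumptions: the helper names (mirror_command, mirror_all) and the example URLs are made up for this comment, it assumes git is on the PATH, and it ignores the genuinely hard parts (discovering the repository list, rate limits, storage). It relies on two real git features: git clone --mirror for the first copy and git remote update --prune for later refreshes.

```python
import subprocess
from pathlib import Path


def mirror_command(url: str, dest_root: Path) -> list[str]:
    """Build the git command that creates or refreshes a bare mirror of url.

    Hypothetical helper: the directory naming scheme is an assumption.
    """
    name = url.rstrip("/").rsplit("/", 1)[-1]
    if not name.endswith(".git"):
        name += ".git"
    dest = dest_root / name
    if dest.exists():
        # Mirror already present: fetch new objects and drop refs deleted upstream.
        return ["git", "-C", str(dest), "remote", "update", "--prune"]
    # First run: a bare clone that copies every ref, not just the default branch.
    return ["git", "clone", "--mirror", url, str(dest)]


def mirror_all(urls: list[str], dest_root: Path) -> None:
    """Run the mirror command for each URL; raises if any git call fails."""
    dest_root.mkdir(parents=True, exist_ok=True)
    for url in urls:
        subprocess.run(mirror_command(url, dest_root), check=True)
```

Calling mirror_all(["https://example.com/user/project.git"], Path("mirrors")) from a cron job would keep a local copy current. Note that this only covers what lives in git; wikis hosted outside the repository, issues, and pull request discussions would need a separate export.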
Going from a single point of failure to a single untested backup is not a great technical solution. I think the answer is probably some kind of distributed hash table (which, yeah, is sort of the highest-level description of git, I know) that federates copies of the data, so that every use of the data produces more backups in the hands of the users who care.
Isn’t this exactly what git already does? Every person interested enough to do a git clone <foo> now holds a point-in-time backup of the entire history of <foo>.
I agree with the other commenters that GitHub is probably not going anywhere, and the basic information that GitHub hosts is trivially replicable.
I think there is actually another issue, though, and that is the GitHub monoculture. Many people only know GitHub. I know people who barely understand that git and GitHub are not the same thing. On top of that, IMO, GitHub is not actually any good at collaborative programming (the PR system, for example, is very lacking). But some companies buy GHE because the people in charge of making those decisions don’t know any better, and GHE is generally quite terrible for the requirements many companies have around code reviews. I actually don’t really enjoy contributing to most open source projects on GitHub because I find the actual act of collaboration so problematic.
What form of collaboration have you found more useful?
BitBucket and Stash much more heavily promote ownership and responsibility over codebases. PRs have specific reviewers. ACLs go down to the branch level, which encourages people to own the code. I’ve had so many PRs go to /dev/null just because nobody wanted to take ownership of reviewing them on GH or GHE. I’ve had people commit to master without even asking the owners if it’s OK.
So the form of collaboration I like is very much based on ownership and responsibility.
Isn’t this the same problem we have with putting any code or content on any site that we don’t own ourselves? I think the best thing about GitHub becoming popular is that nowadays most people have their code on something that isn’t SourceForge. At least if your GitHub project is deleted you still have a complete source code backup with history. You may lose metadata such as a wiki, bug tracking and a small webpage, but you still have the most important thing for any project: the code. I’m also not sure why he is picking on GitHub. Because it is the most popular? Because it is a single point of failure? I’d be more worried about Facebook, Google+, Gmail, Twitter or Flickr going under. Any of those could be disastrous for many, many people. Go warn people about those first. When the world is creating regular backups of those services, then we can start worrying about services where the primary data is a backup of your local project folder.
Thanks to the nature of DVCS and the increased use of package managers, I’m not particularly concerned about widely used source code disappearing. More concerning are all the bit-rotting links that point to GitHub, or to any other website for that matter.
Exactly. Link rot is a serious concern, but it’s not as though it’s a problem being ignored, either; it was the original mission statement of archive.org.