1. 38
    1. 23

      I like Git. A lot of people complain about how complicated it is, and it can be, but for most usage it is pretty simple. I think it’s the advanced things that people trip up on, because they can easily break your tree. Perhaps Git should not provide people with all those guns pointed at their feet, or should make it easier to do common tasks, like change a recent commit message in a local tree without rebasing (which requires a clean tree and can do a lot more damage).

      However, I think this article conflates Git with GitHub too much. The Linux kernel doesn’t use GitHub pull requests and probably merges as many outside patches as any project on GitHub. From what I’ve seen as an outsider, contributors mostly just mail patches from Git to LKML, and a group of developers maintain their own Git trees on kernel.org to work on bigger things that are then rolled up and merged into Linus’s tree.

      Yet I watch OpenBSD, an entire freaking operating system, get by just fine with CVS—CVS

      We get by with it, but mostly because there’s nothing better for our needs that is stable, permissively licensed, has no dependencies, and will be familiar to us. You can search the OpenBSD commit logs for “<expletive> cvs” to see how many times developers have run into problems with CVS. Also, we only ever do basic operations like checkout, diff, commit, import, and log. We don’t use branches, and Theo is the only one who ever tags anything (and when he does, he has to alert everyone not to touch the tree while the tagging runs).

      As someone who is (very slowly) working on the guts of OpenBSD’s CVS repositories, I would love to be able to switch the project to something better. Perhaps a very basic, permissively licensed, blob-compatible implementation of Git without all the bells and whistles. Just enough to clone, diff, push, pull, log, etc., without all the rebasing and modules and stuff. That way OpenBSD developers could use this basic tool to do OpenBSD stuff, and outsiders could use the official Git clients to clone our trees and send us patches, because the backend store would be compatible.
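
      For a sense of what “blob-compatible” entails: Git’s object store is plain content addressing, where an object’s ID is the SHA-1 of a tiny header plus the content, so a from-scratch tool only has to reproduce that scheme (plus the pack format for transport) to interoperate. A quick illustration:

          # Git's own hashing of the blob "hello\n"...
          $ printf 'hello\n' | git hash-object --stdin
          ce013625030ba8dba906f756967f9e9ca394464a

          # ...and the same ID recomputed by hand: SHA-1 over "blob <size>\0" + contents
          $ printf 'blob 6\0hello\n' | sha1sum
          ce013625030ba8dba906f756967f9e9ca394464a  -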

      One big advantage for OpenBSD with a Git/blob-style VCS would be that it’s forward-only. It would be great to verify the tree once and know that history can never be rewritten after that point without everyone else noticing. Right now with CVS, it’s trivial to insert a hidden change into a ,v file that nobody would easily detect. Also, we often have developers working on conflicting areas of the tree, and being able to maintain their own repos somewhere would be nice. It would be easier to let others pull a developer’s tree whenever they want, instead of nagging the developer to mail out a new patch that probably has other, unrelated stuff in it.
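
      In Git terms, that verify-once property can be checked mechanically, because every commit ID hashes over its tree and its parents. A minimal sketch:

          # record the audited head once
          $ git rev-parse HEAD > verified-head

          # later: the pinned commit must still be an ancestor of the current head;
          # this fails if any history beneath it has been rewritten
          $ git merge-base --is-ancestor $(cat verified-head) HEAD && echo history intact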

      1. 6

        change a recent commit message in a local tree without rebasing

        Please note that this is possible without rebasing. The command to do it is git commit --amend.
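
        For example (note that this re-hashes the tip commit, so it’s only safe for commits that haven’t been pushed yet):

            $ git commit --amend -m 'The message I meant to write'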

        1. 4

          That changes only the latest commit. Going further back requires a rebase. That said, I’ve always found the behavior of the interactive rebase to be straightforward and easy to grok; the biggest concern is if you rewrite more than you need to and then try to reconcile your history with someone else’s.
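
          For instance, to reword a message three commits back:

              $ git rebase -i HEAD~3
              # in the editor, change "pick" to "reword" on the offending commit;
              # git replays the commits and prompts for the new message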

          1. 6

            That changes only the latest commit. Going further back requires a rebase.

            I really think this is a weird artifact of the git UI, that all commit edits are called rebases even when the commits you’re editing aren’t changing what they’re based on. Git has made it so that “rebase” is how we say “edit”.

            In Mercurial, hg rebase changes the base and nothing more. If you want to edit your commits you use hg histedit if there’s more than one, or hg amend to do them one at a time (and you can use hg amend on any commit, not just a head).
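
            A rough sketch, assuming the evolve extension is enabled (the revision numbers are placeholders):

                $ hg histedit 42             # interactively edit r42 and its descendants
                $ hg update 42 && hg amend   # or check out any commit and amend it in place
                $ hg evolve                  # then rebase the orphaned descendants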

          2. 1

            Good point. Though I also think that it’s possible to edit the messages with git filter-branch (but it’s probably not interactive). Personally I’ve never used filter-branch for this purpose. I had to change the author a few times because I committed with my personal e-mail.
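
            For the author fix specifically, the usual non-interactive recipe is an env-filter; the addresses here are placeholders, and since this rewrites every matching commit it’s only sane for history that hasn’t been shared:

                $ git filter-branch --env-filter '
                    if [ "$GIT_AUTHOR_EMAIL" = "personal@example.com" ]; then
                        export GIT_AUTHOR_EMAIL="work@example.com"
                    fi' -- --all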

        2. 2

          The same command works in Mercurial, hg commit --amend, or if you’re using Evolve, it’s just hg amend.

      2. 2

        If you have a sec, could you explain why the license of git makes things difficult for you? Git is under the GPL/LGPL, which should let you use it, and surely the license of the VCS does not affect any code you version-control with it? Thanks!

        1. 5

          Yeah, it’s nothing like that; we just have a goal of not adding any new GPL-licensed software. CVS is one of the few remaining packages we have in base that is GPL’d.

          1. 1

            I mean, this is a bit grody, but JGit is under the EDL (BSD-like), so if GCC isn’t already forbidden, it ought to be possible to use that as a base for a new Git-compatible SCM. But probably going the Fossil route or something would make more sense.

            1. 2

              But JGit’s Wikipedia page has been deleted! Next up: gjit, a high performance jit engine for git.

              I don’t think Fossil is going to happen. I have some ideas of my own, but they look a lot more like hg than git, especially the internals.

              1. 3

                Implementing RevlogNG is honestly not even an afternoon’s worth of work, and I don’t think it’d be that hard to build up the manifest/changeset bits as well—even in pure C. At that point, you could run in whatever direction you wanted to build up a new SCM that’d be very Mercurial/BitKeeper-like in operation. I was going to do that a bazillion moons ago, but ended up doing Harmony instead.

                1. 2

                  Implementing RevlogNG is honestly not even an afternoon’s worth of work, and I don’t think it’d be that hard to build up the manifest/changeset bits as well—even in pure C

                  Greg Ward has done that.

    2. 7

      This should really have a ‘rant’ tag. Much of it is subjective. Some of it is misinformed.

      1. 7

        Can you give me an example of something that’s misinformed? I’d be happy to take corrections.

    3. 6

      Although not the point of the post, if you wish to use an email-based flow, you can generate patches with git format-patch and apply them with git apply. There are even commands like git am (which I haven’t used) that integrate directly with mailboxes.
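
      A minimal version of that flow, with placeholder addresses:

          # sender: one mail-ready patch file per commit not yet upstream
          $ git format-patch origin/master
          $ git send-email --to=dev@lists.example.org *.patch

          # maintainer: apply the mails as real commits, author and message intact
          $ git am < patches.mbox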

      One thing that GitHub is increasingly making easier is contributing without leaving your browser. For fixing docs I just edit the file online. But I agree that GitHub’s pull request is a little bit heavy-handed. Another thing I dislike about the PR’s branch is that only the submitter has write access to it. I can check it out and modify it, but I can’t push it back to the PR branch in order to ping-pong with the author of the PR. Also, when a PR is rebased, the comments on the old code just disappear. I’d prefer it if more projects used an email-based flow.

    4. 5

      After dealing with centralized source control for many years, I’m really glad for distributed. Once you grok the idea that git is a tree, that nodes are commits, and that branches are paths, then git becomes nothing more than a bunch of tree operations. What’s even better, it’s a functional tree. All nodes are quickly indexed by a hash. Nothing is ever truly lost if you have the hash (well, unless you decide to garbage-collect the objects that no longer have any references).
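
      For instance, even a commit that no branch points at anymore can be revived as long as you can dig up its hash (the <hash> below is a placeholder):

          $ git reflog                 # find the hash of the "lost" commit
          $ git cat-file -p <hash>     # inspect the raw commit object
          $ git branch rescued <hash>  # make it reachable again before gc prunes it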

      As a programmer, nothing is more natural than a simple data structure. I’m so glad that we are moving away from the Object Oriented model of MKS and the branch-is-not-a-branch model of SVN.

      I’m also really glad that it is decentralized, because then I can muck around in my own branches and revise history and such while I’m busy hacking on a feature. Gerrit and other code review tools also really benefit from this distributed model. Gerrit has pushes that go to a generated branch for review. It groups revisions of a patch together by a Change-Id in the commit. You can push/pull/fetch/cherry-pick these Gerrit branches to your heart’s content, just as any other. This model is so much better than Visual Studio, MKS, SVN, CVS, or whatnot that I’ve had to work with over the years.

    5. 5

      When I read this article recommending using centralized VCSes for most cases, I felt like it was missing some big advantage of DVCSes. After rereading the article, I found what it misses. The article says that most developers don’t use the distributed feature of Git:

      Of all the time I have ever used DVCSes, over the last twenty years if we count Smalltalk changesets and twelve or so if you don’t, I have wanted to have the full history while offline a grand total of maybe about six times. And this is merely going down over time as bandwidth gets ever more readily available. If you work as a field tech, or on a space station, or in a submarine or something, then okay, sure, this is a huge feature for you. But for the rest of us, I am going to assert that this is not the use case you need to optimize for when choosing a VCS. And don’t even get me started on how many developers seem to assume that they can’t get work done if GitHub goes down. Suffice it to say that I think most developers are pretty fundamentally unaware of how to use their DVCS in a distributed manner in the first place.

      I certainly can’t speak for “most developers”, but I personally use Git offline, without GitHub, often. I love being able to make a commit without that commit automatically being published, which would force me to worry about the wording of the commit message and whether the commit includes all of the changes. Before I publish a sequence of commits, I can review them if I feel unsure of their quality, and edit/rebase them if I missed something. I also occasionally commit on feature branches, to back up work in progress that is not yet ready to be published, or to temporarily undo changes so I can run tests on an earlier version of history.
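
      Concretely, that local-first loop looks something like this (assuming an origin/master upstream):

          $ git commit -a                    # record locally; nothing is published
          $ git log --stat origin/master..   # review everything not yet pushed
          $ git rebase -i origin/master      # polish the series if something is off
          $ git push                         # publish only once it is ready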

      The author, as a knowledgeable DVCS user, says that he rarely uses the ability to view history offline. Yet every time you view history in Git, you are viewing it offline. The author really just means that he is rarely without Internet access. However, it is not true that offline history viewing is only useful when you have no Internet access. It has another advantage: speed.

      I remember that when I used SVN at one company I worked for, it took annoyingly long to view various aspects of history (though I don’t remember exactly in what ways I viewed history). Now that I use Git for my projects, I have no significant pauses when I view history, and I view the history of various repositories multiple times a week. I suspect that if I were using SVN nowadays, I would have those annoying pauses again, especially because my home Internet connection is not very fast.

      I only remember with certainty having pauses when viewing history, but SVN might also have annoying pauses when creating a commit or checking out a branch. Git does not have those pauses, because it does not have to use the Internet to perform those operations, because it is a DVCS.

      This article’s recommendation to use a centralized VCS doesn’t apply to me, because I actually use the decentralized features. And I suspect that this is a contributor to some of the negative reactions to this article on Lobsters. A reader of Lobsters is more likely to know how to use the decentralized features of Git, so they are more likely to object to the argument in this article.

      1. 1

        I love being able to make a commit without that commit automatically being published, which would force me to worry about the wording of the commit message and whether the commit includes all of the changes. Before I publish a sequence of commits, I can review them if I feel unsure of their quality, and edit/rebase them if I missed something. I also occasionally commit on feature branches, to back up work in progress that is not yet ready to be published, or to temporarily undo changes so I can run tests on an earlier version of history.

        This. At work we use SVN. I check out with svn, commit with git to the same directory locally, and then, when I’m ready, I commit my changes back to svn. This allows me to work on a feature with full local history, and I can even commit locally if the server goes down (not a problem for me, but we have a lot of outsourcers with poor connections, so they benefit from doing this too).
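
        A rough sketch of that hybrid setup (the URL is made up); git-svn automates the same idea:

            $ svn checkout https://svn.example.com/repo/trunk project
            $ cd project && git init
            $ echo .svn > .gitignore            # keep svn metadata out of git
            $ git add -A && git commit -m 'import svn checkout'
            # ...hack and commit locally as often as you like, even offline...
            $ svn commit -m 'feature done'      # publish through svn when ready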

    6. 3

      Are there any resources to point to for managing a single company-wide repository?

    7. 3

      Minor footnote correction: cvs does not, as far as I know, keep an unmodified copy of the file anywhere. If you want to revert, you need to hit the server.

    8. 3

      Facebook and Google are putting in a tremendous amount of effort to make Mercurial scale to gargantuan code bases with ease

      Hmm, I think it’s only FB. I’m not aware of anything that Google is working on that’s related to this. Does anyone know anything about this?

      1. 6

        Yeah, I hang out with the hg devs (and I have the T-shirt to prove it!).

        durin42 is the big Google guy using hg, and he’s also in charge of parts of code.google.com. He’s grown the hg Google team, as you can see by the commit logs, and they’re replicating Facebook’s move of scaling git by replacing it with hg. I don’t think they have a PR announcement yet, but the amount of work going into Mercurial from @google.com emails should be evident.

        Big companies are seeing the benefit of having gigantic repos and a DVCS. I have heard gossip about courting other big companies to replace their gargantuan and slow git repos with gargantuan but faster hg repos.

        1. 1

          Are you saying that Google is planning to replace its Perforce deployment with a Mercurial deployment similar to Facebook?

          1. 1

            I don’t know. I have heard gossip that this could happen or something like it. I thought they were about to make the announcement.

            1. 1

              Ok. We’ll see :)

      2. 4

        Google uses a modified perforce with an optional git frontend.

        1. 2

          The git frontend is really fragile though. =(

    9. 3

      DVCS? Centralised?

      Why not both?

      That’s basically how Facebook and Google are handling Mercurial.

      1. 5

        That’s my point at the end, where I linked to remotefilelog and Facebook’s announcement, and noted that Facebook and Google are keeping Mercurial for the branching, but surrendering a lot of the distributed parts.

    10. 3

      Using something like GitHub doesn’t remove the distributed nature of git at all… I think the people who say that are the ones who don’t understand the distributed nature of git.

      1. 1

        Exactly. Git is exactly as distributed as you want it to be. Use it like svn/cvs, or use it like Mercurial. Different computers, users, and operations can all use it differently, even on the same repository.

    11. 2

      Hey hey hey, watch what you say about darcs! Some of us still use it every day!

      1. 2

        I really love Darcs, but, as far as I can tell, development has largely stalled. Has the situation changed? I know there was a project that was kind of a Darcs 3.0 that was going to resolve some of the exponentially bad “theoretical” performance, but I don’t know what happened to it.

        1. 1

          The last release was last August, but development seems to be ongoing, judging from the developer mailing list.

          Most performance issues have been dealt with, too. I haven’t had any exponentially bad performance in a long time of using it.

      2. 1

        I love Darcs and would love to use it day to day. I find the model much more understandable.

        I would give a lot for being able to use it to manage git-repos ;).

        Also, the CLI is top-notch.

        1. 1

          I use darcs for Fire★ and use a script to convert my darcs commits to git commits. If you look at my commits, you will see the darcs hash.

          1. 2

            Isn’t darcs-to-git just one-way? How do you handle contributions by others on the Git repos?

            1. 1

              HAHAHA, how nice of you to assume other people contribute to my projects!! In all seriousness, from the few contributions I got (and they were great contributions, not to downplay them), I did something very simple. I pulled from git and did ‘darcs record’, then pushed to my darcs repo. I would write in my message something silly like “Got a contribution from a git, to git, for a git”, and push that.

              1. 1

                :D I hoped for a workflow that would maybe import them as Darcs patches :). Thanks.

                1. 1

                  The Darcs Life isn’t always a glamorous one.

    12. 2

      over the last twenty years if we count Smalltalk changesets and twelve or so if you don’t, I have wanted to have the full history while offline a grand total of maybe about six times. And this is merely going down over time as bandwidth gets ever more readily available. If you work as a field tech, or on a space station, or in a submarine or something, then okay, sure, this is a huge feature for you. But for the rest of us, I am going to assert that this is not the use case you need to optimize for when choosing a VCS.

      Maybe I rarely use local history directly. But I absolutely do use the ability to commit instantly. If I had to go back to doing a network roundtrip every time I wanted to commit (often a second or more over a corporate VPN), I’d go insane.

      Maybe you could avoid that in a centralized system, but not without introducing an extra level of staging, along with confusion in conflict resolution. That seems to go against the spirit of the article.

      These are large, opaque files that, while not code, are nevertheless an integral part of your program, and need to be versioned alongside the code if you want to have a meaningful representation of what it takes to build your program at any given point.

      You know what else meets that description? Libraries. Most of us (other than the crazy Go folk) like to depend on released binary versions of libraries, and don’t like to check them into source control. So you already need infrastructure for handling this kind of thing (e.g. Nexus).

      Binary files aren’t diffable or mergeable. Even SVN feels the need to mark them as something different and special and handle them differently. Better to have two tools than one tool with two different modes.

      This is stupid. This is like saying that chopping off your arms is good because it forces you to get really good at tying your shoes with your teeth. As much as I am a big fan of the Zen saying about the sound of no hands clapping, this argument is specious at best, justifying why a weakness is acceptable by claiming it’s superior.

      In theory you had the same options with other systems. In practice I remember months of begging the mighty sysadmins to add a new subversion repository because project x was far too big for a single repository and needed to be split into three different projects, and getting nowhere. Even thinking about it five years on I’m getting angry.

      People are lazy. Unlimited freedom is not an advantage. Any half-decent cyclist knows the optimal position for their feet on the pedals, and could carefully keep them in the right position all the time. Any truly serious cyclist physically attaches them so they don’t have to.

      Here’s a tip: if you say something’s hard, and everyone starts screaming at you—sometimes literally—that it’s easy, then it’s really hard. The people yelling at you are trying desperately to pretend like it was easy so they don’t feel like an idiot for how long it took them to figure things out. This in turn makes you feel like an idiot for taking so long to grasp the “easy” concept, so you happily pay it forward, and we come to one of the two great Emperor Has No Clothes moments in computing.

      Eh, maybe. I don’t remember ever being confused. Most importantly, I do remember losing work to svn fuckups, and cvs fuckups, but I don’t remember ever having a git fuckup. I’ve seen colleagues screw up merges, but never irretrievably - nothing like the time when we discovered we had six months' worth of svn merge properties attached to the wrong directories and it was easier to make a new repository than persuade the sysadmins to run some kind of script to fix it.

      Maybe I’m an apologist or whatever, but git has been easier to not screw up than cvs or svn, and its speed makes it so much more pleasant to use than mercurial or monotone.

      Prior to GitHub, to send a patch to a project, you needed to

      The old workflow was much more manual; it’s only by listing a bunch of different steps at different granularity that the author makes it sound otherwise. If we write one step for each click in the UI, it’s more like:

      1. Download source code
      2. Unpack source code
      3. Edit source
      4. Run diff
      5. Read the diff file (this is another click mostly because email client UIs are terrible but that’s another argument)
      6. Open email program
      7. Find project mailing list
      8. Struggle with the mailman UI until you figure out it’s the other mailing list
      9. Copy/paste mailing list address into email client
      10. Send email
      11. Mailman signup process
      12. Wait for list moderator to approve your email
      13. Create a spam rule to get rid of all the messages you’ve now got because you’re on the project mailing list
      14. Watch your patch be ignored

      New workflow is more like:

      1. Fork the project on GitHub
      2. Checkout/open project locally (1-click process)
      3. Make change
      4. Commit/push change
      5. Submit pull request (you can review it as part of this because github knows what a pull request is rather than thinking it’s an arbitrary file attachment)
      6. Watch it be ignored, confident that you’ll get a sensible form of notification if it ever isn’t

      And this extra complication doesn’t really get you anything. You still frequently need to rebase patches when they don’t merge cleanly, just like you used to have to tinker with patch fuzz factors.

      Not at all the same thing. With git you do a proper merge with all the merge tooling. With patches you guess and hope and pray that your changes aren’t silently destroyed.

      In fact, the only actual advantage I can see is that you get your name explicitly in the commit history instead of in a THANKS or a Changelog or a README. I mean, good job if that’s what you want, but maybe admit that it’s about vanity and not about tooling.

      It means you get proper commit history in the repo history, not one huge patch chunk. If you’re using a VCS at all it’s presumably because you think that kind of thing is important.

      After all, if what you want is simplicity, you don’t need any of this VCS nonsense; just edit your live PHP files on the production server.

    13. 3

      tl;dr:

      • I don’t know that I can use --depth while cloning
      • git is bad because github is bad and git == github
      • git-blame is slow on big repos
      1. 23

        I’m the author. I think you’re oversimplifying my article rather dramatically because it’s trying to paint in grey and you want black and white. My point isn’t at all that Git is bad at those things, but rather that DVCSes involve trade-offs, that those trade-offs are not appropriate to every project, and that we should evaluate DVCSes based on the trade-offs involved. But to respond to your specific points:

        Using git clone --depth solves exactly one issue: it can trim the history for you. As of recent versions of Git, you can even still commit and push. In exchange, you give up the ability to see the full history, period; you give up the ability to meaningfully use git blame, period; and you’ll still have local size explosion if you have a lot of churn in media files, as is common in many games and multimedia apps. All of this leaves you strictly worse off than a centralized system.
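
        For reference, the shallow-clone variant under discussion (the repository URL is a placeholder):

            $ git clone --depth 1 https://example.com/big.git   # only the newest commit
            $ cd big && git log --oneline | wc -l
            1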

        You also have no support in Git for narrow clones, which is grabbing the full history, but for only part of the repository. Many centralized systems can do this (e.g. Subversion, Perforce), and Mercurial will likely be able to soonish via a third-party extension called narrowhg, but Git currently cannot. It likely could be extended to do so, but if we play that game, I’ll point out that Subversion is getting first-class tags and offline commits any day now. (For real; it’s been on the roadmap since the late aughts.)
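
        For comparison, a narrow checkout in Subversion is just pointing the client at a subtree (the path is made up):

            $ svn checkout https://svn.example.org/repo/trunk/src/editor
            $ cd editor && svn log    # full history, but only for this corner of the tree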

        I wasn’t claiming that Git equaled GitHub, but rather that people use Git in a centralized manner, as evidenced by the fact that most developers I’ve worked with (apparently not including you) do not actually know how to use Git in a distributed manner, and therefore allow their work to come to a stop when GitHub/Kiln/Bitbucket/Stash/Gerrit/whatever goes down. This isn’t even Git-specific; I’ve seen the same behavior in shops based on Mercurial and other DVCSes. I therefore was pointing out that we, as a developer community, aren’t really using the distributed part of DVCSes, but rather using them in a centralized manner because Subversion, and Subversion in particular, got branching wrong. That’s a silly reason to go to DVCSes.

        1. 9

          All your criticism about binary media and histories has a flip side too.

          I have a repository with >1600 commits, containing the history of a daily SQL dump of a production database (~4 years).

          The SQL dump started out at ~30MB and has at this point grown to ~80MB.

          The .git directory of a fresh clone of the backups repository is a mere ~30MB – even though it contains 1600 revisions of multiple dozen MB worth of content.

          If you put diffable content into Git, you need to get into some seriously crazy numbers to outwit it.

          That’s a hard thing to give up in return for a trade-off of some kind. Seems like that’s at least part of the reason why people try to force Git into doing things it isn’t naturally suited to.
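
          You can watch the delta compression doing that from the outside:

              $ git gc                  # repack everything into delta-compressed packfiles
              $ git count-objects -v    # "size-pack" is the on-disk size of all that history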


          Now for the speculative part of my comment: I wonder if some sort of integration of bup, or at least its algorithm, couldn’t be used to handle binary media even with a vanilla Git repository. At least, its overview boasts the following:

          • It uses a rolling checksum algorithm (similar to rsync) to split large files into chunks. The most useful result of this is you can backup huge virtual machine (VM) disk images, databases, and XML files incrementally, even though they’re typically all in one huge file, and not use tons of disk space for multiple versions.

          • It uses the packfile format from git (the open source version control system), so you can access the stored data even if you don’t like bup’s user interface.

          • Unlike git, it writes packfiles directly (instead of having a separate garbage collection / repacking stage) so it’s fast even with gratuitously huge amounts of data. bup’s improved index formats also allow you to track far more filenames than git (millions) and keep track of far more objects (hundreds or thousands of gigabytes).

          That seems like it would not merely address the complaints about binary media (and partially address some of the other points of criticism you brought up), it would actually extend the crazy benefits of Git for diffable content to binary media – pushing it into a realm beyond even traditionally good-at-this centralised VCSs.

        2. 4

          @gecko, thanks for your article. This is an interesting piece!

          According to you, is there any chance Subversion fixes its branching model? Do you think that Subversion could at one point solve the issues detailed in your post better than Mercurial with the Facebook extensions?

          I could not agree more with the section “Say goodbye to sane version synchronization” of your post. I think the amount of effort we put into synchronizing dependencies with tools like pip/npm/bundler/etc. is insane, considering that we all use version control systems that are precisely supposed to do that. We should just commit everything into one single company-wide repository, instead of handling many different repositories. You are the first person I see writing about this outside of Google and Facebook, and that’s refreshing :) What VCS would you use today for a single company-wide repository?

        3. 2

          (about --depth)

          Fair enough, but how could one fix the local size explosion problem with a VCS that is based on a crypto-checked DAG, in a way that isn’t git-annex-like? (From a quick peek, hg’s looks almost the same at the conceptual level.)

          (about narrow clones)

          Yeah, again, in git the whole repo is in the DAG as one atom, which I think you (and I) applaud. Narrow clones would equal breaking this DAG, wouldn’t they?

          (about non-D practices with D systems)

          Well, I for one think that world salvation isn’t achievable w/o extra-natural means, and if someone doesn’t know better and doesn’t want to know, let him be. Maybe SVN with sane branches would really be a good idea (or, even better, just a different, centralization-first porcelain for git or hg).

          1. 5

            (about --depth)

            Fair enough, but how could one fix the local size explosion problem with a VCS that is based on a crypto-checked DAG, in a way that isn’t git-annex-like? (From a quick peek, hg’s looks almost the same at the conceptual level.)

            You clone the “whole” repo, and page in as required, and/or have a richer client protocol. Mercurial’s currently going kind of half-and-half with things like remotefilelog (linked in the article), which stores the exploded file history in a memcached instance and caches locally, but that’s not the only or necessarily best way to approach the problem.

            (about narrow clones)

            Yeah, again, in git the whole repo is in the DAG as one atom, which I think you (and I) applaud. Narrow clones would equal breaking this DAG, wouldn’t they?

            It doesn’t have to. From experience trying to optimize caching on the Kiln-on-Demand servers, most commits only touch a narrow part of the repository. In Git terms, this means that your commit is statistically unlikely to collide with tree edits from someone else working in a different part of the code base, which in turn means that your “merge” can actually be massaged out at push time. This does require a richer protocol, and is theoretically worse (since Git would make you do a no-op merge here where a theoretical omniscient merge tool would notice cross-file conflicts), but in practice, since no tool does this, it’s not actually a regression.

            (about non-D practices with D systems)

            Well, I for one think that world salvation isn’t achievable w/o extra-natural means, and if someone doesn’t know better and doesn’t want to know, let him be. Maybe SVN with sane branches would really be a good idea (or, even better, just a different, centralization-first porcelain for git or hg).

            My point at the end, albeit subtle, is that Mercurial with extensions is probably heading where we want to go, but my goal was merely to start the conversation. I think too many people take DVCSes as the be-all end-all, and they simply aren’t. I want us developers to think about and work in the context of engineering trade-offs.

      2. 10

        That’s not at all the point.

        I don’t know that I can use --depth while cloning

        I think there was even something in there about people who say “git’s not hard; you just don’t know how to use it.”