1. 22
  1. 9

    I’ve always suspected that, behind git’s convoluted “documentation” and opaque design, there’s a good VCS that just sits there in the shadows and just won’t come out. I think this is a pretty good example of that, because even though commits are indeed snapshots, not diffs, git goes out of its way to make it hard for you to work with snapshots, even when you really want the snapshots, not the damn diffs!

    I don’t miss Clearcase-the-VCS, thank God that thing is now confined to museums, but there are a few extremely useful features from it that I miss. For example, because it exposed snapshots, not just diffs, you could instantly access any version of a file: the whole tree was mounted in a virtual filesystem and you could access previous versions just like you accessed the current version. Sort of, but not quite, similar to what the git9 implementation of git for Plan 9 does.

    The overall situation was reversed: Clearcase wasn’t a good VCS IMHO, but it was probably a good foundation on which to build a good one.

    1. 6

      Some things worth mentioning:

      • Git was meant to be a library, and does a pretty good job at it. The CLI, though, was stitched together as an example UI, and (almost) no one bothered to create something better
      • Using a VFS is a UI thing, and there are Git-based VFSes, one of them created by MS
      • You can still access any version of a file via git show <ref>:<path>
      1. 4

        git goes out of its way to make it hard for you to work with snapshots

        confused by this. git checkout $refspec lets you check out arbitrary snapshots of the whole repo pretty easily, git checkout $refspec $filepath lets you check out a particular version of a particular file. In what way is this failing to meet your needs?

        1. 2

          confused by this. git checkout $refspec lets you check out arbitrary snapshots of the whole repo pretty easily, git checkout $refspec $filepath lets you check out a particular version of a particular file.

          On the other hand, with e.g. Clearcase, I could:

          • C-x C-f (open :-) /home/x64k/someproject/somefile@@revision and see a particular version of a particular file, without having to checkout anything – i.e. without messing up my working copy, without git complaining about things in the staging area and so on
          • I could also e.g. diff between two revisions of a file by just pointing at the right path (git eventually got the ability to do diffs between arbitrary revisions but pretty late)
          • With some additional tooling, I could just cd /home/x64k/someproject@@revision and work on it as if I had it checked out – also without messing up my working copy

          It’s not “failing to meet my needs”, it just meets them the way a rock meets my nail-hammering needs :-)

          1. 1

            huh, I wonder how that worked. Virtual filesystem? Shell extension? TortoiseSVN was a shell extension for subversion that I thought was really useful when I used it, but for git I just use the cli.

            1. 1

              It worked via a virtual filesystem, which was all the more impressive when you think it predates good, portable virtual filesystem tools like fuse. It’s mind-boggling that they managed it. That also took a toll on how bug-free it was.

              That being said, virtually all the places where I’ve seen it used had extensive tooling around it. The last point on my list (cd-ing into a particular revision) didn’t happen out of the box. This one’s a little hard to explain, mostly because I honestly forgot most of the details, but also because the terminology has changed a lot. But tl;dr a lot of things were done differently.

              E.g. IIRC, out of the box, it didn’t really have the concept of a “snapshot”, as used by this article, or to put it another way, “commits” were per-file, and you didn’t know which version of one file went with which version of another file. (The beneficial side-effect was that, in many cases, the answer was always “the latest”, and there were no lingering, weird branches). You were expected to come up with your own solution if you needed that, or, realistically, to shell out big bucks for a consultant who did it.

              I definitely wouldn’t go back to that, they’ll pry git out of my cold, dead fingers at this point (I do wish we’d gone with Mercurial instead but that ship has sailed). But I also miss some things that tools other than git did well, and I do think that most of us are just too distracted getting our PhDs in advanced gitology to advance things.

      2. 4

        A good explanation of how Git works, but that’s a design bug in Git. Commits should be diffs, that’s why it is confusing.

        1. 4


          1. 3

            It makes cherry picking and rebasing easy, and you don’t need to write articles like “commits are snapshots, not diffs”.

            1. 11

              I don’t see how it makes those easy. Calculating a diff from snapshots (as we do now) is trivial; the hard part of cherry-picking and rebasing is resolving conflicts when the context for the resulting diff isn’t the same anymore. Storing diffs instead of snapshots doesn’t make that problem go away.

              Furthermore, the patch format is lousy at storing some changes like file moves or deletions or mode changes. A more efficient patch format might manage that better.
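
              To illustrate the “calculating a diff from snapshots is trivial” point, here is a minimal sketch using Python’s standard difflib. The helper name diff_snapshots and the file contents are made up for illustration; this is not how Git itself computes diffs, only a demonstration that two full snapshots are enough to derive one.

              ```python
              import difflib

              def diff_snapshots(old: str, new: str, path: str = "file.txt") -> list[str]:
                  """Return a unified diff between two full snapshots of one file."""
                  return list(difflib.unified_diff(
                      old.splitlines(keepends=True),
                      new.splitlines(keepends=True),
                      fromfile=f"a/{path}",
                      tofile=f"b/{path}",
                  ))

              # Two complete snapshots of the same (hypothetical) file.
              old_snapshot = "alpha\nbeta\ngamma\n"
              new_snapshot = "alpha\nBETA\ngamma\n"

              # The diff is derived on demand; nothing but the snapshots is stored.
              patch = diff_snapshots(old_snapshot, new_snapshot)
              print("".join(patch))
              ```

              The hard part described above, applying that diff where the surrounding context has changed, is not touched by this sketch at all.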

              1. 2

                It does make cherry picking easy, because commits modeled as diffs keep their identity. In Git, the exact same commit gets a different identifier when you cherry pick it from branch A to branch B. This is the cause of a large number of problems.

                https://pijul.org/faq explains the issue in more detail.
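
                A toy model of the identity point, as a rough sketch only. It assumes a snapshot-style ID that hashes the resulting tree, the parent, and the message, and a change-style ID that hashes only the patch text. Both functions, and the tree and tip names, are hypothetical; neither reproduces Git’s real object format or Pijul’s actual scheme.

                ```python
                import hashlib

                def snapshot_commit_id(tree: str, parent: str, message: str) -> str:
                    """Toy ID for a snapshot-style commit: depends on tree, parent, and message."""
                    payload = f"tree {tree}\nparent {parent}\n\n{message}\n".encode()
                    return hashlib.sha1(payload).hexdigest()

                def change_id(patch_text: str) -> str:
                    """Toy ID for a change-style commit: depends only on the patch itself."""
                    return hashlib.sha1(patch_text.encode()).hexdigest()

                patch = "-teh\n+the\n"

                # The same fix applied on top of two different (hypothetical) branch tips.
                id_on_a = snapshot_commit_id("tree-of-A-with-fix", "tip-of-A", "Fix typo")
                id_on_b = snapshot_commit_id("tree-of-B-with-fix", "tip-of-B", "Fix typo")

                # Different parent and tree, so the snapshot-style IDs differ.
                print(id_on_a == id_on_b)  # False

                # The change-style ID never looks at the branch, so it is the same everywhere.
                print(change_id(patch))
                ```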

                1. 1

                  I could not extract a problem from that web page that explains what you mean. Can you try summarising here? Can you give a practical example of the problem?

                  1. 4
                    • 3-way merge doesn’t always behave “as expected”: there is an example on https://pijul.org/manual/why_pijul.html where the result (as in, the contents of the files) of a Git merge depends on whether you merge two commits one at a time, or just the last one (i.e. both diffs together).
                    • Conflicts are not modeled at all in Git, and Git has git rerere to handle that. Patches make that intuitive.
            2. 3

              Because it just makes sense given the operations we do. Commits are patches we use to take the code from version x to y. That’s why we can rebase and mail patches around. They’re patches. A patch is like a diff, but seen from the input/active side, whereas a diff can also be the passive, output, observational view.

            3. 4

              If you want diff, you should look into old vcs. SVN and CVS both stored diffs. (troll intended)

            4. 4

              I read this before seeing it here, and came to the conclusion: It’s all irrelevant.

              In SVN commits are diffs. From the end user’s standpoint, the difference is semantics. Snapshots vs diffs is technical esoterica that makes no actual difference to daily usage.

              Now having said that, if thinking of commits (in either git or SVN) as snapshots instead of diffs makes it easier for someone to reason about, there’s no harm there. But if thinking about commits as colored dots on a piece of (digital) graph paper helps, do that, too. Use whatever model helps.

              1. 2

                Well, storing patches has some advantages over storing snapshots; Pijul explains it pretty well in the article linked above. On the other hand, some other actions are easier to do when working with snapshots. It is not just “technical esoterica that makes no difference to daily usage”.

                1. 3

                  Those problems, as real as they are, are not ones that I’ve hit in more than a decade of heavy, daily git usage (and I used darcs before that)

                  1. 1

                    “storing patches has some advantages over storing snapshots” – Yes, of course. These advantages just make no difference to the overwhelmingly vast majority of people using git.

                    To your everyday git user, it makes no difference whether git stores snapshots, diffs, or ice cream sandwiches, as long as it faithfully reproduces the code you store in it.

                2. 1

                  Dual nature, surely? Commits are stored as snapshots; they can act as snapshots (“please checkout version 123 of file X”); and they can act as diffs (“please rebase [the changes I made in] commit 123 onto commit 125”).

                  There are more terms in Git that have more than one meaning. For example, ‘branch B’ can mean any of four things: “the branch pointer B”, “the commit that B currently points at”, “that commit and all its ancestors”, or “all commits that are only ancestors of B, and are not also ancestors of another branch pointer.”
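
                  Those four readings can be told apart on a toy commit graph. This is only a sketch: the commit names, the parents map, and the branches map are invented for illustration and are not Git’s data structures.

                  ```python
                  # Hypothetical history: c1 <- c2 <- c3 (main), and c2 <- c4 <- c5 (B).
                  parents = {
                      "c1": [],
                      "c2": ["c1"],
                      "c3": ["c2"],
                      "c4": ["c2"],
                      "c5": ["c4"],
                  }
                  branches = {"main": "c3", "B": "c5"}

                  def ancestors(commit: str) -> set[str]:
                      """Return the commit itself plus everything reachable via parent links."""
                      seen: set[str] = set()
                      stack = [commit]
                      while stack:
                          current = stack.pop()
                          if current not in seen:
                              seen.add(current)
                              stack.extend(parents[current])
                      return seen

                  pointer = "B"                # 1. the branch pointer itself
                  tip = branches[pointer]      # 2. the commit B currently points at
                  history = ancestors(tip)     # 3. that commit and all its ancestors

                  # 4. commits reachable from B but from no other branch pointer
                  reachable_elsewhere = set().union(
                      *(ancestors(t) for name, t in branches.items() if name != pointer)
                  )
                  exclusive = history - reachable_elsewhere

                  print(tip, sorted(history), sorted(exclusive))
                  ```

                  The last set is roughly what you get when asking for the commits on B that are not on main.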

                  1. 1

                    I like this article a lot for intuition. There are several topics here I stopped short of learning years ago, but they’re easy to grasp.