1. 19
    1. 6

      When I first encountered Git, I saw that each commit was represented with a hash. Learning that the hash was derived from the contents of the repo at that commit, I came to think of the commit hashes as fingerprints used to access snapshots. The fingerprint metaphor was reinforced in my mind when I saw how common it was to reference a commit by only the first seven characters of the hash. For me, setting aside that this is more consistent with how Git internals actually work, the mental model of diffs derived from snapshots rather than the other way around made it easier to wrap my head around harder things like rebasing or dumpster-diving through the reflog for a lost commit.

      Git is complicated no matter how you look at it. Most developers I know semi-secretly hate it and learn just enough to get by. It was only when I worked on a team that was adamant about clean commit histories on the main branch that I learned how to do some of the harder chores and discovered to my surprise that I had a relatively high pain tolerance for Git’s many sharp edges. (If only this were also true for 3D graphics programming…) Nevertheless, Julia Evans’ deep dive is greatly appreciated as always.
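
      To make the fingerprint idea concrete, here is a minimal sketch of how an object ID can be derived from content. It assumes the default SHA-1 object format and covers only the simplest object type, a blob; commit objects are hashed the same way, but over text that also names the tree, the parents, the author and the message. The function name `git_blob_id` is my own, not part of any Git API.

      ```python
      import hashlib


      def git_blob_id(content: bytes) -> str:
          """Return the ID Git would assign to a blob with this content.

          Assumes the SHA-1 object format: the hash covers a small header
          ("blob <size>" followed by a NUL byte) and then the raw bytes.
          """
          header = b"blob " + str(len(content)).encode("ascii") + b"\0"
          return hashlib.sha1(header + content).hexdigest()


      full_id = git_blob_id(b"hello world\n")
      short_id = full_id[:7]  # the abbreviated form people usually type
      print(full_id, short_id)
      ```

      The same bytes always produce the same ID, which is what makes the hash usable as a fingerprint for a snapshot.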

      1. 2

        > the mental model of diffs derived from snapshots rather than the other way around made it easier to wrap my head around harder things

        I think this is a great observation. It is true that many Git operations treat commits as diffs from the first parent but teaching new users that commits are snapshots and how diffs are derived from these snapshots makes most of the mysteries vanish.
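
        A toy illustration of "diffs are derived from snapshots", assuming a snapshot is modelled as a plain dict from path to file content. This is not how Git stores trees internally, and `diff_snapshots` is a hypothetical helper, but it shows that nothing diff-like has to be stored for a diff to be shown.

        ```python
        def diff_snapshots(parent: dict, child: dict) -> dict:
            """Compute a diff on demand from two full snapshots."""
            added = {p: child[p] for p in child.keys() - parent.keys()}
            removed = sorted(parent.keys() - child.keys())
            modified = {
                p: (parent[p], child[p])
                for p in parent.keys() & child.keys()
                if parent[p] != child[p]
            }
            return {"added": added, "removed": removed, "modified": modified}


        # Two complete snapshots; no diff is stored anywhere.
        first = {"README": "v1", "main.py": "print(1)"}
        second = {"README": "v2", "main.py": "print(1)", "LICENSE": "MIT"}

        change = diff_snapshots(first, second)
        print(change)
        ```

        Diffing against the first parent is then just a choice of which two snapshots to compare.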

      2. 4

        “Yes”

        i.e. all three models are useful and correct at different times and in different contexts. For practical purposes, when I am working with git’s storage, commits are snapshots. git log presents them as diffs, but that’s a handy visualization, which happens to depend on the fact that commits know their history. And there are some awkward corners of git’s CLI where a commit hash is used to refer to a history rather than the commit itself. But when commits get separated from their repositories, such as during rebasing, they are converted to patches that will get reapplied elsewhere, so there are neither snapshots nor any history.

        1. 3

          I might be weird but I think of it as a SHA-1 hash that has a relationship to other entities in a repository.

          1. 3

            I used git for 10 years almost every day before learning it actually stores snapshots instead of diffs. I don’t think this forbidden knowledge helped me at all in day-to-day tasks but it did make me lose some respect towards git. But maybe that’s the benefit; if you don’t mystify your tools you can drive them with confidence and not double-check every little thing.

            1. 5

              Why did it make you lose respect?

              1. 1

                It was cheating!

                1. 1

                  I think storing snapshots and then later using compression (although diff-optimized, in the pack files) is actually a great design, making the “logical” storage much simpler than other systems. And I think the compression is independent of file history, meaning renaming a file does not double the size, as it used to in (I think) CVS?
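
                  A rough sketch of why a rename costs nothing in a content-addressed store. It is heavily simplified: real Git also prefixes a type header, zlib-compresses objects, and delta-compresses them in pack files. `ObjectStore` is a made-up class for illustration only.

                  ```python
                  import hashlib


                  class ObjectStore:
                      """Toy content-addressed store keyed by a hash of the bytes."""

                      def __init__(self):
                          self.objects = {}

                      def put(self, content: bytes) -> str:
                          key = hashlib.sha1(content).hexdigest()
                          self.objects[key] = content  # same content, same slot
                          return key


                  store = ObjectStore()
                  data = b"print('hello')\n"

                  # Two snapshots that differ only in the file's name.
                  snapshot_before = {"old_name.py": store.put(data)}
                  snapshot_after = {"new_name.py": store.put(data)}

                  print(len(store.objects))
                  ```

                  Both snapshots point at the same stored object, so the file's content exists once regardless of its path or history.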

              2. 2

                Storing snapshots is why it’s so fast compared to e.g. SVN

              3. 3

                I always think of commits as snapshots and I think that mental model has some value, especially when CI/CD comes into the mix.

                A commit is an exact snapshot of the entire tree, so if you are fixing things, you need to make sure the entire thing is fixed, buildable, etc. Nominally this is “don’t break main/trunk/whatever”, but this is also so that you can use bisect, and have a useful history that actually describes the state of the tree, and any per-commit artifacts are usable.

                1. 1

                  I 100% agree that having an exact mental model of git is actually not that useful. Most of the time I think of commits as diffs, too, and the repo as a linked graph (like, a graph, implemented like a linked list, if that makes sense?), and it’s more than enough.

                  I kinda have a similar feeling about CPU/memory/disk. Most people writing code don’t need an actually accurate model of these things, a very simplified mental model can take you a long, long, way.

                  1. 1

                    When integrated into trunk, snapshots. When proposing changes, diffs/patches.

                    1. 1

                      I wonder how much of the “wrong” idea about git operation comes from how things used to be done and how that “knowledge” has been passed down (I’m old enough to remember the SCCS vs RCS debates)

                      1. 1

                        In terms of what I need them to be: dependent changes. A work piece is a dependent set of changes. While working out what this set of changes has to be (because nobody is perfect the first time), you have to be able to work on the full stack of commits, not just the topmost commit. The fact that a commit has no stable commit ID while you work on it or its dependent commits, and indeed has no concept of dependencies at all, is a shortcoming.

                        In terms of implementation: I call that equivalent. The same graph can be represented by its nodes (snapshots) or by its edges (diffs).