1. 28
  1. 4

    Nice writeup. I’ve always heard that Git doesn’t handle files (with names), but handles objects. How does that relate to this? Are file names just tags on an ‘object’, which you change on rename? And does committing make git resolve these names to these objects first?

    1. 11

      (Simplified) Git has multiple kinds of objects: one is a blob of content, addressed by its hash; another is a tree, which is a list of file names, each associated with the hash of a blob (or of another tree, for a subdirectory); and yet another is a commit, which is a commit message, a tree addressed by its hash, and zero or more parents addressed by their hashes.
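To make "addressed by its hash" concrete, here is a minimal sketch of how a blob's ID can be computed, assuming a SHA-1 based repository. The function names `git_object_id` and `blob_id` are made up for this illustration; the real command-line equivalent is `git hash-object`.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash "<kind> <size>" + NUL + body, which is what the object ID covers."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    """ID of a blob: depends only on the content, never on a file name."""
    return git_object_id("blob", content)


# The ID of the empty blob is a well-known constant.
print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Note that no file name goes into the hash; names only appear in tree objects.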

      These are all immutable, so you don’t change a tag. Instead, you create a new commit with a new tree, whose parent is the “previous” commit, and you make that your active commit (HEAD), which again just addresses the commit object by its hash.

      Renames are a function of how the data is presented: if you ask git to look at two trees (do a diff), and one has a file a and the other has a file b, and both point at the same blob (their contents have the same hash), git is going to infer that the file was renamed (whether that’s what happened or not).
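The inference described above can be sketched roughly as follows. This is only an illustration of the exact-match case, not Git's actual algorithm: each tree is modelled as a plain dict from file name to blob hash, and `infer_exact_renames` is a hypothetical helper. Real Git can also pair up files whose contents are merely similar, which this sketch ignores.

```python
def infer_exact_renames(old_tree: dict, new_tree: dict) -> list:
    """Pair each removed name with an added name that has the same blob hash."""
    removed = {n: h for n, h in old_tree.items() if n not in new_tree}
    added = {n: h for n, h in new_tree.items() if n not in old_tree}

    # Index the added names by blob hash so matching is a dict lookup.
    added_by_hash = {}
    for name in sorted(added):
        added_by_hash.setdefault(added[name], []).append(name)

    renames = []
    for old_name in sorted(removed):
        candidates = added_by_hash.get(removed[old_name])
        if candidates:
            # Same content under a new name: report it as a rename.
            renames.append((old_name, candidates.pop(0)))
    return renames


old = {"a": "hash1", "c": "hash2"}
new = {"b": "hash1", "c": "hash2"}
print(infer_exact_renames(old, new))  # [('a', 'b')]
```

Nothing in either tree records that a rename happened; the pairing is computed at diff time, which is why deleting a and independently adding an identical b looks the same as a rename.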

      1. 3

        Oh hey does that mean that git deduplicates its storage of identical files for free? (Obvs not in the working tree, but in the .git directory.) Since they’ll have the same hash, it can just have the same blob referred to from multiple points in a single tree?

        1. 2

          Yep.
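A toy model of why the deduplication falls out for free, assuming a store keyed by content hash. `ObjectStore` and the file names are invented for this sketch; a real repository keeps objects compressed under .git/objects rather than in a dict.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: one entry per distinct content."""

    def __init__(self):
        self.objects = {}

    def put_blob(self, content: bytes) -> str:
        # Same header-plus-content hashing that Git uses for blobs (SHA-1 repos).
        oid = hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()
        self.objects.setdefault(oid, content)  # already present: nothing new stored
        return oid


store = ObjectStore()
tree = {
    "LICENSE": store.put_blob(b"same text\n"),
    "docs/LICENSE": store.put_blob(b"same text\n"),
    "README": store.put_blob(b"different text\n"),
}
print(len(tree), "names,", len(store.objects), "stored blobs")  # 3 names, 2 stored blobs
```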

      2. 3

        This utility is quite nice to explore the underlying data structure:

        $ git ls-tree HEAD
        <snip>
        100644 blob 5caf2e89168505c24ad1e3146fd029929f27487a	main.go
        040000 tree d0357c0f78bab0bd5dbb19f7d805bcb987ce74a6	man
        040000 tree 1ce4d49aa464dfdfe0314b0937e2a203dacdc96e	nix
        100644 blob 0959aae462cbec0d6e1cd1d7691f1262350989ee	rc.go
        <snip>
        $ git ls-tree HEAD man/
        100644 blob b5b49633b7fe4cb364b476ad7255575e4e515765	man/direnv-stdlib.1
        100644 blob 57ff9cb23b73219eeac2317c2d4f52ed0cdbaf59	man/direnv-stdlib.1.md
        100644 blob b4a2fa2e806593c80dfbf5b0ad325303635ca74a	man/direnv.1
        100644 blob e180e462681bf41c458c47e85470cd2e882c3899	man/direnv.1.md
        100644 blob 763d8b9e0383ca9f2ae6d1433aaafbad1753f406	man/direnv.toml.1
        100644 blob 1487278964fd7d98c1200c01cbd020ab0953647e	man/direnv.toml.1.md
        

        see also git cat-file

        1. 1

          Git stores a directory as a list of (<name>, <hash>) pairs. The hash of that list is stored in the parent directory (along with the directory name).

          When you edit app/foo.sh and commit, foo.sh gets a new hash. The listing for app includes this new hash. The root directory entry for app also gets a new hash by the same process.
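The way a change ripples up to the root can be sketched like this. It follows the tree object layout as I understand it (entries of mode, name, NUL, raw hash, sorted by name), but it is a simplification: every file is assumed to have mode 100644, and the directory contents and helper names are made up for the example.

```python
import hashlib


def object_id(kind: str, body: bytes) -> bytes:
    """Raw SHA-1 of "<kind> <size>" + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def tree_id(directory: dict) -> bytes:
    """Hash a directory given as {name: bytes (file) or dict (subdirectory)}."""
    entries = []
    for name, node in directory.items():
        if isinstance(node, dict):
            # Subdirectories sort as if their name ended in "/".
            entries.append((name + "/", b"40000", name, tree_id(node)))
        else:
            # Assumption: every file is a regular, non-executable file.
            entries.append((name, b"100644", name, object_id("blob", node)))
    entries.sort(key=lambda entry: entry[0])
    body = b"".join(
        mode + b" " + name.encode() + b"\0" + oid for _, mode, name, oid in entries
    )
    return object_id("tree", body)


before = {"README": b"docs\n", "app": {"foo.sh": b"echo hi\n", "bar.sh": b"echo bar\n"}}
after = {"README": b"docs\n", "app": {"foo.sh": b"echo bye\n", "bar.sh": b"echo bar\n"}}

print(tree_id(before["app"]) != tree_id(after["app"]))  # True: app gets a new hash
print(tree_id(before) != tree_id(after))                # True: so does the root
```

README and bar.sh keep their old hashes, so the new commit reuses those objects and only writes a new foo.sh blob plus new app and root trees.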

        2. 3

          Perhaps I shouldn’t pay it much mind, but I’m not a fan of the “juicy gossip” section:

          After 25 years he finally stepped away from leadership on the Linux kernel in part, we’re told, precisely to address his pattern of vicious verbal abuse toward other technologists.

          I find this a bit misleading as it doesn’t mention that it was only temporary.

          On “renames don’t matter”: I think renames matter, but more importantly, so does Linus. … I found five commits exclusively or chiefly devoted to renaming a command or a module in git itself. If Linus really thought renames didn’t matter, he wouldn’t rename things.

          It’s a reminder: what people say in this industry often won’t hold up to research.

          I don’t think he was saying that renames themselves literally don’t have any use; he was saying that tracking renames doesn’t matter to him. After all, he also said that “files don’t matter” and he created an entire kernel based on the concept that everything is a file. :P

          1. 3

            I thought git mv was more explicit while renaming. I guess the automatic nature of log --follow and diff -M is helpful when you forget to mark a move. I do mark moves explicitly in Mercurial, either when doing it or after the fact with hg rename -A.

            1. 4

              Which makes sense, incidentally, because Mercurial does track renames (and copies!) explicitly in the manifest. Git is the only SCM I’m aware of that deliberately takes an explicit design stance against tracking renames. (Lots of others, e.g. CVS, don’t track renames, but they don’t call that out as a feature.)