Nice writeup. I’ve always heard that Git doesn’t handle files (with names), but handles objects. How does that relate to this? Are file names just tags to an ‘object’, for which you change the tag on rename? And does committing make git resolve these names to these objects first?
(Simplified) Git has multiple kinds of objects. One is a blob of content, addressed by its hash. Another is a tree, which is a list of file names, each associated with a blob’s hash (or, for a subdirectory, another tree’s hash). Yet another is a commit, which is a commit message, a tree addressed by its hash, and zero or more parents addressed by their hashes.
These are all immutable, so you don’t change a tag. Instead you create a new commit with a new tree, whose parent is the “previous” commit, and you make that your active commit (HEAD), which is again just addressing the commit object by its hash.
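To make that concrete, here is a rough Python sketch of how the three object kinds get their hashes, assuming the classic SHA-1 object format (a header of kind and length, a NUL byte, then the body). The function names and the author/committer values are made up for illustration, and the tree sorting is simplified compared to what git really does.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way SHA-1 Git does: '<kind> <size>', NUL, then the body."""
    header = f"{kind} {len(body)}".encode()
    return hashlib.sha1(header + b"\0" + body).hexdigest()

def tree_body(entries: dict) -> bytes:
    """entries maps file name -> (mode, hex object id); ids are stored as raw bytes.
    Plain name sorting is a simplification of Git's real ordering rules."""
    out = b""
    for name in sorted(entries):
        mode, oid = entries[name]
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return out

def commit_body(tree_id: str, parents: list, message: str) -> bytes:
    """A commit names one tree and zero or more parents, all by hash."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parents]
    lines += [
        # Hypothetical identity and timestamp, purely for illustration.
        "author A U Thor <author@example.com> 0 +0000",
        "committer A U Thor <author@example.com> 0 +0000",
        "",
        message,
    ]
    return ("\n".join(lines) + "\n").encode()

# First commit: one file, no parents.
blob_id = git_object_id("blob", b"hello world\n")
tree_id = git_object_id("tree", tree_body({"a.txt": ("100644", blob_id)}))
first = git_object_id("commit", commit_body(tree_id, [], "Initial commit"))

# "Changing" the file never mutates anything: new blob, new tree, new commit.
blob2_id = git_object_id("blob", b"hello again\n")
tree2_id = git_object_id("tree", tree_body({"a.txt": ("100644", blob2_id)}))
second = git_object_id("commit", commit_body(tree2_id, [first], "Edit a.txt"))

head = second  # HEAD now just names the new commit by its hash
```

The blob id computed here should match what git hash-object reports for the same bytes, which is an easy way to check the sketch against a real repository.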
Renames are a function of how the data is presented. If you ask git to look at two trees (do a diff) and one has a file a, the other has a file b, and both point at the same blob (their contents have the same hash), git is going to infer that the file was renamed (whether that’s what happened or not).
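A toy version of that inference, in Python, might look like the following. It only handles the exact-match case described above (same blob hash under a different name); real git also has similarity-based detection for files that changed a little, which this sketch ignores. The function name and the shortened hashes are made up for the example.

```python
def exact_renames(old_tree: dict, new_tree: dict) -> list:
    """Pair names that vanished with names that appeared and share a blob hash.

    Both arguments map file name -> blob hash. Nothing here records that a
    rename happened; it is inferred purely by comparing the two snapshots.
    """
    removed = {n: h for n, h in old_tree.items() if n not in new_tree}
    added = {n: h for n, h in new_tree.items() if n not in old_tree}

    # Index the removed names by the hash of their content.
    by_hash = {}
    for name in sorted(removed):
        by_hash.setdefault(removed[name], []).append(name)

    renames = []
    for name in sorted(added):
        candidates = by_hash.get(added[name])
        if candidates:
            renames.append((candidates.pop(0), name))
    return renames

old = {"a": "3b18e5", "README": "e69de2"}
new = {"b": "3b18e5", "README": "e69de2"}
print(exact_renames(old, new))  # [('a', 'b')]
```

Note that the function would report the same "rename" if someone deleted a and independently added an identical b, which is the "whether that’s what happened or not" part.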
Oh hey does that mean that git deduplicates its storage of identical files for free? (Obvs not in the working tree, but in the .git directory.) Since they’ll have the same hash, it can just have the same blob referred to from multiple points in a single tree?
Git stores a directory as a list of (<name>, <hash>) pairs. The hash of that list is stored in the parent directory (along with the directory name).
When you edit app/foo.sh and commit, foo.sh gets a new hash. The listing for app includes this new hash. The root directory entry for app also gets a new hash by the same process.
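Here is a small Python sketch of that propagation, assuming the SHA-1 object format and regular-file/directory modes only. The helper names and file contents are invented for the example, and the entry sorting is a simplification of git’s real rules.

```python
import hashlib

def object_id(kind: str, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 of '<kind> <size>', NUL, body."""
    return hashlib.sha1(f"{kind} {len(body)}".encode() + b"\0" + body).digest()

def write_tree(node) -> bytes:
    """node is bytes (file contents) or a dict (directory of name -> node).

    A directory's hash covers the (name, hash) pairs of its children, so a
    change in any file changes the hash of every directory above it.
    """
    if isinstance(node, bytes):
        return object_id("blob", node)
    body = b""
    for name in sorted(node):  # simplified ordering
        child = node[name]
        mode = "40000" if isinstance(child, dict) else "100644"
        body += f"{mode} {name}".encode() + b"\0" + write_tree(child)
    return object_id("tree", body)

before = {
    "README": b"docs\n",
    "app": {"foo.sh": b"echo hi\n", "bar.sh": b"echo bar\n"},
}
after = {
    "README": b"docs\n",
    "app": {"foo.sh": b"echo hello\n", "bar.sh": b"echo bar\n"},
}

for label, a, b in [
    ("app/foo.sh", before["app"]["foo.sh"], after["app"]["foo.sh"]),
    ("app/bar.sh", before["app"]["bar.sh"], after["app"]["bar.sh"]),
    ("app", before["app"], after["app"]),
    ("README", before["README"], after["README"]),
    ("(root)", before, after),
]:
    changed = write_tree(a) != write_tree(b)
    print(f"{label}: {'new hash' if changed else 'unchanged'}")
```

Only foo.sh, app, and the root get new hashes; bar.sh and README keep theirs, so their objects can be reused as-is by the new commit.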
Perhaps I shouldn’t pay it much mind, but I’m not a fan of the “juicy gossip” section:
After 25 years he finally stepped away from leadership on the Linux kernel in part, we’re told, precisely to address his pattern of vicious verbal abuse toward other technologists.
I find this a bit misleading as it doesn’t mention that it was only temporary.
On “renames don’t matter”: I think renames matter, but more importantly, so does Linus. … I found five commits exclusively or chiefly devoted to renaming a command or a module in git itself. If Linus really thought renames didn’t matter, he wouldn’t rename things.
It’s a reminder: what people say in this industry often won’t hold up to research.
I don’t think he was saying that renames themselves literally don’t have any use; he was saying that tracking renames doesn’t matter to him. After all, he also said that “files don’t matter” and he created an entire kernel based on the concept that everything is a file. :P
I thought git mv was more explicit while renaming. I guess the automatic nature of log --follow and diff -M is helpful when you forget to mark a move. I do mark moves explicitly in Mercurial, either when doing it or after the fact with hg rename -A.
Which makes sense, incidentally, because Mercurial does track renames (and copies!) explicitly in the manifest. Git is the only SCM I’m aware of that deliberately takes an explicit design stance against tracking renames. (Lots of others, e.g. CVS, don’t track renames, but they don’t call that out as a feature.)
Yep, identical files hash to the same blob, so git stores that blob once and any number of tree entries can refer to it.
This utility is quite nice to explore the underlying data structure:
see also git cat-file
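For the deduplication point, a minimal Python sketch of a content-addressed store shows why it falls out for free. The ObjectStore class and the file names are hypothetical, and the flat path-to-hash dict stands in for real nested trees.

```python
import hashlib

class ObjectStore:
    """Toy stand-in for the object database under .git: hash -> content."""

    def __init__(self):
        self.objects = {}

    def put_blob(self, content: bytes) -> str:
        # Same header-plus-content hashing that SHA-1 Git uses for blobs.
        oid = hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()
        self.objects[oid] = content  # storing identical content again is a no-op
        return oid

store = ObjectStore()
tree = {
    "LICENSE": store.put_blob(b"same text\n"),
    "docs/LICENSE": store.put_blob(b"same text\n"),
    "README": store.put_blob(b"different\n"),
}

print(len(tree), "names,", len(store.objects), "stored blobs")  # 3 names, 2 stored blobs
```

Because the key is derived from the content, the store cannot hold two copies of the same bytes even if it wanted to; the names live only in the tree.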