1. 10
  1.  

  2. 1

    I am kind of surprised that a blog post like this was necessary in the “git explained” genre. The authors said that they could not find a complete description of the format anywhere, just broad descriptions here and there. Weird that it has taken this long for an alternate complete implementation. As far as I know, gitgo is the first independent implementation of git’s binary storage. Is this accurate? Or does dulwich implement packfiles in Python? Edit: yes, it does, and the comments seem descriptive enough to me.

    Mercurial’s binary storage format, revlog, has been well documented both by third parties and by Matt Mackall, Mercurial’s originator. This has resulted in a couple of independent implementations of the format, such as xrevlog in C and a private Go implementation at Google. These are not new.

    1. 3

      Official git documentation explains packfiles in one sentence in the git-repack(1) man page and at length in the book. So, no, it wasn’t necessary.

      GoGits has another native Go implementation, which seems older than Gitgo.

      1. 1

        Hm, I would not say “at length”. The actual details of the binary format are not explained in either of those references you listed. It doesn’t say, “at byte 0x0F you’ll find this and it goes on for 0xA0 bytes” or whatever. It just gives some git commands to get information about pack files, but not how to implement the binary format.
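
For the byte-level detail asked for above, here is a minimal Python sketch of reading the start of a pack file, written from my reading of git's Documentation/technical pack-format description: a 12-byte header (the signature "PACK", a 4-byte version, a 4-byte object count, both big-endian), followed by entries whose first header byte holds a continuation bit, a 3-bit object type and the low 4 bits of the inflated size. The function names are hypothetical; the sketch only inflates non-delta entries and ignores the trailing SHA-1 checksum.

```python
import struct
import zlib
from typing import Tuple

# 3-bit type codes stored in each pack entry header (5 is unused).
OBJ_TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag", 6: "ofs_delta", 7: "ref_delta"}


def parse_pack_header(data: bytes) -> Tuple[int, int]:
    """Return (version, object_count) from the 12-byte pack header."""
    if data[:4] != b"PACK":
        raise ValueError("missing 'PACK' signature")
    # Bytes 4-7: version number; bytes 8-11: object count; both big-endian.
    version, count = struct.unpack(">II", data[4:12])
    if version not in (2, 3):
        raise ValueError(f"unsupported pack version {version}")
    return version, count


def parse_entry_header(data: bytes, pos: int) -> Tuple[str, int, int]:
    """Decode one entry header at pos; return (type, inflated size, next offset)."""
    byte = data[pos]
    pos += 1
    type_code = (byte >> 4) & 0x07  # bits 6-4: object type
    size = byte & 0x0F               # bits 3-0: low 4 bits of the size
    shift = 4
    while byte & 0x80:               # MSB set: another 7 size bits follow
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return OBJ_TYPES.get(type_code, f"invalid({type_code})"), size, pos


def read_first_object(data: bytes) -> Tuple[str, bytes]:
    """Inflate the first entry of a pack; delta entries are not handled."""
    _, count = parse_pack_header(data)
    if count == 0:
        raise ValueError("pack contains no objects")
    obj_type, size, pos = parse_entry_header(data, 12)
    if obj_type in ("ofs_delta", "ref_delta"):
        raise NotImplementedError("delta entries need their base object first")
    # The entry header is followed directly by a zlib stream of the contents.
    body = zlib.decompressobj().decompress(data[pos:])
    if len(body) != size:
        raise ValueError("inflated size does not match entry header")
    return obj_type, body
```

A one-blob pack can be built by hand as `b"PACK" + struct.pack(">II", 2, 1) + bytes([0x36]) + zlib.compress(b"hello\n")`, where 0x36 encodes type 3 (blob) and size 6. A real pack reader would also have to resolve the two delta types and verify the checksum.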

        1. 2

          Yes, agreed, it doesn’t explain much beyond the basics, such as delta compression. But at least this basic information is widely available, so the first paragraph of the OP (“I was astonished by the existence of pack files while writing a Git clone”) makes me wary of Gitgo.

          1. 1

            Yeah, that bit is indeed weird. But on the other hand, pack files are a subject that is widely handwaved in git-land, so I can see why some people might not be aware of their existence. People always say that git does not store deltas. Wellllllll, that’s sort of true, until some smartass brings up pack files. :-)

            1. 1

              #ifdef SMARTASS

              But it is true, git only ever stores objects! (Sometimes in form of deltas.)

              #endif

              Seriously though, above a certain (fairly low-level) layer, deltas don’t exist: even if git-gc has stored the exact delta you want to see, git-diff will still get raw objects and recalculate the diff between them. If I understand correctly.
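
The low-level delta-resolution step mentioned above can be sketched as follows, assuming the delta encoding described in git's pack-format documentation: two little-endian base-128 sizes (base and result), then a stream of copy instructions (high bit set; bits 0-3 and 4-6 flag which offset and size bytes follow, and a size of 0 means 0x10000) and insert instructions (the opcode is a literal byte count from 1 to 127). The function names are hypothetical. Its output is a whole object again, which fits the point that layers above this one only deal in full objects.

```python
from typing import Tuple


def read_delta_size(delta: bytes, pos: int) -> Tuple[int, int]:
    """Read a little-endian base-128 size; return (value, next offset)."""
    value = shift = 0
    while True:
        byte = delta[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base: bytes, delta: bytes) -> bytes:
    """Rebuild the full target object from a base object and a git-style delta."""
    src_size, pos = read_delta_size(delta, 0)
    if src_size != len(base):
        raise ValueError("delta was made against a different base")
    dst_size, pos = read_delta_size(delta, pos)
    out = bytearray()
    while pos < len(delta):
        op = delta[pos]
        pos += 1
        if op & 0x80:
            # Copy from base: bits 0-3 flag offset bytes, bits 4-6 flag size bytes.
            offset = size = 0
            for i in range(4):
                if op & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if op & (1 << (4 + i)):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000
            if offset + size > len(base):
                raise ValueError("copy instruction reads past end of base")
            out += base[offset:offset + size]
        elif op:
            # Insert: the opcode is the number of literal bytes that follow.
            out += delta[pos:pos + op]
            pos += op
        else:
            raise ValueError("opcode 0 is reserved")
    if len(out) != dst_size:
        raise ValueError("result size does not match delta header")
    return bytes(out)
```

For example, the delta `bytes([0x0C, 0x0A, 0x90, 0x06, 0x03]) + b"git" + bytes([0x91, 0x0B, 0x01])` turns `b"hello world\n"` into `b"hello git\n"`: copy 6 bytes from offset 0, insert "git", then copy 1 byte from offset 11.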

      2. 3

        Git also has more elaborate documentation in the Documentation/technical directory of the git repository.

        Here it describes the pack format. And here it describes the pack protocol.
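
As a taste of the pack protocol, everything on the wire is framed as pkt-lines: four lowercase hex digits giving the total length (including those four bytes), then the payload, with "0000" acting as a flush packet. Below is a minimal Python sketch with hypothetical helper names; it deliberately rejects the special lengths 1 to 3 that protocol v2 uses for delimiter and response-end packets.

```python
from typing import List, Optional

MAX_PAYLOAD = 65516  # 65520-byte pkt-line limit minus the 4-byte length prefix
FLUSH_PKT = b"0000"


def encode_pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its length as 4 hex digits that count themselves."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % (len(payload) + 4) + payload


def decode_pkt_lines(stream: bytes) -> List[Optional[bytes]]:
    """Split a byte stream into payloads; None marks a flush-pkt."""
    lines: List[Optional[bytes]] = []
    pos = 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)
        if length == 0:
            lines.append(None)  # "0000" flush-pkt: end of a section
            pos += 4
        elif length < 4:
            raise ValueError("special packets 0001-0003 are not handled here")
        else:
            if pos + length > len(stream):
                raise ValueError("truncated pkt-line")
            lines.append(stream[pos + 4:pos + length])
            pos += length
    return lines
```

For instance, `encode_pkt_line(b"a\n")` yields `b"0006a\n"`, since the 2-byte payload plus the 4-byte prefix makes 6.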

        1. 1

          Ah! So the original blog post does seem completely irrelevant now. They reverse-engineered it for nothing!

          1. 1

            I’d assume @chimeracoder read these docs, explained the motivation behind them, and made them easier to read, with pictures, which is a nice service in my opinion. I didn’t see where he said that packfiles weren’t well described.

            1. 1

              > I didn’t see where he said that packfiles weren’t well described.

              https://news.ycombinator.com/item?id=9773763

              1. 1

                Thanks for the pointer. Ahh, and indeed, he did read it: https://news.ycombinator.com/item?id=9773910

          2. 1

            Oh, thanks! I remembered that I’ve seen this, but couldn’t recall where.