PSA: if you get hit by this, take the rest of the day off. This is a huge ecosystem-breaking thing and stressing about it will not help anyone. Take the rest of the day off and go do fun things like playing video games with your kids.
Looks like an internet firestorm made GitHub revert the change.
This isn’t the first time dynamically generated GitHub archives have changed checksums, and it won’t be the last.
Something similar happened to the Protobuf library in 2017. I had to file a feature request asking that they start cutting proper release tarballs: https://github.com/protocolbuffers/protobuf/issues/3894
More worryingly, the Kubernetes tarballs once changed due to a new commit in the repository. Their source included a stamped “short commit” reference, where GitHub would inject the shortest unique string for the tagged commit. A new commit was added that made the shortest unique string less unique, so the generated tarballs changed their file content (not just tar/compression settings).
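The Kubernetes failure mode above can be sketched in a few lines. This is a toy model of git’s shortest-unique-abbreviation behaviour (the hash values and minimum length are made up, and real git consults the whole object database and a configurable minimum), just to show how one new commit can lengthen an already-stamped short reference:

```python
def shortest_unique(target, others, minimum=4):
    """Return the shortest prefix of `target` (at least `minimum` chars)
    that no other hash in `others` shares."""
    n = minimum
    while any(o != target and o.startswith(target[:n]) for o in others):
        n += 1
    return target[:n]

tagged = "a1b2c3d4e5f6"
repo = {tagged, "ffee00112233"}
short_before = shortest_unique(tagged, repo)   # "a1b2"

# A new, unrelated commit lands whose hash shares a longer prefix with
# the tagged commit, so the stamped short reference must grow:
repo.add("a1b2c9999999")
short_after = shortest_unique(tagged, repo)    # "a1b2c3"
```

Any file that had the old short reference substituted into it now has different content, so the archive’s checksum changes even though the tag never moved.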
If you want a stable checksum, dynamically generated tar.gz files are not the way to go. Ask the maintainers to upload release tarballs, or do your own mirroring of your dependencies (you should be doing this anyway).
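As a minimal sketch of the pinning side of that advice (the function names and chunk size are mine, not from any particular tool), verifying a mirrored distfile against a pinned checksum looks like:

```python
import hashlib

def sha256_file(path):
    """Stream a file through SHA-256; suitable for large distfiles."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def check_distfile(path, pinned):
    """Fail loudly when a mirrored tarball no longer matches its pin."""
    actual = sha256_file(path)
    if actual != pinned:
        raise RuntimeError(f"checksum mismatch for {path}: {actual} != {pinned}")
```

The point is to pin against bytes you host yourself, so a regenerated upstream archive becomes a visible failure rather than a silent change.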
The FreeBSD ports system’s integration with GitHub lets you specify a commit for the port to depend on. This is useful because you sometimes have release branches carrying important fixes and it’s simpler to point at them than carry those fixes locally. It is also designed to allow offline builds and so has a separate fetch phase from the build phase. The fetch phase doesn’t depend on the existence of tools like git, so needs to fetch tarballs. The GitHub dynamic archive generation interface is great for this.
That said, distfiles are also mirrored on project infrastructure, so if the GitHub upstream version fails the checksum then the builders should be able to grab the version from the mirror. Dealing with a new archive generation tool is probably not too much of a problem, just a bit annoying.
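The fetch-then-fall-back-to-mirror behaviour described above can be sketched as follows. This is a hypothetical helper, not FreeBSD’s actual fetch implementation: try each source in order and accept the first one whose checksum matches the pin:

```python
import hashlib

def fetch_verified(sources, pinned_sha256):
    """Try each source (a callable returning distfile bytes) in order --
    e.g. GitHub first, then the project's own mirror -- and accept the
    first one whose SHA-256 matches the pinned checksum."""
    for fetch in sources:
        try:
            data = fetch()
        except OSError:
            continue  # source unreachable; try the next one
        if hashlib.sha256(data).hexdigest() == pinned_sha256:
            return data
    raise RuntimeError("no source produced a distfile matching the pinned checksum")
```

A regenerated upstream archive then degrades gracefully into a mirror fetch instead of a build failure.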
Makes you wonder how something like this got through with no prior announcement or communication of any kind. Surely they knew they’d have to revert this?
From the link, it looks as if this is a change in git, and GitHub’s API is just a thin wrapper around the git tool. The problem isn’t that git changed it, it’s the combination of three things:
The behaviour of the git tool changed.
Various services such as GitHub exposed the functionality of git directly.
Various consumers of that service depended on the GitHub service to provide a stable output.
I’m honestly quite surprised that this is how it works at GitHub, but then I wasn’t aware of the git archive command. The svn equivalent is svn export, which creates a file tree that you can then tar up with whatever tool you want.
It feels like something that usually wouldn’t even be that noteworthy in the new git release, and so nobody thought to check for things relying on the exact prior compression algorithm’s output or to make a big warning post in advance about it.
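The failure mode under discussion (identical archive contents, different compressed bytes) is easy to demonstrate. Here Python’s gzip module stands in for the changed compressor, with only the gzip header’s mtime field varied:

```python
import gzip, hashlib

payload = b"pretend this is a stable tar stream from `git archive`"

# Same payload, two gzip encodings. Varying the header mtime is just a
# convenient stand-in; swapping compressor implementations or settings
# has the same effect on the bytes.
a = gzip.compress(payload, mtime=0)
b = gzip.compress(payload, mtime=1)

assert gzip.decompress(a) == gzip.decompress(b)  # contents identical
assert hashlib.sha256(a).digest() != hashlib.sha256(b).digest()  # checksums differ
```

Which is why a checksum of the compressed artifact pins far more than the file tree you actually care about.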
Update post: “We are reverting this change for now. More details to follow.”
Nix solves the problem of unstable tarballs by unpacking them then re-packing them as NAR, its own stable archive format (Figure 5.2). It works quite well, and it’s generally useful for hashing trees of files, for example when projects don’t publish tarballs at all or when you need a hash of a generic result of a network operation (like cargo vendor or go mod download).
I’m now glad I publish a separate “official” tarball for every Terminator release on Github.
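For illustration, the idea behind NAR — hash the unpacked tree in a canonical order so the result doesn’t depend on how it was archived or compressed — can be sketched like this (a toy stand-in, not the actual NAR serialization, which also encodes file types and executable bits):

```python
import hashlib, os

def tree_hash(root):
    """Hash a directory tree in canonical (sorted, relative-path) order,
    so the result is independent of archive format and compression."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # force deterministic traversal order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            h.update(rel.encode() + b"\0")
            with open(full, "rb") as f:
                h.update(f.read())
            h.update(b"\0")
    return h.hexdigest()
```

Two trees with identical contents hash identically no matter which tarball, zip, or compressor produced them, which is exactly the stability property the dynamically generated archives lacked.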