This is an interesting project, but I wonder if they could have done this a little more simply. Instead of diffing as you go, assemble a “table of contents” for both files: blocks by length and hash, using a rolling hash (like they already implemented) to decide block boundaries, but a real hash (like blake3) for duplicate checking. (This is effectively what bitbottle does.) Then build a package containing only the new table of contents plus any blocks missing from the original table of contents.
I only skimmed their spec, but it seemed like they’re 2/3 of the way to that model and got bogged down in edge cases, trying to use the rolling hash as a first-pass duplicate checker instead of just a block-boundary decider.
I was looking around for the state of the art in binary patches for software updates/installs. I found aging references to courgette and bsdiff, but I can’t tell if they’re still a thing, or whether other software has superseded them. Someone pointed me at this project, which looks well kept.
But I’m curious: what’s the fastest/best out there today and what works on all of win/mac/linux?
My understanding is that bsdiff isn’t really specific to any one platform: it creates and applies (concise) patches between arbitrary files. It just uses heuristics that often work well for object code.
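To illustrate why bsdiff isn't tied to a platform, here is a sketch of the apply side in Go, assuming the control triples and the diff/extra streams are already decoded into memory (the real BSDIFF40 format adds a header and bzip2-compresses each stream, which this skips). `Ctrl` and `applyPatch` are hypothetical names. Applying a patch is plain byte arithmetic: add diff bytes to old bytes, copy extra bytes verbatim, then seek in the old file.

```go
package main

import (
	"errors"
	"fmt"
)

// Ctrl is one bsdiff-style control triple (hypothetical in-memory form).
type Ctrl struct {
	Add   int // bytes produced as old[oldPos+i] + diff[i] (mod 256)
	Extra int // bytes copied verbatim from the extra stream
	Seek  int // signed adjustment to oldPos after the add step
}

// applyPatch rebuilds the new file from the old file and decoded patch streams.
func applyPatch(old []byte, ctrls []Ctrl, diff, extra []byte) ([]byte, error) {
	var out []byte
	oldPos, diffPos, extraPos := 0, 0, 0
	for _, c := range ctrls {
		if c.Add < 0 || c.Extra < 0 || diffPos+c.Add > len(diff) || extraPos+c.Extra > len(extra) {
			return nil, errors.New("corrupt patch")
		}
		for i := 0; i < c.Add; i++ {
			b := diff[diffPos+i]
			if p := oldPos + i; p >= 0 && p < len(old) {
				b += old[p] // byte addition wraps mod 256; out-of-range old bytes count as 0
			}
			out = append(out, b)
		}
		diffPos += c.Add
		out = append(out, extra[extraPos:extraPos+c.Extra]...)
		extraPos += c.Extra
		oldPos += c.Add + c.Seek
	}
	return out, nil
}

func main() {
	old := []byte("hello, world")
	diff := make([]byte, 12) // mostly zeros: these regions match the old file
	diff[7] = 0xE0           // 'w' + 0xE0 wraps to 'W'
	ctrls := []Ctrl{{Add: 7, Extra: 6}, {Add: 5, Extra: 1}}
	out, err := applyPatch(old, ctrls, diff, []byte("brave !"))
	fmt.Println(string(out), err) // prints "hello, brave World! <nil>"
}
```

Nothing here depends on the OS or CPU. The object-code heuristics live on the diff side: bsdiff looks for approximate matches, so for code that differs mainly in shifted addresses the diff stream is mostly zeros and compresses well.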
There’s a fork of a Go implementation of bsdiff in the wharf project here, vendored in and mentioned at the end of the readme.
Yup, saw that. Cool project!
I have no clue; just wanted to make sure.