1. 53
  1. 7

    How does this compare to binwalk?

    1. 14

      We started with binwalk, but it was not good enough, which is the main reason we developed our own solution.

      The biggest difference is that binwalk just goes through a file linearly and tries to extract whatever it finds, resulting in a lot of noise (like license text) and false positives. unblob is much smarter and more precise: it extracts files based on their format specification, for example by recognizing the format’s header struct and carving out files based on the size values in the headers. Here is an example for NTFS: https://github.com/onekey-sec/unblob/blob/main/unblob/handlers/filesystem/ntfs.py#L68

      We have been using it in production for months, and the results are much better than what we got with binwalk. We get fewer false positives, and even when unblob fails to extract everything, we still get meaningful information out of firmware images where binwalk previously just failed with no output. It has feature parity with binwalk, and thanks to Hyperscan it’s faster and handles bigger firmware images (4 GB+) with no problems, which was not possible with binwalk.
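
      A minimal sketch of the header-driven carving idea, using a made-up `TOYF` format rather than unblob’s actual code: a signature hit only counts when the size field in its header is plausible, so a stray mention of the magic bytes (in license text, say) is rejected. The format, the `carve` function and the field layout are all assumptions for illustration.

      ```python
      import struct

      MAGIC = b"TOYF"                 # hypothetical 4-byte signature
      HEADER = struct.Struct("<4sI")  # magic, total_size (header included), little-endian


      def carve(blob: bytes) -> list:
          """Return (start, end) offsets of every chunk whose header validates."""
          chunks = []
          pos = blob.find(MAGIC)
          while pos != -1:
              if pos + HEADER.size <= len(blob):
                  _, total_size = HEADER.unpack_from(blob, pos)
                  end = pos + total_size
                  # Accept the hit only if the declared size fits inside the blob.
                  if total_size >= HEADER.size and end <= len(blob):
                      chunks.append((pos, end))
                      pos = blob.find(MAGIC, end)  # skip past the carved chunk
                      continue
              # Implausible header: treat as a false positive and keep scanning.
              pos = blob.find(MAGIC, pos + 1)
          return chunks


      payload = b"hello world!"
      valid = HEADER.pack(MAGIC, HEADER.size + len(payload)) + payload
      blob = b"\x00" * 5 + valid + b"license text mentioning TOYF only"
      chunks = carve(blob)
      ```

      Here the second `TOYF` occurrence is followed by text bytes that decode to an impossibly large size, so only the real chunk is reported.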

      1. 4

        Thanks, that’s really useful information. The NTFS extractor example is very motivating as it looks very neat. Perhaps this comparison with binwalk would be a useful addition for the README?

        1. 3

          Forgive me for being lazy and not simply reading the source code, but how easy is it to “teach” unblob about additional formats, etc? When using unblob’s API, is it possible to register an additional “format recognizer” that would integrate with unblob or would I have to fork it as a whole?

          1. 12

            It’s very easy: you literally have to implement one Python class with one method! We have a step-by-step working example you can follow even if you don’t have much programming experience: https://unblob.org/development/#writing-handlers
            For some formats, you can just copy-paste the C struct from the format specification, calculate the end of the file and run an extractor in one line and that’s it.

            Depending on the format’s complexity, it’s possible to implement support for a new format in a couple of hours!
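
            To give a feel for the “one class, one method” shape, here is a standalone sketch. It does not import unblob, and `ToyHandler`, `Chunk`, `calculate_chunk` and the `TOYF` format are hypothetical stand-ins, so please follow the linked docs for the real interface.

            ```python
            import io
            import struct
            from dataclasses import dataclass
            from typing import Optional


            @dataclass
            class Chunk:
                """Byte range of one recognized file inside a larger blob."""
                start_offset: int
                end_offset: int


            class ToyHandler:
                """Hypothetical handler for a made-up format: magic + payload size."""

                MAGIC = b"TOYF"
                HEADER = struct.Struct("<4sI")  # magic, payload_size (little-endian)

                def calculate_chunk(self, file, start_offset: int) -> Optional[Chunk]:
                    # Parse the header at the candidate offset.
                    file.seek(start_offset)
                    raw = file.read(self.HEADER.size)
                    if len(raw) < self.HEADER.size:
                        return None  # truncated header
                    magic, payload_size = self.HEADER.unpack(raw)
                    if magic != self.MAGIC:
                        return None  # not our format
                    # The end of the file follows directly from the size field.
                    end_offset = start_offset + self.HEADER.size + payload_size
                    return Chunk(start_offset, end_offset)


            data = b"junk" + ToyHandler.HEADER.pack(b"TOYF", 3) + b"abc"
            chunk = ToyHandler().calculate_chunk(io.BytesIO(data), 4)
            ```

            In this sketch the handler only reports where the file starts and ends; extracting the carved bytes would be a separate step.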

            We also have a plugin system in place (not documented yet) which we are using in production, so you can just install a Python package and have extra handlers! Pretty neat.

            1. 2

              Thanks for the detailed response. Sounds like you guys have a well-designed system in place. Looking forward to checking it out later today.

      2. 3

        This is cool! It reminds me a bit of diffoscope but without the diff. I will definitely be using this to root around random firmware images.

        1. 2

          There are also the Kaitai Struct and 010 formats in this space.

          1. 1

            I don’t usually have trouble with these formats. But I would appreciate something to extract (and repack) game assets, installers, self-extracting archives and other mostly homegrown formats in the long tail.

            1. 1

              We have a plugin system in place, so it would be possible to have separate Python packages for a group of format Handlers related to different use cases. I will document the plugin interface later.