1. 6
  1. 2

    Have there been any interesting developments since the last time it was posted (apart from the name change ;))?

    1. 1

      Less well known are integer overflow bugs. Offset-length pairs, defining a sub-section of a file, are seen in many file formats, such as OpenType fonts and PDF documents. A conscientious C programmer might think to check that a section of a file or a buffer is within bounds by writing if (offset + length < end) before processing that section, but that addition can silently overflow, and a maliciously crafted file might bypass the check.
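
To make the quoted point concrete, here is a minimal sketch of the overflow-prone check next to one that cannot wrap. The function names and the choice of 32-bit unsigned fields are assumptions for illustration, not taken from any particular parser.

```cpp
#include <cassert>
#include <cstdint>

// Naive form of the check from the quote: offset + length can wrap around
// modulo 2^32, so a huge offset can produce a small sum that passes.
constexpr bool naive_in_bounds(std::uint32_t offset, std::uint32_t length,
                               std::uint32_t end) {
    return static_cast<std::uint32_t>(offset + length) < end;
}

// Overflow-free form: true when [offset, offset + length) fits in a buffer
// of `size` bytes. Checking offset <= size first guarantees that
// size - offset cannot underflow, and no addition is performed at all.
constexpr bool range_in_bounds(std::uint32_t offset, std::uint32_t length,
                               std::uint32_t size) {
    return offset <= size && length <= size - offset;
}
```

With offset = 0xFFFFFFF0 and length = 0x20 against a 100-byte buffer, the naive sum wraps to 0x10 and the check passes, while the subtraction-based form rejects the range.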

      So, some experiences from the pixel mines.

      We were implementing our own format for 3D mesh data for static (initially) meshes, and went through probably 4-5 implementations. The initial format was based on nice tree structures, while the final format was more of a length-prefixed block-based approach.

      The reason for this change is that, if you have a maliciously-formed tree structure, it can be really easy to hamstring your parser. You have extra records, or not enough records, and parsing gets to be a headache. You also have to build a smarter parser, because you kind of need to keep an idea of state as you pop up and down in the hierarchy of things, and you can’t really make a lot of allocation guarantees ahead of time until the tree is walked.

      By contrast, a block format lets you quickly skip down the list of blocks, do most of your allocations up front, and then patch up and copy things around at the end. At that point, having good safe arithmetic routines prevents you from over-allocating or under-allocating things.
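
A rough sketch of that two-pass idea, assuming C++17 and a made-up layout where each block is a 4-byte tag followed by a 4-byte little-endian payload length. The layout and every name here (BlockInfo, scan_blocks, total_payload_size) are hypothetical, not our actual format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical record describing one block found during the header scan.
struct BlockInfo {
    std::uint32_t tag;
    std::size_t payload_offset;
    std::uint32_t payload_size;
};

// Assemble a little-endian 32-bit value byte by byte (host-endianness neutral).
inline std::uint32_t load_u32_le(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// First pass: walk the headers only, skipping payloads.
// Returns nullopt if a header or payload runs past the end of the input.
inline std::optional<std::vector<BlockInfo>> scan_blocks(const std::uint8_t* data,
                                                         std::size_t size) {
    constexpr std::size_t kHeaderSize = 8;
    std::vector<BlockInfo> blocks;
    std::size_t pos = 0;
    while (pos < size) {
        if (size - pos < kHeaderSize) return std::nullopt;  // truncated header
        const std::uint32_t tag = load_u32_le(data + pos);
        const std::uint32_t len = load_u32_le(data + pos + 4);
        pos += kHeaderSize;
        if (len > size - pos) return std::nullopt;          // truncated payload
        blocks.push_back({tag, pos, len});
        pos += len;  // cannot pass `size` because of the check above
    }
    return blocks;
}

// Sum the payload sizes with an explicit overflow check so that a single
// up-front allocation can be sized safely.
inline std::optional<std::size_t> total_payload_size(const std::vector<BlockInfo>& blocks) {
    std::size_t total = 0;
    for (const BlockInfo& b : blocks) {
        if (b.payload_size > std::numeric_limits<std::size_t>::max() - total) {
            return std::nullopt;
        }
        total += b.payload_size;
    }
    return total;
}
```

The scan never touches payload bytes, so the allocation can be sized (and rejected if absurd) before anything is copied.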

      Towards that end, in C++ a very handy thing to do is to create a BinaryRegionReader class that provides “safe” and bounds-checked access to a region of memory, and which allows the creation of child BinaryRegionReaders.

      1. 1

        Towards that end, in C++ a very handy thing to do is to create a BinaryRegionReader class that provides “safe” and bounds-checked access to a region of memory, and which allows the creation of child BinaryRegionReaders.

        So, basically std::string_view?

        1. 0

          Sort of, but also with the ability to read off native types in order and respect endianness and do seeking safely.
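
For the curious, a stripped-down sketch of what such a reader could look like in C++17. Only the class name comes from the thread; the method names and the use of std::optional for failure are assumptions made for this example, not the real interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Illustrative bounds-checked cursor over a read-only byte region.
class BinaryRegionReader {
public:
    BinaryRegionReader(const std::uint8_t* data, std::size_t size)
        : data_(data), size_(size) {}

    // Bytes left between the cursor and the end of the region.
    std::size_t remaining() const { return size_ - pos_; }

    // Absolute seek; refuses to move the cursor outside the region.
    bool seek(std::size_t pos) {
        if (pos > size_) return false;
        pos_ = pos;
        return true;
    }

    std::optional<std::uint8_t> read_u8() {
        if (remaining() < 1) return std::nullopt;
        return data_[pos_++];
    }

    // Reads are assembled byte by byte, so host endianness does not matter.
    std::optional<std::uint32_t> read_u32_le() {
        if (remaining() < 4) return std::nullopt;
        const std::uint8_t* p = data_ + pos_;
        pos_ += 4;
        return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
               (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
    }

    std::optional<std::uint32_t> read_u32_be() {
        if (remaining() < 4) return std::nullopt;
        const std::uint8_t* p = data_ + pos_;
        pos_ += 4;
        return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
               (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
    }

    // Carves out a child reader over the next `length` bytes and advances
    // past them. The child cannot see anything outside its own slice.
    std::optional<BinaryRegionReader> sub_region(std::size_t length) {
        if (length > remaining()) return std::nullopt;
        BinaryRegionReader child(data_ + pos_, length);
        pos_ += length;
        return child;
    }

private:
    const std::uint8_t* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
};
```

The child readers are what string_view alone does not give you: a nested record gets its own slice, and a bad length inside it can only fail that slice's reads rather than run into neighbouring data.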

          1. 1

            Sure, that makes sense.

            Btw, what led you to go with a completely custom format instead of using something like protocol buffers or cap’n proto?

            1. 1

              Reasons we didn’t use those:

              • We had our own routines that better handled certain issues (see: safe arithmetic)
              • We were doing some of that as a learning project
              • Wanted to easily support multiple platforms; our codebase was already set up for that
              • Would still have required coming up with a format for the layout (since you want something that is easy to shove into graphics buffers anyways) even with the help those libs provide
              • Adding extra build steps and autogenerated code wasn’t appealing
              • Having our own code/copyright gave more licensing flexibility (WTFPL ftw)
