1. 5
  1.  

  2. 1

    What is the size of the compressed XML file for the example given?

    1. 2

      Geofabrik.de data extracts - Massachusetts (the example mentioned in the article)

      • osm.pbf: 234 MB
      • osm.bz2: 384 MB

      Planet:

      • osm.pbf: 46 GB
      • osm.bz2: 79 GB

      So the bzipped XML is only about 1.6 to 1.7 times the size of the PBF: larger, but not by much.

      AFAIK, PBF was designed mainly to make processing more efficient rather than to reduce size. According to the OSM wiki:

      “It is about half of the size of a gzipped planet and about 30% smaller than a bzipped planet. It is also about 5x faster to write than a gzipped planet and 6x faster to read than a gzipped planet. The format was designed to support future extensibility and flexibility.”
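As a quick sanity check, the sizes listed above can be compared against the wiki's "about 30% smaller than a bzipped planet" figure. This is only arithmetic on the four numbers quoted in this comment; the function name `percent_smaller` is made up for the illustration.

```python
def percent_smaller(pbf_size, bz2_size):
    """Return how much smaller the PBF is than the bz2 file, in percent."""
    return (1 - pbf_size / bz2_size) * 100

# Sizes as quoted above (MB for the extract, GB for the planet).
massachusetts = percent_smaller(234, 384)
planet = percent_smaller(46, 79)

print(f"Massachusetts: {massachusetts:.1f}% smaller")  # 39.1% smaller
print(f"Planet: {planet:.1f}% smaller")                # 41.8% smaller
```

For these particular files the PBF comes out roughly 40% smaller than the bzipped XML, so somewhat better than the wiki's rough figure.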

      Probably the killer feature is the “fileblocks”, which allow semi-random access. For example, if you have a planet dump lying around and want to extract some area from it, on the first run you need to traverse the whole file; while doing that, you can collect a list of the fileblocks that contain data in your area, and on the second pass you touch only those blocks. This is similar to pages in, e.g., Postgres, which allow tricks like “bitmap AND” index operations.
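The two-pass idea can be sketched roughly as follows. Note that this does not parse real PBF: it uses a toy container (a 4-byte big-endian length followed by a JSON payload per block) purely to show the pattern of recording block offsets on the first pass and seeking straight to them on the second. All function names and the block layout here are assumptions made for the illustration.

```python
import io
import json
import struct

def write_blocks(blocks):
    """Serialize blocks as [4-byte big-endian length][JSON payload] records (toy format)."""
    buf = io.BytesIO()
    for nodes in blocks:
        payload = json.dumps(nodes).encode("utf-8")
        buf.write(struct.pack(">I", len(payload)))
        buf.write(payload)
    buf.seek(0)
    return buf

def in_bbox(node, bbox):
    """True if the node lies inside bbox = (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= node["lat"] <= max_lat and min_lon <= node["lon"] <= max_lon

def first_pass(f, bbox):
    """Scan every block once; return offsets of blocks with at least one node in bbox."""
    hits = []
    f.seek(0)
    while True:
        offset = f.tell()
        header = f.read(4)
        if len(header) < 4:
            break  # end of file
        (length,) = struct.unpack(">I", header)
        nodes = json.loads(f.read(length))
        if any(in_bbox(n, bbox) for n in nodes):
            hits.append(offset)
    return hits

def second_pass(f, offsets, bbox):
    """Seek directly to the recorded blocks and collect the matching nodes."""
    result = []
    for offset in offsets:
        f.seek(offset)
        (length,) = struct.unpack(">I", f.read(4))
        nodes = json.loads(f.read(length))
        result.extend(n for n in nodes if in_bbox(n, bbox))
    return result

# Three toy blocks; only the first and third contain nodes near Massachusetts.
blocks = [
    [{"id": 1, "lat": 42.3, "lon": -71.1}],
    [{"id": 2, "lat": 48.8, "lon": 2.3}],
    [{"id": 3, "lat": 42.4, "lon": -71.0}, {"id": 4, "lat": 10.0, "lon": 10.0}],
]
f = write_blocks(blocks)
bbox = (41.0, -73.6, 43.0, -69.8)
offsets = first_pass(f, bbox)
extracted = second_pass(f, offsets, bbox)
```

The second pass never reads the middle block at all, which is the point: once the list of relevant offsets exists, later passes cost time proportional to the area of interest rather than to the whole file.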