1. 8

  2. 3

    A lot of early operating systems had native support for structured data on their filesystems. These fell out of use because programmers inevitably wanted subtly different data types and because anything that you can implement on an OS with record types, you can implement with a library on an OS with only flat files.
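
    To make the "library on flat files" point concrete, here is a minimal sketch of fixed-length records layered on an ordinary byte stream. The record layout (a 16-byte name plus a 32-bit quantity) and the helper names are made up for illustration; they are not taken from any particular OS.

    ```python
    import io
    import struct

    # Hypothetical record layout: 16-byte NUL-padded name, unsigned 32-bit quantity.
    # "<" means little-endian with no alignment padding, so each record is 20 bytes.
    RECORD = struct.Struct("<16sI")

    def write_record(f, index, name, qty):
        """Store one record at slot `index` of a seekable flat file."""
        f.seek(index * RECORD.size)
        # struct pads the name with NULs (and truncates anything past 16 bytes).
        f.write(RECORD.pack(name.encode("utf-8"), qty))

    def read_record(f, index):
        """Load the record at slot `index` back into a (name, qty) tuple."""
        f.seek(index * RECORD.size)
        raw_name, qty = RECORD.unpack(f.read(RECORD.size))
        return raw_name.rstrip(b"\0").decode("utf-8"), qty

    # An in-memory stream stands in for a flat file here.
    flat = io.BytesIO()
    write_record(flat, 0, "widget", 3)
    write_record(flat, 1, "gadget", 7)
    ```

    The stream itself knows nothing about records; the structure lives entirely in the library, and a program that wants a different layout just picks a different format string.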

    Having structured data isn’t sufficient for interop. You need some form of ontology. Either you agree out-of-band on what the data in a particular structured format means, in which case you’ve just invented file formats, or you try to make the data self-describing. A load of people have tried the second approach, but none that I’m aware of have succeeded.

    1. 1

      Macintosh resource forks had agreed-upon structures, FWIW. I think a file with a custom kind of resource would often have shipped with the ResEdit templates needed to make sense of it.

      1. 1

        As I recall, that’s only partially true. Originally, HFS supported only two forks, a data fork and a resource fork, along with some extra metadata that other filesystems often lack (in particular, a 32-bit type code and a 32-bit creator code, instead of a file extension). By convention, the resource fork contained a particular format. HFS+ added support for an arbitrary number of forks. So did NTFS, for AFP support, but NT didn’t add any tools for enumerating forks and didn’t count their contents toward the reported file size, so for a long time you could hide files from NT admins and from quotas by sticking them in alternate data streams of a tiny text file. There were no conventions as to the other forks.
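
        For anyone who hasn't seen them, the 32-bit type and creator fields mentioned above were conventionally written as four characters. A rough sketch of the mapping, assuming big-endian byte order and the Mac Roman encoding; the helper names are mine, not any Apple API:

        ```python
        import struct

        def fourcc_to_int(code):
            """Pack a four-character code into one big-endian 32-bit integer."""
            # struct.unpack raises struct.error unless this is exactly 4 bytes.
            (value,) = struct.unpack(">I", code.encode("mac_roman"))
            return value

        def int_to_fourcc(value):
            """Turn a 32-bit integer back into its four-character spelling."""
            return struct.pack(">I", value).decode("mac_roman")

        text_type = fourcc_to_int("TEXT")  # 0x54455854
        ```

        Here `int_to_fourcc(0x54455854)` gives back `"TEXT"`. Nothing about the integer says what a file of that type contains; that meaning was agreed elsewhere.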

        Even with the resource forks, the contents had an agreed structure only by convention. The OS exposed it as a stream. There was nothing stopping you from putting any other structured or unstructured data in there. It was equivalent to having every file agree to a common header format or to the DOS convention of using .EXE to denote executable files: a load of stuff would break if you didn’t follow the rules, but nothing enforced them and the low-level APIs didn’t care if you violated the conventions.

    2. 1

      Language-independent data types are a great idea. Protocol Buffers were a very similar idea but weren’t targeted at the kind of scientific computing that this is. They initially supported a few primitives, enums, lists & nested structs but now support maps & unions too. In theory, they could be extended to support more esoteric types.
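
      As a small illustration of what "language-independent" means at the byte level, here is a sketch of the base-128 varint scheme that Protocol Buffers uses for integers on the wire. It is a hand-rolled version for non-negative integers only, not the official library, and the function names are my own.

      ```python
      def encode_varint(n):
          """Encode a non-negative integer as a base-128 varint."""
          if n < 0:
              raise ValueError("this sketch handles non-negative integers only")
          out = bytearray()
          while True:
              byte = n & 0x7F          # low 7 bits go out first
              n >>= 7
              if n:
                  out.append(byte | 0x80)  # high bit set: more bytes follow
              else:
                  out.append(byte)         # high bit clear: last byte
                  return bytes(out)

      def decode_varint(data):
          """Decode one varint; return (value, number of bytes consumed)."""
          result = 0
          shift = 0
          for i, byte in enumerate(data):
              result |= (byte & 0x7F) << shift
              if not byte & 0x80:
                  return result, i + 1
              shift += 7
          raise ValueError("truncated varint")

      encoded = encode_varint(300)  # b"\xac\x02"
      ```

      Any language that can shift and mask bytes can read this back, which is the whole point: the wire format fixes the representation, and each language maps it onto its own integer type.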