1. 32

  2. 2

    The description of a for loop over every item in the table, perhaps assisted by indexes, got me wondering how the database stores each row on disk such that it can be quickly accessed at random or sequentially. Since many tables have text of variable length, it would be impractical to define a constant length for each row’s data – so then how is it stored?

    After searching a bit, it looks like there is no single answer – the exact storage format depends on the database engine and its configuration. However, at least MySQL (source) and PostgreSQL use a folder for the database engine, sub-folders for each database, and some number of table-specific files for each table. More details about PostgreSQL’s storage file layout from PostgreSQL’s documentation:

    Each table and index is stored in a separate file, named after the table or index’s filenode number, which can be found in pg_class.relfilenode. In addition to the main file (a/k/a main fork), each table and index has a free space map (see Section 54.3 [Free Space Map]), which stores information about free space available in the relation.

    A table that has columns with potentially large entries will have an associated TOAST table, which is used for out-of-line storage of field values that are too large to keep in the table rows proper. pg_class.reltoastrelid links from a table to its TOAST table, if any. See Section 54.2 [TOAST] for more information.

    The contents of tables and indexes are discussed further in Section 54.5 [Database Page Layout].

    As mentioned above, TOAST is used when a single page cannot hold all the relevant data:

    PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages.

    Each of these (commonly 8 kB) pages includes certain sections, most notably ItemIdData: “Array of (offset,length) pairs pointing to the actual items. 4 bytes per item.”

    That documentation answered most of my questions about how potentially-large data can be stored so it can be accessed efficiently.

    1. 4

      This free online textbook by a postgres contributor is a good resource for this, especially section 1.3, which describes the internal layout of a postgres page.

    2. 2

      After years of seeing venn diagrams explaining joins it’s nice to see the underlying algorithms explained.

      1. 1

        I think MergeJoin in Postgres is currently implemented using something that I’ve seen called “zipper algorithm”.

        1. 1

          Can anyone point out a good into resource to the query planner? The style of this was good (demystifying is an apt title), and something similar to this laying out how a DB engine would choose how to actually get here would be rad.

          1. 2

            Query planning is where there is more variability among databases. The Further Reading section has links to a survey by Graefe on query planning. The most commonly implemented query planners are variations of two frameworks called Volcano and Cascades. Both by Graefe. A basic introduction to Volcano will be very enlightening. I might write about it in the future when I learn more.

            1. 1

              Thanks for the search terms!