The description of a for loop over every item in the table, perhaps assisted by indexes, got me wondering how the database stores each row on disk so that it can be accessed quickly, either at random or sequentially. Since many tables have variable-length text, it would be impractical to reserve a constant length for each row’s data – so then how is it stored?
After searching a bit, it looks like there is no single answer – the exact storage format depends on the database engine and its configuration. However, at least MySQL (source) and PostgreSQL use a top-level data folder for the server, a sub-folder for each database, and some number of table-specific files within it. More details on PostgreSQL’s storage file layout, from PostgreSQL’s documentation:
Each table and index is stored in a separate file, named after the table or index’s filenode number, which can be found in pg_class.relfilenode. In addition to the main file (a/k/a main fork), each table and index has a free space map (see Section 54.3 [Free Space Map]), which stores information about free space available in the relation.
A table that has columns with potentially large entries will have an associated TOAST table, which is used for out-of-line storage of field values that are too large to keep in the table rows proper. pg_class.reltoastrelid links from a table to its TOAST table, if any. See Section 54.2 [TOAST] for more information.
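As a quick way to poke at this on a live database, here’s a small sketch of my own (not from the documentation) that looks up a table’s filenode, on-disk file path, and TOAST table through pg_class and the pg_relation_filepath() function. It assumes the psycopg2 driver, a local database, and a hypothetical table named my_table:

```python
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection details
cur = conn.cursor()

# relfilenode is the on-disk file name; reltoastrelid points at the TOAST
# table (0 if the table has none); pg_relation_filepath() turns the relation
# into a path relative to the data directory.
cur.execute(
    """
    SELECT relfilenode, reltoastrelid, pg_relation_filepath(oid)
    FROM pg_class
    WHERE relname = %s
    """,
    ("my_table",),  # hypothetical table name
)
filenode, toast_oid, path = cur.fetchone()
print(f"main fork file: {path} (filenode {filenode})")
print(f"TOAST table oid: {toast_oid or 'none'}")
```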
As mentioned above, TOAST is used when a single page cannot hold all the relevant data:
PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages.
Each of these (commonly 8 kB) pages is divided into several sections, most notably ItemIdData: “Array of (offset,length) pairs pointing to the actual items. 4 bytes per item.”
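Out of curiosity about that ItemIdData array, here’s a rough sketch of how those (offset, length) pairs could be read straight out of a heap file. This is my own illustration rather than anything from the docs; it assumes the default 8 kB page size, a little-endian build, and a made-up file path (substitute the result of pg_relation_filepath() from above):

```python
import struct

PAGE_SIZE = 8192  # the common default; fixed at compile time (BLCKSZ)

def line_pointers(page: bytes):
    # The 24-byte page header: LSN (8), checksum (2), flags (2), then
    # pd_lower / pd_upper / pd_special (2 each), page size + version (2),
    # and the oldest prunable xid (4). Little-endian layout assumed.
    _, _, _, _, pd_lower, pd_upper, pd_special, _, _ = struct.unpack_from(
        "<IIHHHHHHI", page, 0
    )

    items = []
    # The ItemIdData array runs from byte 24 up to pd_lower, one 4-byte
    # entry per item: 15 bits of offset, 2 flag bits, 15 bits of length.
    for pos in range(24, pd_lower, 4):
        (word,) = struct.unpack_from("<I", page, pos)
        lp_off = word & 0x7FFF
        lp_flags = (word >> 15) & 0x3
        lp_len = (word >> 17) & 0x7FFF
        items.append((lp_off, lp_flags, lp_len))
    return items

# Hypothetical path -- substitute the path returned by pg_relation_filepath(),
# and read from a copy (or a stopped server) to get a consistent view.
with open("/var/lib/postgresql/data/base/16384/16385", "rb") as f:
    for i, (off, flags, length) in enumerate(line_pointers(f.read(PAGE_SIZE)), 1):
        print(f"item {i}: offset={off}, flags={flags}, length={length}")
```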
That documentation answered most of my questions about how potentially-large data can be stored so it can be accessed efficiently.
This free online textbook by a postgres contributor is a good resource for this, especially section 1.3, which describes the internal layout of a postgres page.
Parallelizing Dynamic Programming Through Rank Convergence. I found the longest common subsequence (LCS) example especially compelling, particularly the way the paper gives mathematical rigor to the intuition that large parts of an LCS match can be found in parallel, because there’s a bit of “slop” between them as long as they don’t overlap. There is an error in one of the diagrams, though, that really tripped me up for a bit.
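For context, this is the textbook serial LCS dynamic program (my own sketch, not the paper’s rank-convergence formulation): each cell depends on its left, upper, and upper-left neighbors, which is exactly the chain of dependencies that makes naive parallelization awkward and that the paper finds a way around.

```python
def lcs_length(a: str, b: str) -> int:
    # dp[i][j] = length of the LCS of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

print(lcs_length("ABCBDAB", "BDCABA"))  # classic textbook example; prints 4
```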