More posts like this please. Good focus, concise but solid analysis of why Postgres doesn’t get it right by default, and practical advice on what to do about it. Bravo!
This is listed on a somewhat outdated PG wiki page about tuning: https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
Another popular thing people tend to update is cpu_tuple_cost.
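For anyone curious, it’s an ordinary planner cost setting, so you can inspect and adjust it like the page-cost parameters. A minimal sketch (the value 0.03 is an arbitrary illustration, not a recommendation):

```sql
-- Show the current value (the default is 0.01)
SHOW cpu_tuple_cost;

-- Try a new value for the current session only
SET cpu_tuple_cost = 0.03;

-- Persist it across restarts (writes postgresql.auto.conf; needs superuser)
ALTER SYSTEM SET cpu_tuple_cost = 0.03;
SELECT pg_reload_conf();
```

As with random_page_cost, the sane workflow is to change it per session, compare EXPLAIN ANALYZE plans, and only then persist.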
The book “PostgreSQL 9.6 High Performance” should give significantly more information about that and many other things - for a price, of course.
Is this an updated version of the excellent “PostgreSQL 9.0 High Performance”? I liked that book, especially since it takes a deep look at how to evaluate the system below the database (which is useful in any DB context).
The comment here seems to indicate that the update didn’t work well: https://www.amazon.com/product-reviews/1784392979/ref=cm_cr_dp_d_cmps_btm?ie=UTF8&reviewerType=all_reviews
It is an updated version. I read the 9.0 book and liked it. I haven’t read the 9.6 version, only skimmed it and read the table of contents. The updated version can’t be as bad as the review claims: for example, it does cover Block Range Indexes (BRIN), which were introduced in PG 9.5, and the parallel query work that went into PG 9.6. I can’t comment on the editing.
This is also excellent: https://github.com/gregs1104/pgtune
Sadly, pgtune doesn’t seem to be maintained or at least updated to newer Postgres versions. And there’s too much going on from one PG release to another for pgtune to stay relevant for years.
The feeling you get when fixing a single issue has an unexpected global benefit is incredible. Performance tuning in general is a mysterious art of balancing concerns, but database tuning in particular seems one of the blackest and so often overlooked.
Anyone know why Postgres doesn’t just check (at startup) how fast sequential reads are and how fast random seeks are?
I’m not generally a fan of turning something into a config setting when the machine can figure it out for you…
With Oracle (and I assume the same is possible with Postgres) you can have each table on a different drive, and with table partitioning a single table can span multiple drives (e.g. if you know records older than one year are not accessed often, you can partition them onto a slower drive). This makes an automated test at startup insufficient to determine the actual cost.
Yes, that’s achievable with tablespaces: https://www.postgresql.org/docs/current/static/manage-ag-tablespaces.html
You can set different values of seq_page_cost, random_page_cost and effective_io_concurrency per tablespace.
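Putting the two together (a slow-disk tablespace with its own planner costs, plus an archive partition on it), a rough sketch might look like this — tablespace name, path, and table names are made up, and the PARTITION OF syntax assumes PostgreSQL 10+ declarative partitioning:

```sql
-- Create a tablespace on a slower (e.g. spinning-disk) volume
CREATE TABLESPACE slow_hdd LOCATION '/mnt/hdd/pgdata';

-- Tell the planner that random I/O is expensive on that volume
ALTER TABLESPACE slow_hdd SET (seq_page_cost = 1.0,
                               random_page_cost = 4.0,
                               effective_io_concurrency = 2);

-- Place a rarely-accessed archive partition on the slow tablespace
CREATE TABLE orders_2015 PARTITION OF orders
    FOR VALUES FROM ('2015-01-01') TO ('2016-01-01')
    TABLESPACE slow_hdd;
```

The planner then costs scans of orders_2015 using the tablespace’s settings rather than the cluster-wide defaults.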
Probably because it’s difficult to make a reliable benchmark. You don’t want the answer to change after a restart just because data happens to be cached, etc., which implies a minimum threshold of rigor, and I’m not sure how enthused I’d be if my database were conducting such a rigorous test automatically.