1. 28
  1. 3

    “Don’t have long-running transactions” is database ops No 1 commandment surely?

    I’m kind of surprised they still don’t know where the long running transactions are coming from & have spent a ton of developer time eliminating sub-transactions from their code all over the place instead of tracking them down. Is it really impossible to instrument Postgres to warn on long running transactions with a dump of the SQL query that triggered that transaction?

    (I haven’t done any serious database work in >> decade, so maybe this is actually hard?)

    1. 4

      Not all long-running transactions are user initiated. Example from their post:

      There was a long-running transaction, usually relating to PostgreSQL’s autovacuuming, during the time. The stalls stopped quickly after the transaction ended.

      […]

      In our database, it wasn’t practical to eliminate all long-running transactions because we think many of them happened via database autovacuuming, but we’re not able to reproduce this yet.

      They’ve done an excellent job engaging the right people to solve the problem, and frankly I agree with their approach. Running a patched version of Postgres isn’t guaranteed to solve their problem and may well be an operational headache.

      1. 4

        Sure, but can the database not identify the source of long-running transactions? I thought this was one of the first things you’re supposed to look at when tracking down database performance issues, so surely it ought to be a well-trodden path?

        At the moment they’re simply speculating that the transactions come from vacuums - they don’t actually know! (Unless I misread the blog post?)

        Is PostgreSQL is not capable of tracking transactions like this?

    2. 3

      They had three options, and I quote:

      1. Eliminate SAVEPOINT calls completely.
      2. Eliminate all long-running transactions.
      3. Apply Andrey Borodin’s patches to PostgreSQL and increase the subtransaction cache.

      I haven’t looked at those patches and won’t comment on the wisdom of using them. From a quick glance, Andrey doesn’t seem clueless, though.

      Eliminating long-running transactions is considered good practice. Those tend to cause problems somehow, sooner or later. I can see why they considered this possibility, but I have problem with their reasons for choosing not to… I find it very, very difficult to believe that autovacuum is the ultimate cause of long-running transactions. Autovacuum may be found loitering near long-running transaction, waiting to be allowed to clean up, but I’ve never seen it be the cause of an overlong transaction.

      So I think they spent a month on getting rid of SAVEPOINT because finding the real reason for the long-running transactions would have taken even longer. That blog posting is commendable honesty on their part.