On Monday, July 20th, Sentry was down for most of the US working day. We deeply regret any issues this may have caused for your team and have taken measures to reduce the risk of this happening in the future. For transparency, and in the hope of helping others who may find themselves in this situation, we’ve described the event in detail below.

  2. 3

    Their provided config is:

    autovacuum_freeze_max_age = 500000000

    vacuum_freeze_table_age = 600000000

    Yet the Postgres docs say:

    The effective maximum for vacuum_freeze_table_age is 0.95 * autovacuum_freeze_max_age; a setting higher than that will be capped to the maximum. A value higher than autovacuum_freeze_max_age wouldn’t make sense because an anti-wraparound autovacuum would be triggered at that point anyway, and the 0.95 multiplier leaves some breathing room to run a manual VACUUM before that happens.

    I wonder if this is why they were running into problems? Unless I’m mistaken, these settings will cause Postgres to slowly accumulate very old XIDs until it’s forced to run a full table sweep at the last minute. It seems to me that, in a write-heavy situation, you’d want vacuum_freeze_table_age to be much lower than autovacuum_freeze_max_age. That way there is more headroom for a large vacuum to finish before Postgres locks the database.
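
    To put numbers on the cap described in the quoted docs, here is a rough sketch in Python. It only models the documented rule (the effective value is the smaller of the configured setting and 0.95 * autovacuum_freeze_max_age); the function name is made up for illustration and this is not Postgres's own code.

```python
def effective_freeze_table_age(vacuum_freeze_table_age: int,
                               autovacuum_freeze_max_age: int) -> int:
    """Hypothetical helper: apply the documented 0.95 cap.

    Integer arithmetic (x * 95 // 100) is used to avoid float rounding;
    it models the rule in the docs, not Postgres internals.
    """
    cap = autovacuum_freeze_max_age * 95 // 100
    return min(vacuum_freeze_table_age, cap)


# Values from the config quoted in the comment above.
max_age = 500_000_000
table_age = 600_000_000

effective = effective_freeze_table_age(table_age, max_age)

# XIDs between the point where a VACUUM starts scanning the whole table
# and the point where an anti-wraparound autovacuum is forced.
headroom = max_age - effective

print(f"effective vacuum_freeze_table_age: {effective:,}")
print(f"headroom before forced autovacuum: {headroom:,} XIDs")
```

    If the docs are read this way, the configured 600000000 is silently treated as 475000000, which leaves only the 5% "breathing room" between the two thresholds rather than the larger gap a write-heavy setup would probably want.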