1. 18
  1. 4

    IMO this is a good candidate for a partitioned table. Partition by (e.g.) one month, and you can delete an entire month’s worth of data just by dropping the table.

    1. 2

      The author mentions this at the bottom of the article.

      1. 1

        So they do.

      2. 1

        That may or may not be useful on SSDs/NVMe. One of the world’s MySQL experts that I know had a rant about this: partitioned tables are not faster if you’re not on spinning disks. I don’t remember the details but it was fairly compelling. This obviously depends on your storage engine. For something like Splunk where archival/deletion of cold buckets was a design goal for the storage engine it works fine, but Splunk is automatically partitioned by timestamp anyway.

      3. 1

        I recall the existence of databases specially designed for ergonomic management of time-series data. I wonder if those have worked out a good solution to efficiently deleting data beyond some horizon.

        1. 6

          The problem is that if your main store is an ACID db and you use your events table for audit/compliance (as opposed to best-effort logging), you really want writes to it to be in a transaction with the events they’re auditing. Otherwise you need some sort of distributed transaction, and no one wants to deal with that just for audits.

          So a specialized time-series DB would be great for audit data but probably not for your main DB in a SaaS product.

          1. 1

            Ah, I see. And this can’t be baked into the database because there’s no one-size-fits-all solution?