IMO this is a good candidate for a partitioned table. Partition by (e.g.) one month, and you can delete an entire month’s worth of data just by dropping the table.
The author mentions this at the bottom of the article.
So they do.
That may or may not be useful on SSDs/NVMe. One of the world’s MySQL experts that I know had a rant about this: partitioned tables are not faster if you’re not on spinning disks. I don’t remember the details but it was fairly compelling. This obviously depends on your storage engine. For something like Splunk where archival/deletion of cold buckets was a design goal for the storage engine it works fine, but Splunk is automatically partitioned by timestamp anyway.
I recall the existence of databases specially designed for ergonomic management of time-series data. I wonder if those have worked out a good solution to efficiently deleting data beyond some horizon.
The problem is that if your main store is an ACID db and you use your events table for audit/compliance (as opposed to best-effort logging), you really want writes to it to be in a transaction with the events they’re auditing. Otherwise you need some sort of distributed transaction, and no one wants to deal with that just for audits.
So a specialized time-series DB would be great for audit data but probably not for your main DB in a SaaS product.
Ah, I see. And this can’t be baked into the database because there’s no one-size-fits-all solution?