Litestream author here. Michael Lynch did a great job describing Litestream in the post but I’m happy to answer any questions as well.
First of all, thank you for authoring Litestream! I was wondering how this immediate replication works with AWS S3. If we want every single transaction to end up in the replica, we’d need to save every incremental change, but isn’t that against the object model of S3? I.e. AFAIK you can’t append to your S3 files, so does Litestream create a new S3 object for each row update? Wouldn’t that be slow/expensive? Or does it only upload the WAL from checkpoint to checkpoint, in which case, how can we be sure that the workflow in the post will work? (i.e. make a change in heroku, stop the application, run it somewhere else and see that your change is there now)
Litestream works as asynchronous replication, similar to Postgres replication. It reads new pages off the WAL and copies them to a replica (either another disk or S3). The S3 replica has a configurable interval for uploading these batches of WAL pages. By default it's every 10 seconds, although you can reasonably configure it down to lower thresholds such as every 1 second.
It all works as kind of a hack around S3’s cost structure. S3 doesn’t charge bandwidth for uploads—it only charges $0.000005 per POST request. If you were to have a constant stream of writes against your database and Litestream uploaded every 10s then it would cost you about $1.30/month in upload request charges. That’s assuming a constant stream of writes 24/7 though. Most applications will have periods of inactivity so your costs will likely be lower as in the OP’s case.
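To make that arithmetic concrete, here is a quick back-of-the-envelope check in Go (not Litestream code, just the math; the $0.000005-per-request figure and 10-second interval are taken from the comment above):

package main

import "fmt"

func main() {
    const costPerRequest = 0.000005 // USD per S3 upload request, per the comment above
    const syncInterval = 10.0       // seconds between uploads

    secondsPerMonth := 30.0 * 24 * 60 * 60
    requestsPerMonth := secondsPerMonth / syncInterval
    fmt.Printf("~%.0f uploads/month -> ~$%.2f/month\n",
        requestsPerMonth, requestsPerMonth*costPerRequest)
    // prints: ~259200 uploads/month -> ~$1.30/month
}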
Thanks for the nice answer! That seems like a really nice trade-off. Would it at all be possible for the application to know when a transaction has been successfully replicated? I’m imagining a situation where the effects of a particular transaction needs to be guaranteed to have persisted before a follow up action is taken.
Yes, I put together an example of using Litestream as a library the other day and it allows you to force a flush to confirm that it’s been replicated before returning to a client:
https://github.com/benbjohnson/litestream-library-example
It currently only works with Go applications, although I'd like to make the functionality available with the regular litestream command so any language can do it, maybe using unix sockets or something. I haven't quite decided yet.
This is kind of the antidote to that HN post from a few weeks back about a one-person tech startup that was using, depending on how you count, about 20 to 30 discrete chunks of tech (of which I think a bit under ten were SaaS services, all from completely different vendors).
Makes me wonder if in a generation we might see one-person space startups.
First time I’m hearing of Litestream; if you’re also curious, here’s their “how it works” page https://litestream.io/how-it-works/
Before I read your link, a guess: custom sqlite3_vfs whose xWrite() appends the offset + new block contents to a log, and otherwise pretty much just passes everything through to the default local filesystem sqlite3_vfs. Then a background process streams the log blocks up to S3 asynchronously, with some batching. On restore, pull the logs from S3 and replay them (blocking in the xOpen() call until the restore finishes.)
Reads the article you linked
Okay, it’s asynchronous log replication using sqlite’s built-in WAL. Sounds much less likely to cause accidental DB corruption than my guessed idea. :)
edit: I meant xOpen(), not xRead().
VFS is one avenue I tried originally with Litestream and it definitely has some benefits. I wanted something that didn’t require any changes to the application though and eventually built it out as a separate process. It uses SQLite for some locking but otherwise reads and validates the WAL directly so that it should never cause corruption issues.
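For a sense of what reading the WAL directly involves, here is a minimal sketch based on the documented SQLite WAL file layout (a 32-byte header, then frames of a 24-byte header followed by one page of data; a frame with a non-zero "database size" field marks a commit). This is illustrative only, not Litestream's actual implementation, which also verifies the cumulative frame checksums before trusting a frame:

package main

import (
    "encoding/binary"
    "fmt"
    "io"
    "os"
)

func main() {
    f, err := os.Open("app.db-wal") // example path
    if err != nil {
        panic(err)
    }
    defer f.Close()

    hdr := make([]byte, 32) // WAL header: magic, version, page size, salts, checksums
    if _, err := io.ReadFull(f, hdr); err != nil {
        panic(err)
    }
    pageSize := binary.BigEndian.Uint32(hdr[8:12])
    salt1 := binary.BigEndian.Uint32(hdr[16:20])

    frame := make([]byte, 24+int(pageSize)) // 24-byte frame header + one page image
    for i := 0; ; i++ {
        if _, err := io.ReadFull(f, frame); err != nil {
            break // EOF or short trailing frame: stop
        }
        if binary.BigEndian.Uint32(frame[8:12]) != salt1 {
            break // salt mismatch: frame is left over from an older WAL generation
        }
        pgno := binary.BigEndian.Uint32(frame[0:4])
        commitSize := binary.BigEndian.Uint32(frame[4:8])
        if commitSize != 0 {
            fmt.Printf("frame %d: page %d, commit (db now %d pages)\n", i, pgno, commitSize)
        }
    }
}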
That makes sense. The fact that it works without having to link a library into the running application and register it is pretty neat.
Am I right in guessing that it'll fail if you try using the rollback journal or the F2FS atomic write mechanism instead of the WAL, please?
Litestream will automatically move the database into WAL mode if it’s not already set. You’re correct that it doesn’t work with other journal modes.
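For reference, "WAL mode" here is just SQLite's journal_mode pragma. The sketch below shows what setting it yourself from Go would look like (Litestream does this for you; the mattn/go-sqlite3 driver and the database path are only example choices):

package main

import (
    "database/sql"
    "fmt"

    _ "github.com/mattn/go-sqlite3"
)

func main() {
    db, err := sql.Open("sqlite3", "app.db") // example path
    if err != nil {
        panic(err)
    }
    defer db.Close()

    var mode string
    // The pragma returns the journal mode now in effect ("wal" on success).
    if err := db.QueryRow(`PRAGMA journal_mode = WAL`).Scan(&mode); err != nil {
        panic(err)
    }
    fmt.Println("journal_mode:", mode)
}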
Ah! I didn’t think you’d be able to do that without the cooperation of the other process. Neat. Thanks for answering my endless cascade of questions. <3
For those wondering about rationale (I saw a comment asking, but it seems to have been deleted), you can dramatically simplify your deployment if you limit writes to a single process. See David Crawshaw’s blog post and associated talk.
There are also performance benefits for many applications which typically have simple primary key or index scan queries since there’s almost no per-connection overhead. If you have a fast front end language like Go then you can scale up to thousands of requests per second on very minimal hardware (e.g. $5/mo DigitalOcean boxes).
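As an illustration of the kind of app that comment has in mind (hypothetical table and column names, mattn/go-sqlite3 used only as an example driver), the whole service can be one Go binary doing primary-key lookups against an in-process SQLite database:

package main

import (
    "database/sql"
    "fmt"
    "log"
    "net/http"

    _ "github.com/mattn/go-sqlite3"
)

func main() {
    db, err := sql.Open("sqlite3", "app.db") // example path
    if err != nil {
        log.Fatal(err)
    }

    http.HandleFunc("/user", func(w http.ResponseWriter, r *http.Request) {
        var name string
        // Primary-key lookup: no network hop, no connection pool to a separate server.
        err := db.QueryRow(`SELECT name FROM users WHERE id = ?`, r.URL.Query().Get("id")).Scan(&name)
        if err != nil {
            http.Error(w, "not found", http.StatusNotFound)
            return
        }
        fmt.Fprintln(w, name)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}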
As well as dramatically reduce performance and availability. Sometimes these are ok tradeoffs.
I think litestream is extremely interesting tech, but I've been bothered (and admittedly too lazy to test or read more) by this possibility:
User PUTs logs.
App generates id, writes to SQLite, commits.
Plug gets pulled (or, in cloud terms, the instance/dyno/container) and the machine goes away.
Litestream doesn't stream the WAL segment to S3.
On Heroku this would be bad, and the write completely gone when the app comes up again. So, I guess the assumption that needs to be made is that the disk is persisted across container runs?
Yes, you need persistent volumes to ensure the last few seconds of committed data are fsynced to disk, which is the case on Fly, but not on Heroku.
I agree with everything ngrilly said but I’ll also add that you can use Litestream as a library in a Go app if you want to selectively confirm that certain writes are sync’d to a replica before returning to the client. The API needs to be cleaned up some but there’s an example in this repo: https://github.com/benbjohnson/litestream-library-example
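For the general shape of that pattern, here is a hedged sketch: the replicatedDB interface and its ForceSync method are made-up names, not the actual Litestream library API (see the linked example repo for the real thing), so treat this as pseudocode for "commit, then block until the replica has the WAL frames, then respond":

package main

import (
    "context"
    "net/http"
    "time"
)

// replicatedDB is a hypothetical wrapper around a SQLite database that an
// embedded replicator (such as Litestream used as a library) is copying to S3.
type replicatedDB interface {
    Exec(ctx context.Context, query string, args ...any) error
    // ForceSync blocks until every WAL frame written so far has been
    // uploaded to the replica, or the context expires.
    ForceSync(ctx context.Context) error
}

func handleCreate(db replicatedDB) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        ctx := r.Context()
        if err := db.Exec(ctx, `INSERT INTO events (body) VALUES (?)`, "hello"); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        // Only acknowledge the write once the replica has it, so a lost
        // machine can't take the last few seconds of commits with it.
        syncCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
        defer cancel()
        if err := db.ForceSync(syncCtx); err != nil {
            http.Error(w, "replication timed out", http.StatusServiceUnavailable)
            return
        }
        w.WriteHeader(http.StatusCreated)
    }
}

func main() {
    // Wiring up a concrete replicatedDB is what the linked example repo shows;
    // omitted here because the real API differs from this sketch.
    _ = handleCreate
}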
I never understand this "can't maintain a db server" position.
apt install db-server
Upgrades happen the same way as all your other dependencies anyway?
Ongoing maintenance: backups, replication & failover, in-place upgrades (AIUI converting all the on-disk tuples across major versions has historically been painful for people), rebooting the pg box because it got a kernel update and dealing with downtime.
Incidentally, fwiw, Debian is my least favourite place to run pg because I've been bitten by bugs in the maintainer scripts Debian adds to it.
The DBaaS solutions to these things are just “put credit card in, probably never think about it again” unless your needs are extremely stringent.
migrating data between servers is a pain though. I am lazy and want a truly 12 factor app.