The main app I’m working on supports both SQLite and PostgreSQL; for context, it’s a web analytics app. The main reason SQLite is supported is that it’s easier to set up and maintain for self-hosting.
In the very first versions SQLite was actually faster than PostgreSQL in my (basic) testing, although this is no longer the case as features have been added. I’m actually running some performance comparisons today; some thoughts:
Inserting the 10 million test pageviews for benchmarking takes a long time in SQLite; as in, about 24 hours. This amounts to over 100 pageviews/second, which is more than fast enough, but PostgreSQL takes about 2 hours for the same data and is about 10x faster.
PostgreSQL is generally faster in loading the dashboard, although by how much depends a bit on what kind of data you have. With 10M pageviews spread out over 1,000 paths the differences aren’t super-large: about 1s in PostgreSQL vs. 3s in SQLite, but when spread out over 10,000 paths it becomes a bit larger: about 3s for PostgreSQL vs 10s for SQLite.
These numbers are on my rather slow-ish laptop. Even a $10/month Linode VPS gave me better performance with PostgreSQL; these numbers are just a (preliminary) relative comparison.
I haven’t spent as much time optimizing SQLite as I have PostgreSQL, so there could be something to win here.
(Arguably SQL isn’t the best kind of backend storage for this kind of thing in the first place, but it’s operationally a lot easier for people to run than some specialized thing.)
You almost certainly want to set journal_mode to wal; this is not the default. It’s almost always faster, has better support for concurrency, and comes with essentially no real downsides.
It’s really easy to end up with concurrent writes if you’re not careful, which will land you in a world of trouble if you haven’t set the journal mode. You really want to set it; this is not a “nice to have” but a “required” as far as I’m concerned.
The default cache_size is 2M; setting it to 20M (-20000) gave me much better performance. Setting it higher didn’t seem to matter much. The default seems fairly conservative.
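For reference, a minimal sketch of the two settings mentioned above, with the values from this thread rather than tuned recommendations (per the SQLite docs, a negative cache_size is in KiB, and the WAL setting persists in the database file once set):

PRAGMA journal_mode = WAL;   -- not the default; allows concurrent readers alongside one writer
PRAGMA cache_size = -20000;  -- negative value is in KiB, so roughly 20M of page cache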
Database migrations in SQLite are a real pain, and error messages on check constraints are extremely non-descriptive and annoying to track down. The former was partly fixed in 3.35 with alter table drop column, and the latter in 3.34.1 according to the release notes (I haven’t tried it yet, since the Go package is still at 3.34.0, so I’m not sure how much they’re improved).
This would actually be a major reason I’d choose PostgreSQL for prototyping because it’s just easier to deal with. While drop column is nice, there are still many parts missing here. Doing the entire SQLite song-and-dance to alter a constraint or something is pretty time-consuming and annoying.
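For readers who haven’t done it, the “song-and-dance” is roughly the table-rebuild procedure sketched below; the table and constraint are made up for illustration, and the full procedure in the SQLite documentation also recreates any indexes, triggers, and views on the table:

PRAGMA foreign_keys = off;   -- can't be changed inside a transaction
BEGIN;
CREATE TABLE paths_new (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL CHECK (path <> '')  -- the altered constraint you actually wanted
);
INSERT INTO paths_new (id, path) SELECT id, path FROM paths;
DROP TABLE paths;
ALTER TABLE paths_new RENAME TO paths;
COMMIT;
PRAGMA foreign_keys = on;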
The sqlite3 CLI is kinda meh and the psql one is much nicer. There are many small reasons for this, and I think anyone who has used both will appreciate the difference. This, again, is an important reason I’d prefer PostgreSQL for prototyping.
I really miss some of PostgreSQL’s features sometimes. Things like arrays are really nice. You can kind-of do the same in SQLite with JSON, but not really (there’s no real equivalent of array_agg() or unnest(), for example).
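To illustrate the difference with a hypothetical table (the SQLite side needs the JSON1 extension, and what you get back is a JSON text blob rather than a typed array):

-- PostgreSQL: real arrays
SELECT array_agg(path) FROM hits;
SELECT unnest(ARRAY['/a', '/b']);

-- SQLite: the closest JSON workaround
SELECT json_group_array(path) FROM hits;
SELECT value FROM json_each('["/a", "/b"]');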
There are probably some other points I’m forgetting at the moment. My general experience with SQLite is good, and it’s absolutely not a “toy database”; overall it’s actually quite nice and fully-featured. I’d rather work with SQLite than MariaDB, to be honest; I don’t think MariaDB is good software.
However, this:
There is a popular opinion among developers that SQLite is not suitable for the web, because it doesn’t support concurrent access. This is a myth. In the write-ahead log mode (available since long ago), there can be as many concurrent readers as you want. There can be only one concurrent writer, but often one is enough.
SQLite is a perfect fit for small websites and applications. sqlite.org uses SQLite as a database, not bothering with optimization (≈200 requests per page). It handles 700K visits per month and serves pages faster than 95% of websites I’ve seen.
Is a little bit more complicated, I think. The quote isn’t wrong, but it’s not just about performance (concurrent or otherwise). There are some good reasons to use SQLite, but most of them are operational (i.e. you don’t need to depend on a database server, but you will have to depend on libsqlite or a C compiler). Outside of that I think you will almost always have a better time with PostgreSQL to be honest. The main reason I support SQLite is so you can have a single static binary with no dependencies at all that can run pretty much anywhere, which is a rather nice property, and thus far SQLite-specific workarounds haven’t been painful enough to drop support for it.
For non-web apps these operational advantages are much more pronounced, because who wants to run PostgreSQL to start Firefox? But if you’re already running a server for your SaaS or whatnot then you might as well run PostgreSQL; it’s not that hard. It’s just easier for folk who want to self-host and aren’t deeply invested.
Inserting the 10 million test pageviews for benchmarking takes a long time in SQLite
You might already know what I’m about to say, but just in case – a common issue people hit when bulk inserting in SQLite is that they don’t wrap the insert queries in a transaction, i.e. each insert is committed separately. When this happens, SQLite goes through the entire locking and unlocking process for every single query. If using WAL mode with the default configuration, SQLite will run a CHECKPOINT after each COMMIT. All of this put together can easily slow down the bulk insert process by one or two orders of magnitude. Also, make sure you are not preparing the query statement for each insert; that would slow things down as well.
At the same time, adding 10 million inserts in a single transaction will consume a lot of memory. Since you plan on doing some performance comparison today, I’d try batching my inserts every 10k and then 100k rows to see how it turns out.
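A sketch of what that batching looks like at the SQL level, with a hypothetical table; in application code you would typically prepare the INSERT statement once and re-run it inside each transaction:

BEGIN;
INSERT INTO pageviews (path, ts) VALUES ('/', 1617000000);
INSERT INTO pageviews (path, ts) VALUES ('/about', 1617000001);
-- ...the rest of this batch of 10k-100k rows...
COMMIT;
-- then the next batch gets its own BEGIN/COMMIT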
Doesn’t using transactions containing multiple inserts also mean that SQLite only needs to iterate/update the indices for the affected tables once per commit rather than once per record?
Given the big insertion cost (100 pageview inserts per second seems very low), I think a lot of the work will be spent updating indices.
It actually does a lot more than just inserts, including some stuff outside of SQL. The entire process is a bit involved and not worth going into detail here.
Also my laptop isn’t very fast and the SSD I used (a secondary SATA one, not the main NVMe) is an old one I got from a friend second-hand 8 years ago that I used as the main desktop drive for a few years. It’s probably near the end of its life, but useful for (temporary) bulk storage for stuff that doesn’t really matter. The actual real-world performance on something like a Linode VPS or AWS will be a lot better (just haven’t gotten ’round to running those tests yet).
The actual insert of the pageviews is already batched with insert (..) values (..), (..), ..., and I’m not sure if we can get more performance out of that. I think the big issue is that it also updates some secondary tables with an on conflict([..]) do update set [..], which is quite expensive. I tried wrapping the lot in a transaction just to be sure, but it didn’t seem to make much difference, if any.
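For readers following along, the pattern being described is roughly the one below; the table and column names are invented for illustration, since the real schema isn’t shown in this thread:

-- the batched pageview insert
INSERT INTO pageviews (site_id, path, ts) VALUES
    (1, '/', 1617000000),
    (1, '/about', 1617000001);

-- the secondary-table upsert; assumes a unique index on (site_id, path)
INSERT INTO path_stats (site_id, path, total) VALUES (1, '/', 1), (1, '/about', 1)
    ON CONFLICT (site_id, path) DO UPDATE SET total = total + excluded.total;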
I’m not really bothered by this; it’s a one-off I need to run once, and the actual performance in normal usage is fine. If you’re racking up hundreds of pageviews/second you probably want to use PostgreSQL anyway. It was just intended as a comparison of how SQLite can be a lot slower. The original (much simpler) version managed to get ~800/s on SQLite and ~600/s with PostgreSQL, but a lot has changed since then.
I’m not sure if it actually runs a full checkpoint after every insert or how this works exactly? I was left with a 1G -wal file after everything was done anyway, and manually used pragma wal_checkpoint to commit it after I was done.
Thanks for sharing that high quality high density comment.
I feel like a lot of the sqlite3 complaints w.r.t. concurrency stem from the fact that most people do not understand how to use immediate write transactions to avoid ‘database busy’ errors.
The key points are that you must set the busy timeout as well as use ‘begin immediate’ for write transactions, to tell sqlite3 to take the write lock immediately. This means it won’t have to roll back halfway through a transaction when it fails to upgrade to a write lock.
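In SQL terms the suggested pattern looks roughly like this (the table is hypothetical; the timeout value is the one used elsewhere in this thread):

PRAGMA busy_timeout = 5000;  -- wait up to 5s for the lock instead of failing immediately
BEGIN IMMEDIATE;             -- take the write lock up front, so no later lock upgrade
UPDATE counters SET hits = hits + 1 WHERE path = '/';
COMMIT;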
This is an interesting point. As I understand it, you basically have 2 options:
1. Handle the wait yourself:
db = sqlite3_open ...;
sqlite3_busy_handler(db, myBusyHandler, (void*)300);
and myBusyHandler will be called every time a write lock can’t be obtained. The handler can return 1 to try for the lock again, or 0 to give up and stop trying (SQLite will then return SQLITE_BUSY). In other words, you decide whether to retry via the return value:
Return 1 = SQLite will immediately try to get a lock.
Return 0 = SQLite gives up and returns SQLITE_BUSY for the statement.
While the handler is running (sleeping, etc.), SQLite just waits for its return value and does nothing else.
2. Let SQLite use the default handler, by just setting a timeout:
sqlite3_busy_timeout(sqlite3*, int ms);
where it will keep retrying for up to ms milliseconds to get a lock, and return SQLITE_BUSY if that time passes without getting one.
Basically the 1st option is for when you want to update a GUI or whatever and don’t want to block waiting on a lock, like the default handler will do (which, note, never runs at all unless you set a timeout).
So the default behaviour is: try for a lock once, and if that fails, return SQLITE_BUSY and give up. Waiting/retrying is something you have to ask for via the above method(s).
Similar to your second option, you can also set the busy_timeout as a regular statement by using:
PRAGMA busy_timeout = 5000;
yes, a lot of options, like WAL which is also another basically must do default setting, are available as PRAGMA’s and not only via the sqlite API.
I’m saying that sometimes sqlite3 must give up and is unable to wait, even if you set a busy wait. Sometimes transactions get interleaved in ways that are not serializable; to avoid this you must use begin immediate, which rules out any non-serializable interleavings.
agreed. Though in theory you could set your timeout to be max int ms and never punt, or always return 1. I’m sure a user or three would get very upset with you eventually though :)
No I think you have misunderstood, I’m saying there are situations where busy wait won’t help no matter how high it is, sqlite3 must abort unless you structure your transactions correctly.
I think SQLite will detect these cases, such as a read lock upgrading to a write lock, and in those cases it won’t trigger the busy handler at all and will just return SQLITE_BUSY. That being said, I don’t know if there are any cases that SQLite cannot detect and will busy-wait on forever.
I understood that. Sorry I was trying to make a joke, clearly I failed.
I’ve never seen a documented list of the places where it can fail in such situations, but I believe you that they exist, when you have multiple writers and don’t use BEGIN IMMEDIATE;.
Yeah, it seems really under documented for such a common cause of confusion.
It only happens if you grab a read lock in a transaction before upgrading to a write lock. In many cases it is fine to read without BEGIN on your writer thread, and to only start a transaction when you actually need to write. That way you don’t need BEGIN IMMEDIATE from the get-go and you won’t get the failed-to-upgrade error.
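To make the failure mode concrete (hypothetical table): with a plain BEGIN the transaction starts out as a reader, and if another connection commits a write in the meantime, the later attempt to upgrade to a write lock can’t be fixed by waiting, so SQLite returns SQLITE_BUSY without consulting the busy handler, as described above:

BEGIN;                                        -- deferred: takes a read lock at the first SELECT
SELECT hits FROM counters WHERE path = '/';
-- ...another connection commits a write here...
UPDATE counters SET hits = hits + 1 WHERE path = '/';  -- fails with SQLITE_BUSY
-- starting with BEGIN IMMEDIATE instead avoids this, at the cost of taking the lock earlier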
I must be out of sync here but people were thinking SQLite was a toy database??
Same. I’ve never considered it a toy. Not suited for multiple concurrent connections? Sure. But the usefulness of having a full SQL store for a small application is almost impossible to overstate.
What really blew me away years ago: SQLite is public domain! (https://www.sqlite.org/copyright.html)
I recommend getting a book on it. I have the O’Reilly one. (https://sqlite.org/books.html). I’ve used it so much in so many different scenarios. Such a great tool to have on the toolbelt.
SQLite has always been my go to database for testing and as a primary database for desktop projects that require one. It’s proven itself more than once.
There is a popular opinion among developers that SQLite is not suitable for the web, because it doesn’t support concurrent access. This is a myth. In the write-ahead log mode (available since long ago), there can be as many concurrent readers as you want. There can be only one concurrent writer, but often one is enough.
I wonder why SQLite3 developers choose to allow only one simultaneous write transaction. Does supporting multiple concurrent writers complicate the codebase or have other undesirable implications?
For many applications what you want to do is actually take the WAL-recommendation seriously and enable it.
PRAGMA journal_mode=WAL;
So much of “SQLite is slow” could be avoided by doing that. If you think you’d benefit from concurrent writes that’s most likely what you are looking for.
The inability to have concurrent writers is also a usability problem: since SQLite does not handle that for you, you need to architect the application so that there are never concurrent writes. An alternative would be to have a lock.
Yes. If you have two writers in a transaction and they conflict, one of them has to be able to undo partial bits of its transaction. That complicates the design of data structures and will make performance worse if you never have more than one writer. For most consumers of an embedded database, there is at most one thing writing (sometimes zero - I’ve seen sqlite used to provide an indexed data source that’s easy to update independently of the application build but never actually modified by the application).
almost read-only database :)
Seems like a good fit for things like RDF HDT
As soon as you let multiple processes change database state, you need to worry about changes which affect the same row in different ways. You end up in eventual consistency land, or you do as sqlite3 does and ensure strong consistency with a single writer.
My understanding is that concurrent writes require independent rollback journal files for each transaction to implement correctly. However, SQLite’s rollback journal doesn’t support concurrency at all (a single reader/writer). In SQLite’s WAL mode concurrent readers are possible, but because there is only one write-ahead log file, it cannot have multiple write transactions (you cannot interleave write transactions in the WAL file). I wrote a little about this speculation in https://dflat.io/notes/mwmr/
Hey, I am sort of an okvs expert; if you have questions about programming with one, let me know. Mind the fact that mdbx works primarily from memory; for bigger-than-memory datasets it might perform less well (check that claim). Also, apparently the API is not so good (check that claim too). Ping via lobsters messages :)
What about rocksdb?
I wonder why SQLite3 developers choose to allow only one simultaneous write transaction.
It is much easier to implement isolation: there are never any write conflicts, hence snapshot isolation is trivial; among other things, MVCC is not even necessary.
Something that is unclear is how quickly a write transaction is seen by other transactions; in theory, according to ACID rules, it should be immediate… In any case, it seems there is a performance optimization opportunity in declaring a transaction read-only. There is also an opportunity to allow reading data from the data file that lags a little behind the WAL. It is also a simplification: when you accept that you might read data that is old according to the WAL, you do not need to keep the WAL data around in memory (or read the WAL at each transaction). LMDB has a read-only flag.
Last time I checked, SQLite has no built-in support to enforce a single writer, so the user needs to handle that. Since SQLite supports multiple processes, it is more complicated: you have to synchronize multiple separate processes, each of which may have several writers, unlike a single-process database, where it is easier and would be more performant. Mind the fact that PostgreSQL relies on multiple processes, but they are all forked from the same parent; my guess is that shared memory or whatever is easier to do in that setup.
ref: about isolation in SQLite https://www.sqlite.org/isolation.html
Previous discussion: Avoid SQLite In Your Next Firefox Feature