I get what the author is trying to get at with calling it “serverless”, and I'm not sure if it’s a good or bad overloading of the term. But I do think that SQLite is an underappreciated tool for the reasons they described. I wrote the following on Hacker News, but figured I’d add it here too:
I think a good under-appreciated use case for SQLite is as a build artifact of ETL processes/build processes/data pipelines. It seems like a lot of people’s default, understandably, is to use JSON for the output and intermediate results, but if you use SQLite, you’d have all the benefits of SQL (indexes, joins, grouping, ordering, querying logic, and random access) and many of the benefits of JSON files (SQLite DBs are just files that are easy to copy, store, version, etc. and don’t require a centralized service).
I’m not saying ALWAYS use SQLite for these cases, but in the right scenario it can simplify things significantly.
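To make the idea concrete, here is a minimal sketch in Python using the built-in sqlite3 module. The table name, columns, and data are made up for illustration; the point is that the pipeline's output is a single queryable file with indexes baked in:

```python
import os
import sqlite3
import tempfile

# Hypothetical pipeline output: (user_id, country, revenue) rows.
rows = [(1, "US", 120.0), (2, "DE", 80.5), (3, "US", 42.0)]

path = os.path.join(tempfile.mkdtemp(), "pipeline_output.sqlite")

con = sqlite3.connect(path)
con.execute("""
    CREATE TABLE revenue (
        user_id INTEGER PRIMARY KEY,
        country TEXT NOT NULL,
        amount  REAL NOT NULL
    )
""")
con.executemany("INSERT INTO revenue VALUES (?, ?, ?)", rows)
# The index ships inside the artifact, so consumers get fast lookups for free.
con.execute("CREATE INDEX idx_revenue_country ON revenue(country)")
con.commit()

# A downstream consumer runs real SQL instead of re-parsing JSON in a loop.
total_us = con.execute(
    "SELECT SUM(amount) FROM revenue WHERE country = 'US'"
).fetchone()[0]
con.close()
print(total_us)  # 162.0
```

The resulting file can be copied, versioned, or uploaded like any other build artifact, with no database server involved.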
Another similar use case would be AI/ML models that require a bunch of data to operate (e.g. large random forests). If you store that data in Postgres, Mongo, or Redis, it becomes hard to ship your model alongside updated data sets. If you store the data in memory (e.g. if you just serialize your model after training it), it can be too large to fit in memory. SQLite (or another embedded database, like Berkeley DB) can give the best of both worlds: fast random access, low memory usage, and easy shipping.
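A rough sketch of that serving pattern, again in Python with sqlite3 (the "features" table and its contents are invented stand-ins for whatever data a real model would ship with). Opening the file read-only lets SQLite page data in on demand, so memory use stays low even for large files:

```python
import os
import sqlite3
import tempfile

# Build a stand-in "model data" file; in practice this ships with the model.
path = os.path.join(tempfile.mkdtemp(), "model_data.sqlite")
con = sqlite3.connect(path)
con.execute("CREATE TABLE features (key TEXT PRIMARY KEY, weight REAL)")
con.executemany(
    "INSERT INTO features VALUES (?, ?)",
    [(f"feat_{i}", i * 0.1) for i in range(10_000)],
)
con.commit()
con.close()

# At serving time: read-only random access by key, without loading
# the whole file into memory.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
weight = ro.execute(
    "SELECT weight FROM features WHERE key = ?", ("feat_42",)
).fetchone()[0]
ro.close()
print(round(weight, 1))  # 4.2
```

Since the serving side never writes, many processes can read the same file concurrently without locking concerns.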
I think SQLite is great, and an amazing feat of engineering.
However, I really wish it would just check my types. If the database will happily write a string to my int column, and my language is dynamically typed… well, there’s only the fallible human left to ensure there’s no silent data corruption.
You can add check constraints using typeof, e.g. check(typeof(col) = 'integer'). (Note that typeof() returns lowercase type names, so comparing against 'INTEGER' would never match.)
I agree static types are useful and important, but dynamic types are also useful for plenty of things, e.g. using SQLite with unclean data from external sources.
select typeof(col), count(*) from imported group by 1;
select col from imported where typeof(col) != 'integer';
update imported set col = ... where typeof(col) != 'integer';
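Here is a quick demonstration of such a check constraint rejecting a mistyped value, as a minimal sketch in Python (the table and column names are made up). Note that typeof() returns lowercase names like 'integer', and that a string which can't be coerced to an integer stays text despite the column's INTEGER affinity, which is what trips the check:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# typeof() returns lowercase type names, so compare against 'integer'.
con.execute("""
    CREATE TABLE t (
        col INTEGER CHECK (typeof(col) = 'integer')
    )
""")

con.execute("INSERT INTO t VALUES (1)")  # passes the check
try:
    # 'oops' cannot be coerced to an integer, so it is stored as text
    # and the CHECK constraint rejects the row.
    con.execute("INSERT INTO t VALUES ('oops')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
con.close()
print(rejected)  # True
```

One caveat of this approach: a numeric string like '1' is still accepted, because SQLite's type affinity converts it to an integer before the check runs.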
It seems like the initial documentation might be older than the widespread usage of serverless as “no visible servers for you to manage”.
I think it would be a bit silly to choose this hill to die on; it’s not like the older meaning of serverless has ever caught on in any way, nor is there really a trend toward building the sort of thing that could be called the SQLite kind of serverless.
The text by itself doesn’t mean that the author is choosing to die on this hill, though; maybe it’s just about clarifying a specific piece of documentation.
This page is at least 12 years old, and remains largely unmodified since its creation, except for the second section added 2 years ago. See this archive of the page from 2007. No one is dying on any hill; it was just written long before the term was otherwise used.
“Serverless” here means literally what it says: the work is done in-process, not in a separate server. This is beyond a trend, it’s the way regular libraries work. ImageMagick is “serverless”. Berkeley DB is “serverless”. OpenGL is “serverless”. Get it?
The only reason the developer of SQLite calls this out is because most SQL databases are client-server, so someone familiar with MySQL or SQLServer might otherwise be confused.
(And may I add that I, personally, find the current meaning of “serverless” ridiculous. Just because you don’t have to configure or administer a server doesn’t mean there isn’t one. When I first came across this buzzword a few years ago, I thought the product was P2P until I dug into the docs. But then, a lot of buzzwords are ridiculous and we get used to them.)
I get what the author is trying to get at with calling it “serverless” and not sure if it’s a good or bad overloading of terms.
I’m sympathetic to this line of thinking, but in this case “serverless” is an utterly and completely lost cause. It’s beyond any hope of redemption. All use is fair play.
I think a good under-appreciated use case for SQLite is as a build artifact of ETL processes/build processes/data pipelines.
ha, I built pretty much exactly that at Etsy years ago. We had an ETL that transformed the output of Hadoop jobs into SQLite files that could be queried from the site. It worked because without writers you don’t have any locking problems.
Did anyone ever just ship the SQLite database to a user’s localStorage in a sensible use case? :)