Ironically, I think I’ve already run into this problem here on Lobsters. I tried to submit an interesting blog post about Smalltalk but was told the URL was invalid. The URL contains a seemingly autogenerated, non-optional alphanumeric ID that contains the word “sex” in it.
One solution is colocating the data at the edge itself. But a better approach is a database embedded in the edge runtime itself. With this, database latency becomes zero. This is the holy grail. This slaps.
Cloudflare Workers already achieves this but exposes a KV interface.
Limbo is a pre-existing FOSS project: it is the successor to the C programming language, designed by the creators of C themselves, so it cannot reasonably be claimed to be obscure or unknown.
It is the language in which the Inferno OS, the successor to Unix by the designers of Unix, was mostly implemented.
It is a direct ancestor of Golang.
It has been in existence for nearly 30 years now.
This is, IMHO, inexcusable.
Phoenix the BIOS company forced Mozilla to rename Phoenix the web browser to Firebird. Then Firebird the FOSS database forced Mozilla to rename it again to Firefox.
This is the legal precedent: you would think a BIOS and a database can’t be confused with a web browser, yet they were held to be close enough.
A database in a type-safe low-level language is close enough to a type-safe low-level language.
This is a bad name. They should have checked. They should rename the project.
Unless I’m mistaken, the legal precedent you cite was based on trademarks. As I was saying (you might have missed it?), the “Limbo” trademark owned by Vita Nuova is not valid anymore.
Would you care to say more about your relationship with or feelings toward Inferno, Limbo and Vita Nuova?
Relationship? None at all. Can’t write it, never could, never tried. Only used Inferno for a few tens of minutes in a VM a decade ago. I wrote about it, here:
Codenames can stick. Mozilla was just an internal codename. Delphi was just an internal codename.
That means…
You chose your codename poorly.
That means…
You don’t know your tech history as well as I feel you should, and since not knowing tech history is a common cause of needlessly re-inventing wheels, that casts doubt in my mind on the wisdom of any and all other decisions, which I am not personally able to judge.
2 and 3 really seem needlessly inflammatory. “‘Codenames can stick’ means ‘you chose your codename poorly’” doesn’t follow at all, but you appear intent on whipping up a storm or something there. Chill out; the target of your ire is undeserving of it.
They are footnotes, but they are important footnotes, IMHO.
It is not entirely unfair to describe Limbo as “what the guys who invented C did next”; it’s also a direct ancestor of Golang. As an additional important consideration, Limbo followed the attempt to build Aleph on Plan 9, so it wasn’t some blue-sky clean-slate thing: it was a second try, informed by the failure of the first.
Why does this matter? Because why Aleph didn’t succeed but Limbo did (to some extent, anyway) has lessons to teach. Limbo itself influenced Go, which is highly relevant as Go is a principal rival to Rust in the “let’s do a better C++” area.
Similarly, Inferno is what Plan 9 turned into when it grew up, and Plan 9 is what “the guys who invented Unix did next.” There’s more to be learned from that than from 100 commercial Unixes, or FOSS recreations of Unix.
This is important history. As Henry Spencer said, “Those who do not understand Unix are condemned to reinvent it, poorly.”
You need to know your history to avoid repeating it.
What does this teach us? The choice of name, even as a temporary codename, tells me that the people who chose the name don’t know their history. It’s like a new car company calling itself “Diesel”. I want to say: “Hey, guys, that name is used. No it’s not a trademark, but it’s kinda important. Yeah, I get it, you want to make EVs, you don’t use any fuel, but hey, you kinda ought to know more about cars.”
I don’t know anything useful about SQL or SQLite or any of it. I don’t really know anything about EVs either. But I can spot a clue, especially if that clue is a hint of something bad or at least embarrassing.
simonw’s too modest to post this himself, but I learned a lot from his comments on the announcement. The part about the Antithesis testing approach in particular is really interesting.
With all the tools available to us to help with editing, I get frustrated reading typos in blog posts, especially when they don’t leave a way to privately and politely give the author a heads-up. (The typo in the article would be remedied by s/hist/his/.)
If you don’t carefully proofread your words, how do I know if you carefully review your code?
There is a “Contact Us” link in the footer, which — to my delighted surprise — is a straight-up mailto: link. support@ doesn’t give me the hugest confidence, but it’s counterbalanced by the lack of an awful form and CAPTCHA.
Good eye. I was referring to the lack of an article byline or something like that where you can side-channel the author. Praise in public, criticize in private, etc. I figure contacting support about something like a typo in a blog post wouldn’t be the most appropriate channel.
Sorry if I came off coarse; I just see so many articles and blog posts these days where proofreading or spell-checking didn’t happen, and it gets distracting as a reader.
I don’t expect anybody to write bug-free code or typo-free prose, nor did I suggest anyone unrealistically do so! That is why we have code formatters, linters, and code review. For prose, we can take the few seconds to pipe the text through a CLI-based spell-checker, paste the text into a word processor with a built-in spell-checker (and look for the red squiggles), or even ask AI to review it for grammar or spelling errors.
For all the hard work and nice writing that it is, why not do one of these quick checks to make sure it is as good as it can be?
With the license and TH3 in hand I imagine it would become very easy to make large-scale refactors. Of course, with large-scale refactors the odds are very good that the changes would not land in upstream SQLite. So they would be forced to maintain their own fork anyway. And I don’t think a TH3 license is as expensive as the ongoing maintenance cost of an SQLite fork by highly experienced developers (who need some take-home salary). I may be wrong of course. But to me the TH3 license cost sounds like a red herring.
We tried to pay for it when we forked libSQL but never heard back… 🤷 I think doubling down on DST but also the open test suite (Limbo has the same TCL tester) is the way to go!
I mean … at least this particular one is kind of cute? But what on earth is it meant to be saying or conveying? Is this Limbo the cat? SQLite the cat? Why are they looking so concerned? What is the disc on their chest? Have they eaten recently?? Can I feed the cat?!?!
SQLite is like three guys. I am confident in those three guys’ ability to deliver safe and well-tested C code. These Turso folks think SQLite would benefit from external contributions. I am less confident in the ability of new external contributors to do that.
SQLite doesn’t have a safety problem, libSQL does. So if you’re walking around going “gosh, I sure would love to switch from SQLite to libSQL, but I’m just not confident in its memory safety” then Limbo might be for you.
SQLite doesn’t have a safety problem, libSQL does. So if you’re walking around going “gosh, I sure would love to switch from SQLite to libSQL, but I’m just not confident in its memory safety” then Limbo might be for you.
That’s a very unfair take on the article. The article nowhere claims that SQLite has a safety problem.
We are rewriting because we want AsyncIO, WASM-first design, and Deterministic Simulation Testing support from the beginning.
It was more of a logical sequence than an accusation. C code is often unsafe. SQLite devs have demonstrated they can produce safe code, so that isn’t an issue for it. libSQL intends to have many more contributors, so I wouldn’t feel comfortable making the same assumption.
While I’m not enthused at having another thing wanting to pull an async runtime into my sync projects, once I feel there’s been enough time for Limbo to have proven itself as trustworthy as SQLite, I look forward to using it.
SQLite’s dependence on a C toolchain for more than just libc makes statically linked musl-libc builds annoyingly inconvenient compared to a pure Rust codebase.
…maybe, by the time that happens, Rust will have something like keyword generics and Limbo will have adopted them enough that it can be used synchronously without hiding an async runtime behind the scenes.
I think you’re being unduly negative. Async IO does not require an async runtime (like tokio) or even async/await. I quickly searched for async in the limbo github repo and it does not look like they are using either.
Limbo is designed to be fully asynchronous. SQLite itself has a synchronous interface, meaning driver authors who want asynchronous behavior need to have the extra complication of using helper threads.
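The helper-thread workaround described there can be sketched with nothing but the Rust standard library. Note that `blocking_query` is a made-up stand-in for a real synchronous engine call (SQLite’s actual API is C functions like `sqlite3_step`), not anything from Limbo or libSQL:

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in for a synchronous engine call that blocks on disk I/O.
fn blocking_query(sql: &str) -> Vec<i64> {
    if sql == "SELECT 1 + 1" { vec![2] } else { vec![] }
}

// The "extra complication" a driver author takes on: push the blocking
// call onto a helper thread and hand the caller a channel to wait on.
fn query_in_background(sql: String) -> mpsc::Receiver<Vec<i64>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let rows = blocking_query(&sql);
        let _ = tx.send(rows);
    });
    rx
}

fn main() {
    let rx = query_in_background("SELECT 1 + 1".to_string());
    // the caller is free to do other work here, then collect the result
    let rows = rx.recv().unwrap();
    assert_eq!(rows, vec![2]);
}
```

An engine that is asynchronous from the start can surface completion through its own event loop instead, so no helper thread (and no extra copy of the result across it) is needed.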
It doesn’t make sense for something to be async internally if downstreams depending on it still have to block on getting a result, so the absence of async fn suggests they either haven’t implemented their higher-level API for Rust consumers yet or they plan on some kind of ugly, NIHed alternative for Rust downstreams that do want to access it asynchronously.
Fair point… but not what I meant to be getting at in this specific case.
SQLite could already do that. When they say “SQLite itself has a synchronous interface, meaning driver authors who want asynchronous behavior need to have the extra complication of using helper threads.”, they’re talking about what external API SQLite exposes.
I’m not sure what you’re saying. You seem to be complaining that there’s an async interface in Limbo but you don’t want that, and I showed that you don’t need to - you can just block. Now you seem to be saying something else but I’m not sure what.
I’m saying that their stated intent makes no sense if they don’t either expose or plan to expose an async interface to consumers.
My assumption was that they already do, and given that it makes no sense to expose an async interface to downstream Rust without defining the API in terms of async fns, if you’re going to expose async fns, you might as well use async/await in your implementation. I admit I was wrong there.
Now I’m just hoping that, if/when they do offer an async interface, it’ll remain optional.
It’s an interesting claim. External contributors may not have the obsession with stable database formats that a small group dedicated to the cause might.
The right way to think about it is that SQLite would benefit from competition. Personally I believe that this competition would be better to come from a re-implementation than a fork. So I think this is good. Will it succeed? Will I like it? Will I use it? I mean probably the answer is: no, but I sure as hell will applaud that effort.
one of my biggest gripes with every sql based relational database I’ve ever used is that the error messages are largely useless and don’t seem to have kept up with the development experience of other tooling in the industry (see: rust errors, graphql responses, gleam errors)
is this something the team is looking at with limbo? I reckon it would be a huge productivity booster, not just for newbies learning sql but experienced folks doing complex queries on complex schemas, I’d love to see something more than “there’s an error somewhere near xyz” but a deep explanation of context (table/column names, parameter values, constraint names, etc)
one of my biggest gripes with every sql based relational database I’ve ever used is the error messages are largely useless
OMG, this!
I’d love to see something more than “there’s an error somewhere near xyz” but a deep explanation of context (table/column names, parameter values, constraint names, etc)
And, more importantly, programmatic access to those in a structured way! That’s one of the things I missed when working on custom database interfaces (like ORMs and such). You’d have to resort to parsing error strings in most cases :S
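Purely as a hypothetical illustration of what’s being wished for (this is not any real API, and all the names are made up): errors that carry their context as data instead of prose, so drivers and ORMs can match on them rather than parse strings.

```rust
// Hypothetical structured query errors: context as fields, not text.
#[derive(Debug, PartialEq)]
enum QueryError {
    UnknownColumn { table: String, column: String },
    ConstraintViolation { constraint: String, value: String },
}

// Human-readable rendering is derived from the structure, not the
// other way around.
fn explain(err: &QueryError) -> String {
    match err {
        QueryError::UnknownColumn { table, column } => {
            format!("no column `{column}` in table `{table}`")
        }
        QueryError::ConstraintViolation { constraint, value } => {
            format!("value `{value}` violates constraint `{constraint}`")
        }
    }
}

fn main() {
    let err = QueryError::UnknownColumn {
        table: "users".into(),
        column: "emial".into(),
    };
    // an ORM can match on the variant instead of scraping error strings
    assert_eq!(explain(&err), "no column `emial` in table `users`");
}
```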
I am currently working on dev tooling for sqlite specific linting / analysis. I am trying to get it somewhat similar to diagnostics modern programming languages generate (it doesn’t yet support the whole syntax sqlite understands):
Limbo is a research project to build a SQLite compatible in-process database in Rust with native async support
I’ll bite. What are the improvements over the original SQLite? (I’m unfamiliar with libSQL.) Advertising “written in Rust” in post titles always draws attention and suggests “modern, memory-safe”, but I’d argue that well-written C code (my go-to examples of rock-solid C are Postfix and Dovecot) is better than trusting that any random code written in Rust is safer.
Haven’t tested Limbo, so I’ll just ask about the “native async support” stuff mentioned in the README, as I’m unsure if I’m reading the proper docs (your site says “Turso - SQLite for Production. Powered by libSQL.” and this submission is about “Limbo: A complete rewrite of SQLite in Rust”). So… is this io_uring compatibility or some other improvement over SQLite?
Advertising “written in Rust” in post titles always draws attention […] but I’d argue that well-written C code […] is better than trusting that any random code written in Rust is safer.
This argument feels like a strawman: nobody is claiming that just using Rust is enough to make “any random code” better than well-written code. Please respond to the article content, not whatever poor argument its clickbait-y title reminds you of.
The people behind Libsql and Limbo have given good reasons for both the fork and the rewrite, and for the choice of Rust. They have very solid QA plans for Limbo, and a good track record with Libsql. It’s not some random code by a new overoptimistic Rust convert.
but I’d argue that well-written C code (my go-to examples of rock-solid C are Postfix and Dovecot) is better than trusting that any random code written in Rust is safer.
SQLite has several different test suites. I forget the exact details, but I think 2 of the 3 are OSS and 1 isn’t, but I could be wrong, it’s been a while.
EDIT: I read the article a bit harder, which mentions that the SQLite test suite is proprietary.
Isn’t this somewhat usual licensing behavior when packaging differently-licensed products? The suite (as a “set of tests”) can’t have a license overriding the licenses of the particular tests included within.
I’m not an expert around licenses, so… Does “proprietary” here mean “no SQLite-like products can use this test suite”?
I’d have to double-check, but my memory is that proprietary means “We run our MC/DC suite on our code. If you want access to it so you can run it on your patched version, you have to give us a bag of money”.
It just means they don’t release it — see the TH3 page. At the bottom:
6. TH3 License
SQLite itself is in the public domain and can be used for any purpose. But TH3 is proprietary and requires a license.
Even though open-source users do not have direct access to TH3, all users of SQLite benefit from TH3 indirectly since each version of SQLite is validated running TH3 on multiple platforms (Linux, Windows, WinRT, Mac, OpenBSD) prior to release. So anyone using an official release of SQLite can deploy their application with the confidence of knowing that it has been tested using TH3. They simply cannot rerun those tests themselves without purchasing a TH3 license.
Advertising “written in Rust” in post titles always draws attention and suggests “modern, memory-safe”, but I’d argue that well-written C code (my go-to examples of rock-solid C are Postfix and Dovecot) is better than trusting that any random code written in Rust is safer.
Thank you. I think the world needs to realize this. I reckon that the Rust community is delivering plenty of interesting products. But the silver bullet mentality will backfire catastrophically sooner or later.
I think the world needs to realize this […] the silver bullet mentality will backfire
I don’t think “the world” ever thought otherwise to begin with. The claim is that everything else being equal, Rust is more safe/correct/productive/etc than C, not that it’ll fix all bugs. I see lots of comments warning against that “silver bullet mentality”, but I rarely see it in the wild, and rarely ever from an actual developer. Don’t fight windmills.
The claim is that everything else being equal, Rust is more safe/correct/productive/etc than C
That claim makes little sense because “everything else being equal” is a huge assumption and ultimately the matter under discussion.
Indeed, looking at C vs C++, the C community (for lack of a better word) has a much stronger track record of delivering reliable software that stood the test of time than C++. So there are certainly traits of the languages, unrelated to memory safety, that produce distinct results.
I also disagree that silver bullet mentality is not prevalent. Every single Rust developer I met has claimed that Rust is a better language because its memory safe. I have not met many, but they all said this. Every last one of them.
But I reiterate: there is a lot of interesting software coming out of the Rust ecosystem. So there is something about it that is working. Perhaps memory safety attracts good developers? I don’t know.
I also disagree that silver bullet mentality is not prevalent. Every single Rust developer I met has claimed that Rust is a better language because its memory safe. I have not met many, but they all said this. Every last one of them.
A bizarre remark. Rust is actually a better language than C or C++ because it is memory safe. A language which prevents pernicious bugs is better than one which doesn’t. Recognizing this obvious fact isn’t a “silver bullet mentality.” A “silver bullet mentality” would be saying “because Rust is a better language, I do not need to worry about the quality of the software I produce.”
Rust is actually a better language than C or C++ because it is memory safe
That is true iff memory safety is the only axis upon which you measure goodness. And don’t get me wrong, it’s definitely a variable for my overall goodness(x…) function but it’s not the only one. “Ease of integrating with vendor-supplied C++ hardware interface library” is another one and from my initial experiments Rust did not score very well on that one.
Being 100% compatible with C++ means more or less adding a fully functional C++ compiler front end to D. Anecdotal evidence suggests that writing such is a minimum of a 10 man-year project, essentially making a D compiler with such capability unimplementable. Other languages looking to hook up to C++ face the same problem, and the solutions have been:
Support the COM interface (but that only works for Windows).
Laboriously construct a C wrapper around the C++ code.
Use an automated tool such as SWIG to construct a C wrapper.
To the first question: not at all! What I would say, though, holding a Chevrolet starter in my hand while fixing my Ford is “hmmm this doesn’t look like it’ll be a very good fit even though it’s a really nice looking starter”.
Re: transparent C++ interop, that’s not at all what we needed. We just needed a clear not-too-complex path. The main system I work on where we evaluated the possibility of using Rust has its core built (in C++) on the aforementioned vendor library for talking to a high-end camera, OpenCV, CUDA, TensorRT, and a couple of other C and C++ libraries. It’s on a hardware platform with Unified Memory so we can move buffers between the CPU and GPU by just setting some flags on the pages. When we evaluated Rust for this (admittedly 4 years ago), there didn’t yet seem to be either a good OpenCV replacement or good OpenCV bindings, and similarly for CUDA and TensorRT bindings. We knew that it was probably going to be a bit of a hassle to put together a wrapper for the vendor camera library, but relying on/maintaining/debugging alpha-quality bindings for a bunch of other vendor libraries just seemed like too much of a lift. In the long run, looking back, we’ve actually had very very few memory safety issues in this C++ project, primarily due to relying on more Modern C++ constructs like std::shared_ptr and friends. I’ve ended up using valgrind about once every 6 months to debug an issue that got missed.
For prototyping work we put together a very thin C shim around the vendor camera library that we use through the Python cffi package. It’s trivial to convert the raw frames from the camera into numpy arrays, which can then be used by the Python OpenCV package. It doesn’t expose most of the functionality that we need in production but it’s good enough for experiments.
To add to that, what axis is important depends on context. One project might value correctness higher than iteration speed, while the other flips that around. Right tool for the job and all that.
In other words, keep in mind that “better” is a relative term. No language is ever absolutely better than another, even if some comparisons are pretty clear-cut.
That’s quite alright, although I’d like to mention that your example (correctness vs. iteration speed) ignores the fact that debugging is a part of iteration, and stronger correctness guarantees reduce the amount of debugging, thus de-risking iteration, which has all sorts of follow-up effects.
As I wrote in my 2018 blog post “Corner Cutting vs. Productivity”, “[…] disallowing cut corners doesn’t need to harm productivity even in the short run. In the long run, being diligent obviously wins.”
Yes, Rust front-loads a lot of complexity in order to enable its safety guarantees. But it’s not dramatically more to learn than learning to write correct C (e.g. even in C, you still need to care about the lifetimes of your objects, only the C compiler won’t help you with that).
Finally, I’d like to link Lars Bergstrom’s RustNationUK ’24 talk in which he unveiled a study showing that Rust and Go developers at Google enjoy roughly twice the productivity of their C++ counterparts. And yes, C is not C++, but I doubt that the difference is as marked as that.
When I used “iteration speed” I had in mind an article from a game dev, who was arguing that this was the most painful aspect of Rust for his use case (quickly play-testing many variations of, say, a weapon effect). It’s a different need and context than what Bergstrom (and you?) are dealing with. FWIW, for me Rust is a very productive language.
Maybe we should think of “iteration speed” and “productivity” as two different axes. Or accept that a language’s score on one axis can differ between projects and teams.
quickly play-testing many variations of, say, a weapon effect
I like Rust, but I would define this in a configuration or scripting language (optimally with some kind of “refresh button”), not in Rust. If needed for performance, it could later be “hardcoded” into Rust.
100% agree. That’s been one of the nice things about working with the team I’ve been working with for the last 5 or 6 years: when we’re faced with a platform decision that doesn’t have an obvious clear-cut answer we sit down and actually do some analysis of the different axes that we’re trading off and score the different options (intentionally blinding ourselves from the pre-defined weights so that we don’t accidentally bias ourselves too much). There’s always at least 7 or 8 different axes to compare along within the context of the specific problem we’re trying to solve.
A language which prevents pernicious bugs is better than one which doesn’t. Recognizing this obvious fact.
No. That’s not a fact. That’s a blatant fallacy right there. That would only be true if all languages were equal except for having memory safety or not, which is obviously not true. You might argue that memory safety is a very important characteristic. That’s what’s up for debate; I won’t agree nor disagree. What is a fact is that memory safety is not the only characteristic of a language.
You’re twisting the argument. Nobody here (and probably none of the Rust devs you have met before) is saying that Rust’s safety makes it undeniably superior to unsafe languages. They’re just stating that safer is better, not that safety always trumps everything else.
Fully agree that the importance of various language characteristics varies, no language is universally better. Rust is a well-rounded language with many good characteristics, it’s not just for safety-critical stuff. If safety is sometimes brandished as the killer argument to justify the choice of Rust, it’s that the other characteristics compared well with other languages, or were not as important.
Every single Rust developer I met has claimed that Rust is a better language because its memory safe
I’m not really a Rust fan, but isn’t that the value proposition? Having a language “like C/C++” but without (as much?) UB and memory unsafety? Then technically, it’s a “better” language. At least, when compared to C++ since it’s such a large language. Language size factors into “goodness” as well, at least for me. It would be nice to have a small C-sized language with the benefits of Rust (but ideally, also a less steep learning curve). I’m certain lots of people who are unwilling to switch would be jumping up and down to get something like that.
“Everything else being equal” is of course an idealistic/theoretical situation. But you need to approach it to make any kind of comparison. Voytec’s comment, by pitting “well-written C code” against “random Rust code”, is comically far from a fair comparison. You may disagree whether “Rust is more safe/correct/productive/etc than C” but at least it’s a claim that we can productively discuss.
the C community […] has much stronger track record when it comes to deliver reliable software that stood the test of time than C++
To the extent that this may be true (citation needed), I would guess that this has more to do with what kind of projects C vs C++ are used for than with particular strengths and weaknesses of the languages.
I also disagree that silver bullet mentality is not prevalent. Every single Rust developer I met has claimed that Rust is a better language because its memory safe. I have not met many, but they all said this. Every last one of them.
I’m a Rust (amongst other languages) developer and don’t think that, so consider your streak broken ;) Less anecdotally, that’s really not the vibe I get from the community, which is generally enthusiastic but pragmatic. For example, if you look at the numerous “why should I use Rust?” questions on /r/rust, most commenters praise things like tooling before mentioning safety.
There’s probably subjectivity at play here. Maybe you’re confusing the argument “memory safety makes a language better” (IMHO a no-brainer) with “Rust is memory safe, and therefore the best language” (wrong for multiple reasons, and just trollish)? Or you see somebody who is happy about their language choice for a project and wrongly assume that they think their language is perfect?
Perhaps memory safety attracts good developers?
I really don’t think that’s Rust’s main draw. Plenty of memory safe languages out there, most without escape hatches. I think it’s better to look at unsafety as a repulsor (whatever the person’s skill) than at safety as an attractor. But many people come to Rust from memory-safe languages, not just from C/C++.
There’s no single aspect of a language that makes it successful. Rust ticks a lot of boxes and has good cross-feature synergies, it’s compelling as a whole. But it’s not some kind of ultimate language, it’s just a tool to get some jobs done.
I really don’t think that’s Rust’s main draw. Plenty of memory safe languages out there, most without escape hatches.
I can’t think of very many safe languages without escape hatches, actually. But you usually don’t reach for them as often in those languages as you might in Rust. On the other hand, as more and more libraries use unsafe internally to provide safe interfaces, it becomes easier to use Rust without ever touching unsafe, and those libraries can be treated as essentially equivalent to the unsafe inner workings of higher-level safe languages. I like to think of unsafe that way: if you want to add a new, safe-to-use primitive to Java, you write C++; if you want to add a new, safe-to-use primitive to Rust, you write Rust.
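A minimal sketch of that pattern (a toy function, not from any real library): a safe API whose implementation uses `unsafe`, where the check above the unsafe block is what upholds the contract, so callers can never trigger UB.

```rust
// Safe API over an unsafe implementation detail: callers of `middle`
// cannot cause undefined behavior, even though `get_unchecked` skips
// bounds checking internally.
fn middle<T>(items: &[T]) -> Option<&T> {
    if items.is_empty() {
        return None;
    }
    let mid = items.len() / 2;
    // SAFETY: mid < items.len(), because the slice is non-empty and
    // integer division by 2 cannot reach len().
    Some(unsafe { items.get_unchecked(mid) })
}

fn main() {
    assert_eq!(middle(&[1, 2, 3]), Some(&2));
    assert_eq!(middle::<i32>(&[]), None);
}
```

This is the same shape at a larger scale: a crate’s unsafe internals play the role that C or C++ plays under the runtime of a higher-level safe language.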
pitting “well-written C code” against “random Rust code”, is comically far from a fair comparison
Not so fast. C is a straight-to-the-point language with a barebones set of constructs, a small syntax, and a limited feature set. This lends itself to simpler and more pragmatic code. Personally I think this is a much bigger deal than most people realize.
Sorry, I don’t understand how your argument is related to the quote. My point was that you can write good or bad code in both languages, so if you want to compare languages you need to do so with code of similar quality.
As for your argument in favor of C, it’s a common but very subjective one. C might be simple, but it really isn’t easy to use. To write decent C, you need to hold a lot of complexity in your head (risk of UB, unclear APIs…). The facilities that C doesn’t provide (collections, error handling, sum types…) make it hard to write simple C code. Indeed, this is a much bigger deal than most C devs realize.
Rust is a more complex language, but it’s easier to use, review, maintain, onboard than C.
If you narrowly define safety to mean memory safety, and if the rust project doesn’t contain any unsafe code, then the rust project is guaranteed safer. But those are big ifs.
On Linux, Limbo uses io_uring, a performant API for asynchronous system calls.
Do you plan to support Linux systems where io_uring is unavailable? Because of security problems, Google disables or severely restricts io_uring on Android (and ChromeOS).
What non-Linux operating systems do you plan to support?
On a spectrum of “temporary limitation” to “design principle”, what is this?
If it’s a design principle: What is intended to happen if another process does modify the database? What measures is Limbo intended to take to defend against another process modifying the database?
While SQLite doesn’t make any real promises beyond “we’ll try our best but if your filesystem sucks that’s on you”[1], that does seem to be a slightly stronger limitation than SQLite’s. In practice, though, is that a super common use case or are you more worried about accidentally opening it with two processes?
I don’t have any big-picture view of how common it is, but, for example, the Nix package manager (which Limbo itself appears to use) has a central SQLite database, and it’s reasonable to expect that two processes might try to access that database simultaneously.
S3 provides 99.99999999% durability. But like any sane man, I would never trust an external system for data integrity.
This is correct. Storage durability is not the same as data integrity, and there are several possible sources of read corruption, not all of which are inside the S3 protected boundary.
Buggy network devices have been known to e.g. always clear bit 8 of byte 1320 in every packet they process AND recalculate the TCP checksum for the changed content.
A deep packet inspection device with bad RAM doing TLS interception will punch through those checksums too.
And of course your compute server could have bad RAM too, and by checking the hashes there you have a decent chance of catching all of these possibilities.
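That end-to-end check can be sketched like this. FNV-1a below is a toy stand-in for a real cryptographic hash such as SHA-256 (the Rust standard library has none built in), and all the names and data are made up:

```rust
// FNV-1a: a simple non-cryptographic hash, standing in for SHA-256.
fn fnv1a(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in data {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

// Verify on the compute side, against a digest stored with the data.
fn verify(payload: &[u8], expected: u64) -> bool {
    fnv1a(payload) == expected
}

fn main() {
    let payload = b"page 42 contents".to_vec();
    let digest = fnv1a(&payload); // computed where the data is produced

    // simulate a network device flipping one bit and then dutifully
    // recalculating the TCP checksum: the transport layer is happy,
    // but the end-to-end digest no longer matches
    let mut corrupted = payload.clone();
    corrupted[3] ^= 0x80;

    assert!(verify(&payload, digest));
    assert!(!verify(&corrupted, digest));
}
```

The point is where the check runs: because the digest is computed at the producer and re-checked at the consumer, corruption anywhere in between (network gear, TLS middleboxes, bad RAM on intermediate hosts) is caught, regardless of which hop-by-hop checksums were "fixed" along the way.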
I took an easy path: I used FoundationDB as my storage server. FoundationDB is quite amazing; it meets the requirements: strongly consistent, fault-tolerant, and horizontally scalable. However, it also has limits on transaction size (10 MB) and timeout (5 s), which are not configurable.
Hopefully, one day, we will write our own storage server :)
Thanks, and I remember reading about FoundationDB being based on or inspired by SQLite but can’t find any reference to that. I must have dreamed it! Meanwhile I did find this https://github.com/losfair/mvsqlite – SQLite on FoundationDB!
I found this post by @hwayne pretty interesting, so submitted it here. I am not sure if LinkedIn lets you see posts without logging in; I did check in incognito and it let me. But I am still copy-pasting the content:
It’s not like they HAVE to: Julia, BASIC, Lua, Matlab, and TLA+ all start arrays at 1 (“1-indexed”).
And most of the “mother languages” (ALGOL-60, FORTRAN II, SIMULA, CLU, etc) were neither. Instead they had “index ranges”, meaning you could pick whether they started at 0 or 1 or -13 or 94.
(This is probably because most early programming was scientific computing, and different fields of math and physics have different indexing conventions. I’m not surprised the programming languages tried to be flexible.)
So we have this thing where historically, most languages had index ranges, and over time we got to the present, where most languages are 0-indexed and a tiny minority are 1-indexed. What happened?
It’s hard to say. In 1982 Dijkstra argued that 0-indexing was innately superior (EWD831) based on user experience. But this comes fairly late in the shift, which was already happening in the 70’s.
I think a more likely explanation is “mechanical sympathy”. Early CS texts I found that focused on low-level or “hardware” implementations of things strongly preferred 0-indexing. One mother language, APL, let you choose between 0 or 1-indexing. The author used 1-indexing for “high level” math and 0-indexing for “microprogramming”, his words for low-level implementations.
“Mechanical sympathy” also explains BCPL. BCPL was a “barely more than machine code” language that implemented arrays as pointers to a block of memory. arr[n] was syntactic sugar for *(arr+n). So the start of the address block was *arr, aka *(arr+0), so the first index of the array was arr[0].
BCPL would later inspire B, which would inspire C. C would profoundly influence all modern languages. So arguably BCPL did a lot to standardize 0-indexing, even if it wasn’t the sole force in its favor.
Nowadays most languages don’t need to be so close to the hardware, so there’s less incentive to make 0-indexing. At the same time, I regularly program in both 0- and 1-indexed languages, and find that a lot of array algorithms are a lot easier to implement in 0-indexing.
I’ve also worked with modern “index range” languages. They make local scripting a lot more pleasant, but they also make integrating libraries and other people’s code harder: you don’t know what convention they used. I suspect that’s why they fell out of popularity in the first place.
Overall, I’d say that the popularity of 0-indexing is historically contingent on the constraints of early hardware, but probably was the best outcome in the long run.
BCPL would later inspire B, which would inspire C. C would profoundly influence all modern languages. So arguably BCPL did a lot to standardize 0-indexing, even if it wasn’t the sole force in its favor.
How early into C’s life did it even have array operators? I vaguely remember reading it only had pointer arithmetic for a while and 0 indexing was because it was seen as sugar on top of pointer arithmetic with the index representing how many increments to the pointer.
I could be wildly wrong as this is just a vague memory of reading something years ago.
I also learned both through experience and advice from someone that worked with RocksDB at Meta that column families are generally to be avoided. The same kind of namespacing can instead be accomplished using prefixes, without the performance hit.
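A minimal sketch of the prefix trick, under stated assumptions: a plain sorted dict stands in for the ordered KV store (the point is the key encoding, not the store), and the separator byte and names are invented for illustration.

```python
# Illustrative only: a sorted dict stands in for an ordered KV store
# such as RocksDB. The namespacing trick is entirely in the key encoding.

def ns_key(namespace: str, key: str) -> bytes:
    # A separator byte that cannot appear in the namespace keeps
    # "users" from colliding with a namespace like "users2".
    return namespace.encode() + b"\x00" + key.encode()

def scan_namespace(store: dict, namespace: str) -> dict:
    """Range-scan one namespace by iterating keys sharing the prefix."""
    prefix = namespace.encode() + b"\x00"
    return {k[len(prefix):].decode(): v
            for k, v in sorted(store.items()) if k.startswith(prefix)}

store = {}
store[ns_key("users", "alice")] = b"1"
store[ns_key("orders", "42")] = b"2"
assert scan_namespace(store, "users") == {"alice": b"1"}
```

In a real LSM store this works because prefix-sharing keys sort adjacently, so a namespace scan is one contiguous range read, with no extra per-column-family memtables or files.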
As for the comparator issues, I think using something like Flatbuffers would help but that would be a larger change as it involves changing out those structs.
Something I hadn’t considered before reading this is that the SQLite global write lock applies on a per-database basis… so you can increase your concurrent writes by spreading them across multiple database files, at which point you’re limited by CPU cores and the number of files you can open at once.
SQLite lets you attach up to ten database files to the same connection and supports cross-database joins, so you could even spread writes across ten copies of the same table (each in a separate DB) and then run queries against the UNION of them all.
Could be fun to benchmark some of these patterns and see how the writes-per-second could go.
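The attach-and-union pattern can be sketched with Python’s built-in sqlite3 module. The shard file names and the table are made up for illustration; the sharding policy (which shard a write goes to) is left out.

```python
import os
import sqlite3
import tempfile

# Shard writes across several database files, each with its own write lock.
tmp = tempfile.mkdtemp()
paths = [os.path.join(tmp, f"shard{i}.db") for i in range(3)]

# Each shard holds its own copy of the same table.
for i, path in enumerate(paths):
    c = sqlite3.connect(path)
    c.execute("CREATE TABLE logs (id INTEGER, msg TEXT)")
    c.execute("INSERT INTO logs VALUES (?, ?)", (i, f"from shard {i}"))
    c.commit()
    c.close()

# Read side: one connection attaches the other shards and queries the
# UNION ALL of all copies. (The schema name must be an identifier, so it
# cannot be passed as a bound parameter; only the filename can.)
conn = sqlite3.connect(paths[0])
for i, path in enumerate(paths[1:], start=1):
    conn.execute(f"ATTACH DATABASE ? AS shard{i}", (path,))
rows = conn.execute(
    "SELECT msg FROM logs "
    "UNION ALL SELECT msg FROM shard1.logs "
    "UNION ALL SELECT msg FROM shard2.logs").fetchall()
assert len(rows) == 3
```

SQLite’s default limit is ten attached databases per connection, which bounds how far this particular trick scales on a single connection.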
SQLite lets you attach up to ten database files to the same connection and supports cross-database joins, so you could even spread writes across ten copies of the same table (each in a separate DB) and then run queries against the UNION of them all.
Yes, they do, but I believe in WAL mode, there is a chance that on failure, you might end up in a partially committed state, where your changes could have been applied to some of the databases but not all of them.
Never ending up in a partially committed state is what defines a transaction, so your answer should be “cross-database transactions are not supported in WAL mode.”
I do want to play around with attaching databases as well, but if you are running into write performance issues in your web app, you are probably better off switching to Postgres than messing around with multiple database files.
That said, I have written Go code that rolled over from one database file to the next for a logging system, and I had a lot of fun doing it :D
I wonder how expensive a PostgreSQL setup for that kind of write performance might be? Handling 10,000+ writes per second might get pretty pricey with a server-based DB; it could be that SQLite with weird tricks ends up a lot cheaper to run.
In my experience people using PostgreSQL tend to go with managed solutions like RDS. I don’t know how it would work out if you compared the cost of 10,000+ writes/second on PostgreSQL on a VPS with the same performance from SQLite on a similar VPS.
A lot of the managed PostgreSQL services are really aimed at people with a lot of data. The Azure ones, for example, reserve an entire VM for the RDBMS and charge you for the storage and compute that it provides. It looks as if RDS is similar.
In any situation where SQLite would be considered, I think that’s almost certainly overkill. I moved from SQLite to PostgreSQL on a small VM for NextCloud, for example, and everything became noticeably more responsive but the postgres process is using <1% of CPU when multiple php-fpm processes are all using 10-25%. Even the cheapest managed offering would be an order of magnitude more CPU than this needs.
I did some benchmarking to see how SQLite performs compared to Postgres and MySQL, all of them running on the same VPS. I didn’t play around with multiple databases in SQLite, but the benchmark showed that all the databases have similar throughput. I’ll write a separate blog post about it, but the gist was that for read requests SQLite was the fastest by about 5-10%, and for write-only workloads SQLite was the slowest by about 30%. This was on a 4-vCPU Hetzner machine (20 EUR/month).
BEGIN IMMEDIATE will lock your database for longer than a regular BEGIN, so having a longer BUSY_TIMEOUT to offset this can help reduce database is locked errors. Also, without BEGIN IMMEDIATE you can get database is locked errors immediately, not respecting BUSY_TIMEOUT at all, which can be very confusing!
Also, without BEGIN IMMEDIATE you can get database is locked errors immediately, not respecting BUSY_TIMEOUT at all, which can be very confusing!
this is confusing lol.
my understanding was: if you use BUSY_TIMEOUT alone, then a write can wait up to the timeout before returning a “database is locked” error
if you use BEGIN IMMEDIATE, and a write is already in progress, then you will immediately get a “database is locked” error, rendering BUSY_TIMEOUT useless.
If some other database connection has already modified the database or is already in the process of modifying the database, then upgrading to a write transaction is not possible and the write statement will fail with SQLITE_BUSY.
IMMEDIATE causes the database connection to start a new write immediately, without waiting for a write statement. The BEGIN IMMEDIATE might fail with SQLITE_BUSY if another write transaction is already active on another database connection.
This is super confusing, and the naming doesn’t help either!
With BUSY_TIMEOUT alone you get the SQLITE_BUSY error immediately if the read lock cannot be upgraded into a write lock. In this case, waiting for BUSY_TIMEOUT doesn’t make sense, since the other thread with the write lock could change the data and invalidate the read that we already made, thus violating transaction serializability.
Example:
BEGIN;
SELECT foo FROM bar; -- READ LOCK
INSERT INTO bar (foo) VALUES (1); -- WRITE LOCK, but if the db is already locked, this fails with SQLITE_BUSY without waiting for BUSY_TIMEOUT.
BEGIN IMMEDIATE acquires the write lock immediately so you are never in the state where you need to upgrade a read lock to a write lock.
Example:
BEGIN IMMEDIATE; -- WRITE LOCK, if the db is already locked we can safely wait until BUSY_TIMEOUT.
SELECT foo FROM bar;
INSERT INTO bar (foo) VALUES (1); -- All good, since BEGIN IMMEDIATE already acquired the db lock.
So the point is that BEGIN IMMEDIATE protects you from SQLITE_BUSY without waiting until BUSY_TIMEOUT.
Yes, you will have higher throughput without BEGIN IMMEDIATE - the write locks will be shorter, and if the read lock can’t be upgraded into a write lock SQLite will just give up immediately, so there are also fewer threads waiting to acquire a lock.
But the problem is that you now have to handle these errors in your application code, because now the first write after a read in a transaction can randomly fail - even if you only have one other write going on at the time. This is quite a common scenario for web applications, so you’ll end up seeing a bunch of SQLITE_BUSY errors due to this, even if your site isn’t getting a lot of traffic. BEGIN IMMEDIATE solves this, since your transactions will always wait for BUSY_TIMEOUT before failing. With BEGIN IMMEDIATE you’ll only see SQLITE_BUSY errors if your site is under actual load that prevented SQLite from acquiring a lock in the specified timeout.
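The upgrade failure is easy to reproduce with Python’s built-in sqlite3 module. The file and table names are invented for the demo, which uses the default rollback-journal mode; the writer takes the lock with BEGIN IMMEDIATE, and the second connection’s deferred transaction fails on its first write.

```python
import os
import sqlite3
import tempfile

# Two connections to one database file (default rollback-journal mode).
path = os.path.join(tempfile.mkdtemp(), "app.db")
writer = sqlite3.connect(path, isolation_level=None)  # autocommit; explicit BEGINs
writer.execute("CREATE TABLE t (v INTEGER)")
writer.execute("INSERT INTO t VALUES (0)")
reader = sqlite3.connect(path, isolation_level=None)

# The writer grabs the write lock up front with BEGIN IMMEDIATE.
writer.execute("BEGIN IMMEDIATE")
writer.execute("UPDATE t SET v = 1")

# The other connection starts a deferred transaction and can still read...
reader.execute("BEGIN")
assert reader.execute("SELECT v FROM t").fetchone() is not None

# ...but its first write needs a read-to-write lock upgrade, which returns
# "database is locked" right away: waiting cannot help, because the data
# this transaction already read could be changed by the other writer.
try:
    reader.execute("UPDATE t SET v = 2")
    upgrade_failed = False
except sqlite3.OperationalError:
    upgrade_failed = True
assert upgrade_failed
reader.execute("ROLLBACK")
writer.execute("COMMIT")
```

Had the reader used BEGIN IMMEDIATE instead, it would simply have waited up to the busy timeout for the writer’s lock before doing any reads, which is exactly the behavior described above.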
I think of them not as random failures but as part of MVCC. In a non-immediate transaction you’re doing a sort of optimistic read-modify-write loop, where you have to retry if another txn snuck in and wrote before you did.
That said, I do almost always use immediate txns because they’re easier and, in a single-user app, write concurrency is less important.
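The optimistic read-modify-write loop mentioned above might look roughly like this; the counter table and the retry count are invented for the sketch, which runs on a single connection so no actual contention occurs.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "counter.db")
conn = sqlite3.connect(path, isolation_level=None)  # autocommit; explicit BEGINs
conn.execute("CREATE TABLE counter (n INTEGER)")
conn.execute("INSERT INTO counter VALUES (0)")

def increment(conn: sqlite3.Connection, retries: int = 5) -> int:
    """Deferred transaction, retried from the top if another txn snuck in."""
    for _ in range(retries):
        try:
            conn.execute("BEGIN")  # deferred: only a read lock at first
            n = conn.execute("SELECT n FROM counter").fetchone()[0]
            # The lock upgrade happens here and may fail with SQLITE_BUSY:
            conn.execute("UPDATE counter SET n = ?", (n + 1,))
            conn.execute("COMMIT")
            return n + 1
        except sqlite3.OperationalError:
            conn.rollback()  # another writer got in first; re-read and retry
    raise RuntimeError(f"gave up after {retries} retries")

assert increment(conn) == 1
assert increment(conn) == 2
```

This is the MVCC-style trade-off: deferred transactions hold the write lock for less time, at the cost of the application owning the retry.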
I agree with you 100%. The issue is that if you use a framework like Django and don’t know much about SQLite internals, you don’t expect the code in your transaction to fail, at least not without first exceeding the busy wait timeout. None of the other databases work like this and SQLite’s behavior will feel random and confusing to devs or at least that’s how it felt to me before I figured out how IMMEDIATE transactions work.
Ironically, I think I’ve already run into this problem here on Lobsters. I tried to submit an interesting blog post about Smalltalk but was told the URL was invalid. The URL contains a seemingly autogenerated, non-optional alphanumeric ID that contains the word “sex” in it.
lol. I had no idea we had such filters
Any such filter should be visible in the source.
I tried again and I must have been mistaken. I’m guessing I put the URL where the title goes and vice versa. Whoops!
Brave still does, no?
Cloudflare Workers has supported embedded SQLite databases since September https://blog.cloudflare.com/sqlite-in-durable-objects/
they do! The paper came out in Apr 2024. I will add a note to the blog
This is a bad choice of name.
Limbo is a pre-existing FOSS project: it is the successor to the C programming language, designed by the creators of the C programming language, so it cannot reasonably be claimed to be obscure or unknown.
https://en.wikipedia.org/wiki/Limbo_(programming_language)
https://www.vitanuova.com/inferno/limbo.html
It is the language in which the successor to Unix by the designers of Unix, the Inferno OS, was mostly implemented.
It is a direct ancestor of Golang.
It has been in existence for nearly 30 years.
This is, IMHO, inexcusable.
Phoenix the BIOS company forced Mozilla to rename Phoenix the web browser to Firebird. Then Firebird the FOSS database forced Mozilla to rename it again to Firefox.
This is the legal precedent: a BIOS and a database can’t be confused with a web browser, but they were held to be close enough.
A database in a type-safe low-level language is close enough to a type-safe low-level language.
This is a bad name. They should have checked. They should rename the project.
I think it could only be bad from a legal perspective. Otherwise, everybody interested knows how to differentiate the two projects.
And actually, the trademark for Limbo appears to be dead/cancelled, so… not a problem anymore.
Is that not enough?
Unless I’m mistaken, the legal precedent you cite was based on trademarks. As I was saying (you might have missed it?), the “Limbo” trademark owned by Vita Nuova is not valid anymore.
So, it seems to me, what you are really saying is:
“It is perfectly fine to do something that’s wrong – use someone else’s name – as long as they will not be able to prosecute you for it.”
Do I have that right?
It seems to me that “it could only be bad from a legal perspective” is proposing that it’s not “wrong” from a moral perspective.
You seem really, really invested in this. Would you care to say more about your relationship with or feelings toward Inferno, Limbo and Vita Nuova?
Relationship? None at all. Can’t write it, never could, never tried. Only used Inferno for a few tens of minutes in a VM a decade ago. I wrote about it, here:
https://www.theregister.com/Print/2013/11/01/25_alternative_pc_operating_systems/
However I did do a talk about Plan 9 at FOSDEM this year:
https://archive.fosdem.org/2024/schedule/event/fosdem-2024-3095-one-way-forward-finding-a-path-to-what-comes-after-unix/
I got 4 or 5 articles out of that.
https://www.theregister.com/Tag/One%20Way%20Forward/
(One of them is based on an old FOSDEM talk from 4 or so years ago.)
I have written about Plan 9 a few times:
https://www.theregister.com/2022/11/02/plan_9_fork_9front/
https://www.theregister.com/2023/12/01/9front_humanbiologics/
https://www.theregister.com/2024/05/07/9front_do_not_install/
I don’t use it myself but I find the ideas fascinating.
Obviously not.
Limbo is a temporary name. I thought we covered that in the blog post:
I have 3 levels of objection to that.
That means…
That means…
2 and 3 really seem needlessly inflammatory. “‘Codenames can stick’ means ‘you chose your codename poorly’” doesn’t follow at all, but you appear intent on whipping up a storm or something there. Chill out; the target of your ire is undeserving of it.
OK, fair enough.
Ok, but Limbo and Inferno are both little more than historical footnotes.
Sadly, yes, you’re right.
But there are two significant points here:
It is not entirely unfair to describe Limbo as “what the guys who invented C did next”; it’s also a direct ancestor of Golang. As an additional important consideration, Limbo followed the attempt to build Alef on Plan 9, so it wasn’t some blue-sky clean-slate thing: this was a second try, informed by the failure of the first.
Why does this matter? Because the story of why Alef didn’t succeed but Limbo did, to some extent anyway, has lessons to teach. Limbo itself influenced Go, which is highly relevant as Go is a principal rival to Rust in the “let’s do a better C++” area.
Similarly, Inferno is what Plan 9 turned into when it grew up, and Plan 9 is what “the guys who invented Unix did next.” There’s more to be learned from that than from 100 commercial Unixes, or FOSS recreations of Unix.
This is important history. As Henry Spencer said, “Those who do not understand Unix are condemned to reinvent it, poorly.”
You need to know your history to avoid repeating it.
I don’t know anything useful about SQL or SQLite or any of it. I don’t really know anything about EVs either. But I can spot a clue, especially if that clue is a hint of something bad or at least embarrassing.
P.S. I belatedly thought about the other Diesel because I know less about fashion than cars, but I think it doesn’t weaken my point.
Your point is weak because diesel (the fuel type) is actively used. Names that have long fallen out of use, on the other hand, will be reused.
Fair point.
I think the word Limbo goes back to ~1300AD. :-) It definitely has a lot of prior art…
Huh! TIL “limbo” and “limp” are etymologically related! (something at the edge / loosely hanging off …) Very interesting. :)
Why big ai picture
we couldn’t find a real cat before the launch
so sad, where are all the cats
simonw’s too modest to post this himself but I learned a lot from his comments on the announcement. Particularly about the Antithesis testing approach, it’s really interesting.
Exactly! Antithesis combined with DST will make Limbo reach the rigorous testing standards of SQLite.
With all the tools available to us to help with editing, I get frustrated reading typos in blog posts, especially when they don’t leave a way to privately and politely give the author a heads-up. (The typo in the article would be remedied by s/hist/his/.) If you don’t carefully proofread your words, how do I know if you carefully review your code?
There is a “Contact Us” link in the footer, which — to my delighted surprise — is a straight-up mailto: link.
support@ doesn’t give me the hugest confidence, but it’s counterbalanced by the lack of an awful form and CAPTCHA.

Good eye, I was referring to the lack of an article byline or something like that where you can side-channel the author. Praise in public, criticize in private, etc. I reason contacting support about something like a typo in a blog post wouldn’t be the most appropriate.
I will fix the typo soon. Thanks for spotting!
it’s like asking how do you write bug free code. Mistakes happen!
Sorry if I came off coarse; I just see so many articles and blog posts these days where proofreading or spell-checking didn’t happen, and it gets distracting as a reader.
I don’t expect anybody to write bug-free code or typo-free prose, nor did I suggest anyone unrealistically do so! It is why we have code formatters and linters, and do code review. For prose, we can take the few seconds to pipe the text through a CLI-based spell-checker. Or paste the text into a word processor with a built-in spell-checker (and look for the red squiggles). Or even ask AI to review it for grammar or spelling errors.
For all the hard work and nice writing that it is, why not do one of these quick checks to make sure it is as good as it can be?
Unless one purchases a license: https://www.sqlite.org/th3.html#th3_license
With the license and TH3 in hand I imagine it would become very easy to make large-scale refactors. Of course, with large-scale refactors the odds are very good that the changes would not land in upstream SQLite. So they would be forced to maintain their own fork anyway. And I don’t think a TH3 license is as expensive as the ongoing maintenance cost of an SQLite fork by highly experienced developers (who need some take-home salary). I may be wrong of course. But to me the TH3 license cost sounds like a red herring.
https://x.com/penberg/status/1866516861271396579
Does every project need to slap some AI generated image on it now?
Yeah, honestly would prefer a random image of a hotdog than some AI generated slop
You might be forgetting about “hotdog or not hotdog.” You can’t escape AI with hotdog.
It’s over, the AI bros have infected the holy hotdog.
Beefy Miracle!
I mean … at least this particular one is kind of cute? But what on earth is it meant to be saying or conveying? Is this Limbo the cat? SQLite the cat? Why are they looking so concerned? What is the disc on their chest? Have they eaten recently?? Can I feed the cat?!?!
it’s not just any image, it’s the project’s official image - https://github.com/tursodatabase/limbo
OK but (please see all questions above)!!
ha ha, I don’t have the answers myself. But I hope we can feed the cat xD
SQLite is like three guys. I am confident in those three guys’ ability to deliver safe and well-tested C code. These Turso folks think SQLite would benefit from external contributions. I am less confident in the ability of new external contributors to do that.
SQLite doesn’t have a safety problem, libSQL does. So if you’re walking around going “gosh, I sure would love to switch from SQLite to libSQL, but I’m just not confident in its memory safety” then Limbo might be for you.
I guess. I dunno. I’m just some guy.
That’s a very unfair take on the article. The article nowhere mentions that SQLite has a safety problem.
We are rewriting because we want to have async I/O, a WASM-first design, and Deterministic Simulation Testing support from the beginning.
It was more of a logical sequence than an accusation. C code is often unsafe. SQLite devs have demonstrated they can produce safe code, so that isn’t an issue for it. libSQL intends to have many more contributors, so I wouldn’t feel comfortable making the same assumption.
While I’m not enthused at having another thing wanting to pull an async runtime into my sync projects, once I feel there’s been enough time for Limbo to have proven itself as trustworthy as SQLite, I look forward to using it.
SQLite’s dependence on a C toolchain for more than just libc makes statically linked musl-libc builds annoyingly inconvenient compared to a pure Rust codebase.
…maybe, by the time that happens, Rust will have something like keyword generics and Limbo will have adopted them enough that it can be used synchronously without hiding an async runtime behind the scenes.
I think you’re being unduly negative. Async IO does not require an async runtime (like tokio) or even async/await. I quickly searched for “async” in the Limbo GitHub repo and it does not look like they are using either.

I was working from lines like this.
It doesn’t make sense for something to be async internally if downstreams depending on it still have to block on getting a result, so the absence of “async fn” suggests they either haven’t implemented their higher-level API for Rust consumers yet or they plan on some kind of ugly, NIHed alternative for Rust downstreams that do want to access it asynchronously.

Sure it does. A library internally could do something like:
And then the caller could just block on the two futures synchronously while internally it’s polled concurrently.
Fair point… but not what I meant to be getting at in this specific case.
SQLite could already do that. When they say “SQLite itself has a synchronous interface, meaning driver authors who want asynchronous behavior need to have the extra complication of using helper threads.”, they’re talking about what external API SQLite exposes.
I’m not sure what you’re saying. You seem to be complaining that there’s an async interface in Limbo but you don’t want that, and I showed that you don’t need to - you can just block. Now you seem to be saying something else but I’m not sure what.
I’m saying that their stated intent makes no sense if they don’t either expose or plan to expose an async interface to consumers.
My assumption was that they already do, and given that it makes no sense to expose an async interface to downstream Rust without defining the API in terms of “async fn”s, if you’re going to expose “async fn”s, you might as well use async/await in your implementation. I admit I was wrong there.

Now I’m just hoping that, if/when they do offer an async interface, it’ll remain optional.

It’s an interesting claim. External contributors may not have the obsession with stable database formats that a small group dedicated to the cause might.
The right way to think about it is that SQLite would benefit from competition. Personally I believe that this competition would be better to come from a re-implementation than a fork. So I think this is good. Will it succeed? Will I like it? Will I use it? I mean probably the answer is: no, but I sure as hell will applaud that effort.
disclosure: I work here. I am happy to answer any questions
repository: https://github.com/tursodatabase/limbo
one of my biggest gripes with every SQL-based relational database I’ve ever used is that the error messages are largely useless and haven’t kept up with the development experience of other tooling in the industry (see: rust errors, graphql responses, gleam errors)
is this something the team is looking at with limbo? I reckon it would be a huge productivity booster, not just for newbies learning sql but experienced folks doing complex queries on complex schemas, I’d love to see something more than “there’s an error somewhere near xyz” but a deep explanation of context (table/column names, parameter values, constraint names, etc)
OMG, this!
And, more importantly, programmatic access to those in a structured way! That’s one of the things I missed when working on custom database interfaces (like ORMs and such). You’d have to resort to parsing error strings in most cases :S
I am currently working on dev tooling for sqlite specific linting / analysis. I am trying to get it somewhat similar to diagnostics modern programming languages generate (it doesn’t yet support the whole syntax sqlite understands):
https://github.com/xNaCly/sqleibniz/tree/master
I’ll bite. What are the improvements over original SQLite (I’m unfamiliar with libSQL)? Advertising “written in Rust” in post titles always draws attention and suggests “modern, memory-safe”, but I’d argue that well-written C code (my go-to examples of rock-solid C are Postfix and Dovecot) is better than trusting that any random code written in Rust is safer.
Haven’t tested Limbo, so I’ll just ask about the “native async support” stuff mentioned in the README, as I’m unsure if I’m reading the proper docs (your site says “Turso - SQLite for Production. Powered by libSQL.” and this submission is about “Limbo: A complete rewrite of SQLite in Rust”). So… is this io_uring compatibility or some other improvement over SQLite?

This argument feels like a strawman: nobody is claiming that just using Rust is enough to make “any random code” better than well-written code. Please respond to the article content, not whatever poor argument its clickbait-y title reminds you of.
The people behind Libsql and Limbo have given good reasons for both the fork and the rewrite, and for the choice of Rust. They have very solid QA plans for Limbo, and a good track record with Libsql. It’s not some random code by a new overoptimistic Rust convert.
I mean… I’d add SQLite to that list as well :) https://www.sqlite.org/testing.html
This brings up another question for @av – does Limbo attempt to use SQLite’s famously thorough test suite?

EDIT: I read the article a bit harder, which mentions that the SQLite test suite is proprietary.
SQLite has several different test suites. I forget the exact details, but I think 2 of the 3 are OSS and 1 isn’t, but I could be wrong, it’s been a while.
Isn’t this somewhat usual licensing behavior when packaging differently-licensed products? The suite (as a “set of tests”) can’t have a license overriding the licenses of the particular tests included within.
I’m not an expert around licenses, so… Does “proprietary” here mean “no SQLite-like products can use this test suite”?
I’d have to double-check, but my memory is that proprietary means “We run our MC/DC suite on our code. If you want access to it so you can run it on your patched version, you have to give us a bag of money”.
It just means they don’t release it — see the TH3 page. At the bottom:
Thank you. I think the world needs to realize this. I reckon that the community is delivering plenty of interesting products. But the silver-bullet mentality will backfire catastrophically sooner or later.
I don’t think “the world” ever thought otherwise to begin with. The claim is that everything else being equal, Rust is more safe/correct/productive/etc than C, not that it’ll fix all bugs. I see lots of comments warning against that “silver bullet mentality”, but I rarely see it in the wild, and rarely ever from an actual developer. Don’t fight windmills.
That claim makes little sense, because “everything else being equal” is a huge assumption and ultimately the matter under discussion. Indeed, looking at C vs C++, the C community (for lack of a better word) has a much stronger track record of delivering reliable software that stood the test of time than C++. So there are certainly traits of the languages that produce distinct outcomes not related to memory safety.
I also disagree that the silver-bullet mentality is not prevalent. Every single Rust developer I have met has claimed that Rust is a better language because it’s memory safe. I have not met many, but they all said this. Every last one of them.
But I reiterate: there is lots of interesting software coming out of the Rust ecosystem. So there is something about it that is working. Perhaps memory safety attracts good developers? I don’t know.
A bizarre remark. Rust is actually a better language than C or C++ because it is memory safe. A language which prevents pernicious bugs is better than one which doesn’t. Recognizing this obvious fact isn’t a “silver bullet mentality.” A “silver bullet mentality” would be saying “because Rust is a better language, I do not need to worry about the quality of the software I produce.”
That is true iff memory safety is the only axis upon which you measure goodness. And don’t get me wrong, it’s definitely a variable for my overall goodness(x…) function, but it’s not the only one. “Ease of integrating with a vendor-supplied C++ hardware interface library” is another one, and from my initial experiments Rust did not score very well on that one.

Do you qualify a Chevrolet as “less good” if a replacement piece for one of their cars doesn’t fit in a Ford?
Is there any language besides Swift or Carbon that has transparent C++ interop?
Not sure what you mean by „transparent“ but D is also quite good.
I mean “can you use a Swift type on a C++ Vector and vice-versa?”
The D documentation presents the problem quite nicely:
To the first question: not at all! What I would say, though, holding a Chevrolet starter in my hand while fixing my Ford is “hmmm this doesn’t look like it’ll be a very good fit even though it’s a really nice looking starter”.
Re: transparent C++ interop, that’s not at all what we needed. We just needed a clear, not-too-complex path. The main system I work on where we evaluated the possibility of using Rust has its core built (in C++) on the aforementioned vendor library for talking to a high-end camera, OpenCV, CUDA, TensorRT, and a couple of other C and C++ libraries. It’s on a hardware platform with Unified Memory, so we can move buffers between the CPU and GPU by just setting some flags on the pages. When we evaluated Rust for this (admittedly 4 years ago), there didn’t yet seem to be either a good OpenCV replacement or good OpenCV bindings, and similar for CUDA and TensorRT bindings. We knew that it was probably going to be a bit of a hassle to put together a wrapper for the vendor camera library, but relying on, maintaining, and debugging alpha-quality bindings for a bunch of other vendor libraries just seemed like too much of a lift. In the long run, looking back, we’ve actually had very, very few memory safety issues in this C++ project, primarily due to relying on more Modern C++ constructs like std::shared_ptr and friends. I’ve ended up using valgrind about once every 6 months to debug an issue that got missed.

For prototyping work we put together a very thin C shim around the vendor camera library that we use through the Python cffi package. It’s trivial to convert the raw frames from the camera into numpy arrays, which can then be used by the Python OpenCV package. It doesn’t expose most of the functionality that we need in production, but it’s good enough for experiments.

To add to that, what axis is important depends on context. One project might value correctness higher than iteration speed, while the other flips that around. Right tool for the job and all that.
In other words, keep in mind that “better” is a relative term. No language is ever absolutely better than another, even if some comparisons are pretty clear-cut.
That’s quite alright, although I’d like to mention that your example (correctness vs. iteration speed) ignores the fact that debugging is a part of iteration, and stronger correctness guarantees reduce the amount of debugging, thus de-risking iteration, which has all sorts of follow-up effects.
As I wrote in my 2018 blog post “Corner Cutting vs. Productivity”, “[…] disallowing cut corners doesn’t need to harm productivity even in the short run. In the long run, being diligent obviously wins.”
Yes, Rust front-loads a lot of complexity in order to enable its safety guarantees. But it’s not dramatically more to learn than learning to write correct C (e.g. even in C, you still need to care about the lifetimes of your objects, only the C compiler won’t help you with that).
Finally, I’d like to link Lars Bergstrom’s RustNationUK ’24 talk in which he unveiled a study showing that Rust and Go developers at Google enjoy roughly twice the productivity of their C++ counterparts. And yes, C is not C++, but I doubt that the difference is as marked as that.
When I used "iteration speed" I had in mind an article by a game dev who was arguing that this was the most painful aspect of Rust for his use case (quickly play-testing many variations of, say, a weapon effect). It's a different need and context than what Bergstrom (and you?) are dealing with. FWIW, for me Rust is a very productive language.
Maybe we should think of "iteration speed" and "productivity" as two different axes. Or accept that a language's score on one axis can differ between projects and teams.
I like Rust, but I would define this in a configuration or scripting language (optimally with some kind of “refresh button”), not in Rust. If needed for performance, it could later be “hardcoded” into Rust.
100% agree. That's been one of the nice things about working with the team I've been with for the last 5 or 6 years: when we're faced with a platform decision that doesn't have an obvious clear-cut answer, we sit down and actually do some analysis of the different axes we're trading off and score the different options (intentionally blinding ourselves to the pre-defined weights so that we don't accidentally bias ourselves too much). There are always at least 7 or 8 different axes to compare along within the context of the specific problem we're trying to solve.
No. That's not a fact. That's a blatant fallacy right there. It would only be true if all languages were equal except for having memory safety or not, which is obviously not the case. You might argue that memory safety is a very important characteristic. That's what's up for debate; I won't agree or disagree. What is a fact is that memory safety is not the only characteristic of a language.
You’re twisting the argument. Nobody here (and probably none of the Rust devs you have met before) is saying that Rust’s safety makes it undeniably superior to unsafe languages. They’re just stating that safer is better, not that safety always trumps everything else.
Fully agree that the importance of various language characteristics varies; no language is universally better. Rust is a well-rounded language with many good characteristics, not just for safety-critical stuff. If safety is sometimes brandished as the killer argument to justify choosing Rust, it's because the other characteristics compared well with other languages, or were not as important.
I’m not really a Rust fan, but isn’t that the value proposition? Having a language “like C/C++” but without (as much?) UB and memory unsafety? Then technically, it’s a “better” language. At least, when compared to C++ since it’s such a large language. Language size factors into “goodness” as well, at least for me. It would be nice to have a small C-sized language with the benefits of Rust (but ideally, also a less steep learning curve). I’m certain lots of people who are unwilling to switch would be jumping up and down to get something like that.
“Everything else being equal” is of course an idealistic/theoretical situation. But you need to approach it to make any kind of comparison. Voytec’s comment, by pitting “well-written C code” against “random Rust code”, is comically far from a fair comparison. You may disagree whether “Rust is more safe/correct/productive/etc than C” but at least it’s a claim that we can productively discuss.
To the extent that this may be true (citation needed), I would guess that this has more to do with what kind of projects C vs C++ are used for than with particular strengths and weaknesses of the languages.
I'm a Rust (amongst other languages) developer and don't think that, so consider your streak broken ;) Less anecdotally, that's really not the vibe I get from the community, which is generally enthusiastic but pragmatic. For example, if you look at the numerous "why should I use Rust?" questions on /r/rust, most commenters praise things like tooling before mentioning safety.
There's probably subjectivity at play here. Maybe you're confusing the argument "memory safety makes a language better" (IMHO a no-brainer) with "Rust is memory safe, and therefore the best language" (wrong for multiple reasons, and just trollish)? Or you see somebody who is happy about their language choice for a project and wrongly assume that they think their language is perfect?
I really don’t think that’s Rust’s main draw. Plenty of memory safe languages out there, most without escape hatches. I think it’s better to look at unsafety as a repulsor (whatever the person’s skill) than at safety as an attractor. But many people come to Rust from memory-safe languages, not just from C/C++.
There’s no single aspect of a language that makes it successful. Rust ticks a lot of boxes and has good cross-feature synergies, it’s compelling as a whole. But it’s not some kind of ultimate language, it’s just a tool to get some jobs done.
I can't think of very many safe languages without escape hatches, actually. But you usually don't reach for them as often in those languages as you might in Rust. On the other hand, as more and more libraries use `unsafe` internally to provide something safe, it becomes easier to use Rust without ever touching `unsafe` yourself, and those libraries can be treated as essentially equivalent to the unsafe inner workings of higher-level safe languages' implementations. I like to think of `unsafe` that way: if you want to add a new, safe-to-use primitive to Java, you write C++; if you want to add a new, safe-to-use primitive to Rust, you write Rust.
Not so fast. C is a straight-to-the-point language with a barebones set of constructs, a small syntax, and a limited feature set. This lends itself to simpler, more pragmatic code. Personally I think this is a much bigger deal than most people realize.
Sorry, I don’t understand how your argument is related to the quote. My point was that you can write good or bad code in both languages, so if you want to compare languages you need to do so with code of similar quality.
As for your argument in favor of C, it’s a common but very subjective one. C might be simple, but it really isn’t easy to use. To write decent C, you need to hold a lot of complexity in your head (risk of UB, unclear APIs…). The facilities that C doesn’t provide (collections, error handling, sum types…) make it hard to write simple C code. Indeed, this is a much bigger deal than most C devs realize.
Rust is a more complex language, but it's easier to use, review, maintain, and onboard people onto than C.
If you narrowly define safety to mean memory safety, and if the rust project doesn’t contain any unsafe code, then the rust project is guaranteed safer. But those are big ifs.
Do you plan to support Linux systems where `io_uring` is unavailable? Because of security problems, Google disables or severely restricts `io_uring` on Android (and ChromeOS).

What non-Linux operating systems do you plan to support?
On a spectrum of “temporary limitation” to “design principle”, what is this?
If it’s a design principle: What is intended to happen if another process does modify the database? What measures is Limbo intended to take to defend against another process modifying the database?
While SQLite doesn’t make any real promises beyond “we’ll try our best but if your filesystem sucks that’s on you”[1], that does seem to be a slightly stronger limitation than SQLite’s. In practice, though, is that a super common use case or are you more worried about accidentally opening it with two processes?
[1] https://www.sqlite.org/faq.html#q5
I don’t have any big-picture view of how common it is, but, for example, the Nix package manager (which Limbo itself appears to use) has a central SQLite database, and it’s reasonable to expect that two processes might try to access that database simultaneously.
This is correct. Storage durability is not the same as data integrity, and there are several possible sources of read corruption, not all of which are inside the S3 protected boundary.
tell me more!
If you have some links to share, I'd add them to the blog.
Buggy network devices have been known to e.g. always clear bit 8 of byte 1320 in every packet they process AND recalculate the TCP checksum for the changed content.
A deep packet inspection device with bad RAM doing TLS interception will punch through those checksums too.
And of course your compute server could have bad RAM too, and by checking the hashes there you have a decent chance of catching all of these possibilities.
not GP, but logical data corruption can present itself in the presence of durable storage.
see, for example, the Jepsen analysis for YugaByteDB demonstrating read skew under clock skew. this failure mode doesn’t apply to the system you’re writing about, but the notion that logical data integrity is separate from physical data integrity is the thrust of what I interpreted of GP’s comment.
PostgreSQL Lock Conflicts is also a good resource on Postgres locks - https://postgres-locks.husseinnasser.com/
It is! I thought I had included a link to it but I must have missed it. Thank you for highlighting.
Now I want to know more, how did it go, how did you solve this problem? Keep us posted :)
I took an easy path: used FoundationDB as my storage server. FoundationDB is quite amazing; it does meet the requirements: strongly consistent, fault-tolerant, and horizontally scalable. However, it also has limits on transaction size (10 MB) and timeout (5 s), which are not configurable.
Hopefully, one day, we will write our own storage server :)
Thanks, and I remember reading about FoundationDB being based on/inspired by SQLite but can't find any reference to that. I must have dreamed it! Meanwhile I did find this: https://github.com/losfair/mvsqlite – SQLite on FoundationDB!
it wasn’t a dream and you are right! Foundation DB in fact uses (modified) SQLite internally - https://www.foundationdb.org/files/fdb-paper.pdf
This is also an implementation of Disaggregated Storage on top of SQLite!
I had been eagerly waiting for this series to finish! I will read the posts again. Thank you!
Could you please consider adding an RSS/Atom feed to your blog?
That’s a good idea. Could you suggest a blog that does a good job of this? I’ll copy whatever they’re doing :)
I see your blog is generated using Rust, check these two:
Ok, I’ve added https://jacko.io/rss.xml and linked to it from my homepage. Please let me know if it works with your reader.
it does, thank you!
I will be looking forward to the future posts
I found this post by @hwayne pretty interesting, so I submitted it here. I am not sure if LinkedIn lets you see posts without logging in; I did check in incognito and it let me. But I am still copy-pasting the content:
It’s not like they HAVE to: Julia, BASIC, Lua, Matlab, and TLA+ all start arrays at 1 (“1-indexed”).
And most of the “mother languages” (ALGOL-60, FORTRAN II, SIMULA, CLU, etc) were neither. Instead they had “index ranges”, meaning you could pick whether they started at 0 or 1 or -13 or 94.
(This is probably because most early programming was scientific computing, and different fields of math and physics have different indexing conventions. I’m not surprised the programming languages tried to be flexible.)
So we have this thing where historically, most languages had index ranges, and over time we got to the present, where most languages are 0-indexed and a tiny minority are 1-indexed. What happened?
It’s hard to say. In 1982 Dijkstra argued that 0-indexing was innately superior (EWD831) based on user experience. But this comes fairly late in the shift, which was already happening in the 70’s.
I think a more likely explanation is “mechanical sympathy”. Early CS texts I found that focused on low-level or “hardware” implementations of things strongly preferred 0-indexing. One mother language, APL, let you choose between 0 or 1-indexing. The author used 1-indexing for “high level” math and 0-indexing for “microprogramming”, his words for low-level implementations.
“Mechanical sympathy” also explains BCPL. BCPL was a “barely more than machine code” language that implemented arrays as pointers to a block of memory.
`arr[n]` was syntactic sugar for `*(arr+n)`. So the start of the address block was `*arr`, aka `*(arr+0)`, so the first index of the array was `arr[0]`.

BCPL would later inspire B, which would inspire C. C would profoundly influence all modern languages. So arguably BCPL did a lot to standardize 0-indexing, even if it wasn't the sole force in its favor.
Nowadays most languages don't need to be so close to the hardware, so there's less incentive to use 0-indexing. At the same time, I regularly program in both 0- and 1-indexed languages, and find that a lot of array algorithms are a lot easier to implement with 0-indexing.
I've also worked with modern "index range" languages. They make local scripting a lot more pleasant, but they also make integrating libraries and other people's code harder: you don't know what convention they used. I suspect that's why they fell out of popularity in the first place.
Overall, I’d say that the popularity of 0-indexing is historically contingent on the constraints of early hardware, but probably was the best outcome in the long run.
Just don’t get me started on the natural numbers.
So Dijkstra was right!
How early into C’s life did it even have array operators? I vaguely remember reading it only had pointer arithmetic for a while and 0 indexing was because it was seen as sugar on top of pointer arithmetic with the index representing how many increments to the pointer.
I could be wildly wrong as this is just a vague memory of reading something years ago.
B's pointer and indexing operators are basically the same as C's.
BCPL was weirder than ~av described. Its indirection operator is `!`, and there are unary and binary variants. So C's `*ptr; array[42]; record->field;` corresponds to BCPL's `!ptr; array!42; record!field;`. In BCPL you declare fields of structures as manifest constants 0, 1, 2.
previous post: Why German Strings are Everywhere.
I also learned both through experience and advice from someone that worked with RocksDB at Meta that column families are generally to be avoided. The same kind of namespacing can be instead accomplished using prefixes without the performance hit.
As for the comparator issues, I think using something like Flatbuffers would help but that would be a larger change as it involves changing out those structs.
did they elaborate why
There’s an open RocksDB bug about that: https://github.com/facebook/rocksdb/issues/5117
Something I hadn’t considered before reading this is that the SQLite global write lock applies on a per-database basis… so you can increase your concurrent writes by spreading them across multiple database files, at which point you’re limited by CPU cores and the number of files you can open at once.
SQLite lets you attach up to ten database files to the same connection and supports cross-database joins, so you could even spread writes across ten copies of the same table (each in a separate DB) and then run queries against the UNION of them all.
Could be fun to benchmark some of these patterns and see how the writes-per-second could go.
But do transactions spread across databases?
I haven't read it all yet, but it looks like it does!
https://www.sqlite.org/atomiccommit.html#_multi_file_commit
Yes, they do, but I believe in WAL mode, there is a chance that on failure, you might end up in a partially committed state, where your changes could have been applied to some of the databases but not all of them.
Never ending up with partially committed state is what defines a transaction, so your answer should be “cross database transactions are not supported in wal mode.”
You are correct, my wording was very clumsy there!
This is what libSQL tries to do; it has both multi-tenancy and support for ATTACH statements over the tenant databases. So this should be very easy to benchmark.
disclosure: I am one of the maintainers of libSQL
I do want to play around with attaching databases as well, but if you are running into write performance issues in your web app, you are probably better off switching to Postgres than messing around with multiple database files.
That said, I have written Go code that rolled over from one database file to the next for a logging system, and I had a lot of fun doing it :D
I wonder how expensive a PostgreSQL setup for that kind of write performance might be? Handling 10,000+ writes per second might get pretty pricey with a server-based DB, could be SQLite with weird tricks ends up a lot cheaper to run.
Why would the same server be priced differently?
In my experience people using PostgreSQL tend to go with managed solutions like RDS. I don’t know how it would work out if you compared the cost of 10,000+ writes/second on PostgreSQL on a VPS with the same performance from SQLite on a similar VPS.
A lot of the managed PostgreSQL services are really aimed at people with a lot of data. The Azure ones, for example, reserve an entire VM for the RDBMS and charge you for the storage and compute that it provides. It looks as if RDS is similar.
In any situation where SQLite would be considered, I think that's almost certainly overkill. I moved from SQLite to PostgreSQL on a small VM for NextCloud, for example, and everything became noticeably more responsive, but the `postgres` process is using <1% of CPU while multiple `php-fpm` processes are all using 10-25%. Even the cheapest managed offering would be an order of magnitude more CPU than this needs.

I did some benchmarking to see how SQLite performs compared to Postgres and MySQL, all of them running on the same VPS. I didn't play around with multiple databases in SQLite, but the benchmark showed that all the databases have similar throughput. I'll write a separate blog post about it, but the gist was that for read requests SQLite was the fastest by about 5-10%, and for a write-only workload SQLite was the slowest by about 30%. This was on a 4 vCPU Hetzner machine (€20/month).
You probably also want the busy timeout and immediate mode so writes can wait for a lock instead of just returning an error.
Yeah, good point! I’ve added a note about this to the blog post, thanks!
How does `BUSY_TIMEOUT` help if you are doing `BEGIN IMMEDIATE`?

`BEGIN IMMEDIATE` will lock your database for longer than a regular `BEGIN`, so having a longer `BUSY_TIMEOUT` to offset this can help reduce "database is locked" errors. Also, without `BEGIN IMMEDIATE` you can get "database is locked" errors immediately, not respecting `BUSY_TIMEOUT` at all, which can be very confusing!
this is confusing lol.
My understanding was: if you use `BUSY_TIMEOUT` alone, then the write can wait up to the timeout or else return with a "database is locked" error.

If you use `BEGIN IMMEDIATE` and a write is already in progress, then you will immediately get a "database is locked" error, rendering `BUSY_TIMEOUT` useless.

https://www.sqlite.org/lang_transaction.html
This is super confusing, and the naming doesn't help either!
With BUSY_TIMEOUT alone you get the SQLITE_BUSY error immediately if the read lock cannot be upgraded into a write lock. In this case, waiting for BUSY_TIMEOUT doesn’t make sense, since the other thread with the write lock could change the data and invalidate the read that we already made, thus violating transaction serializability.
Example:
BEGIN IMMEDIATE acquires the write lock immediately so you are never in the state where you need to upgrade a read lock to a write lock.
Example:
So the point is that BEGIN IMMEDIATE protects you from SQLITE_BUSY without waiting until BUSY_TIMEOUT.
Won't it help with throughput if the write finishes before the timeout?
Because then the second thread can also proceed with its write immediately.
Yes, you will have higher throughput without BEGIN IMMEDIATE: the write locks will be shorter, and if the read lock can't be upgraded into a write lock SQLite will just give up immediately, so there are also fewer threads waiting to acquire a lock.
But the problem is that you now have to handle these errors in your application code, because now the first write after a read in a transaction can randomly fail - even if you only have one other write going on at the time. This is quite a common scenario for web applications, so you’ll end up seeing a bunch of SQLITE_BUSY errors due to this, even if your site isn’t getting a lot of traffic. BEGIN IMMEDIATE solves this, since your transactions will always wait for BUSY_TIMEOUT before failing. With BEGIN IMMEDIATE you’ll only see SQLITE_BUSY errors if your site is under actual load that prevented SQLite from acquiring a lock in the specified timeout.
I think of them not as random failures but as part of MVCC. In a non-immediate transaction you’re doing a sort of optimistic read-modify-write loop, where you have to retry if another txn snuck in and wrote before you did.
That said, I do almost always use immediate txns because they’re easier and, in a single-user app, write concurrency is less important.
I agree with you 100%. The issue is that if you use a framework like Django and don’t know much about SQLite internals, you don’t expect the code in your transaction to fail, at least not without first exceeding the busy wait timeout. None of the other databases work like this and SQLite’s behavior will feel random and confusing to devs or at least that’s how it felt to me before I figured out how IMMEDIATE transactions work.
The tldr is: BUSY_TIMEOUT only really works when BEGIN IMMEDIATE is on. It looks like others have filled in the details, and here is a blog post that explains it too: https://kerkour.com/sqlite-for-servers#use-immediate-transactions