I disagree with this, only because it’s imperialism. I’m British; in British English I write marshalling (with a double l), sanitising (-sing rather than -zing, except for words ending in a z), and -ise instead of -ize, among other things. You wouldn’t demand that an Arabic-speaking developer write all his comments in English just for the sake of being idiomatic, would you?
I’ve worked for a few companies in Germany now, about half of them with their operating language being German. All of them had comments being written exclusively in English. I don’t know how that is in other countries, but I get the impression from Europeans that this is pretty standard.
That said, my own preference is for American English for code (i.e. variable names, class names, etc), but British English for comments, commit messages, pull requests, etc. That’s because the names are part of the shared codebase and therefore standardised, but the comments and commit messages are specifically from me. As long as everyone can understand my British English, then I don’t think there’s much of a problem.
EDIT: That said, most of these suggestions feel more on the pedantic end of the spectrum as far as advice goes, and I would take some of this with a pinch of salt. In particular, when style suggestions like “I tend to write xyz” become “do this”, then I start to raise eyebrows at the usefulness of a particular style guide.
All of them had comments being written exclusively in English. I don’t know how that is in other countries, but I get the impression from Europeans that this is pretty standard.
Developers in China seem to prefer Chinese to English. When ECharts was first open-sourced by Baidu most of the inline comments (and the entire README) were in Chinese:
In Japan I feel like the tech industry is associated with English, and corporate codebases seem to use mostly English in documentation. However, many people’s personal projects have all the comments/docs in Japanese.
If someone wants to force everyone to spell something the same within a language they should make sure it’s spelled wrong in all varieties, like with HTTP’s ‘referer’.
The Go core developers feel so strongly about their speling that they’re wiling to change the names of constants from other APIs.
The gRPC protocol contains a status code enum (https://grpc.io/docs/guides/status-codes/), one of which is CANCELLED. Every gRPC library uses that spelling except for go-grpc, which spells it Canceled.
Idiosyncratic positions and an absolute refusal to concede to common practice is part and parcel of working with certain kinds of people.
We’re drifting off-topic, but I have to ask: gRPC is a Google product; Go is a Google product; and Google is a US company. How did gRPC end up with CANCELLED in the first place?!
You wouldn’t demand that an Arabic-speaking developer write all his comments in English just for the sake of being idiomatic, would you?
If this is something other than a private pet project of a person who has no ambition of ever working with people outside of his country? Yes, yes I would.
I believe the advice is still applicable to non-native speakers. In all companies I worked for in France, developers write code in English, including comments, sometimes even internal docs. There are a lot of inconsistencies (typically mixing US English and GB English, sometimes in the same sentence.)
In my experience (LatAm) the problem with that is people tend to have pretty poor English writing skills. You end up with badly written comments and commit messages, full of grammatical errors. People were aware of this, so they avoided writing long texts in order to limit their mistakes: we had one-line PR descriptions, very sparse commenting, no docs to speak of, etc.
Once I had the policy changed to allow the native language (Portuguese) in PRs and docs, people were more comfortable with it and documentation quality improved.
In Europe people are much more likely to have a strong English proficiency even as a second or third language. You have to know your audience, basically.
While I like to write paragraphs of explanation in between code, my actual comments are rather ungrammatical, with a bit of git-style verb-first phrasing, dropped articles, and other such things. Proper English feels wrong in these contexts. Some examples from my currently opened file:
; Hide map’s slider when page opens first time
;; Giv textbox data now
;;Norm longitude within -180-180
; No add marker when click controls
;; Try redundant desperate ideas to not bleed new markers through button
;; Scroll across date line #ToDo Make no tear in marker view (scroll West from Hawaii)
Those comments would most likely look weird to a person unfamiliar with your particular dialect.
In a small comment it’s fine to cut some corners, similar to titles in newspapers, but we can’t go overboard: the point of these things is to communicate, we don’t want to make it even more difficult for whoever is reading them. Proper grammar helps.
For clarification, this is not my dialect/way of speaking. But I see so many short interline comments like this that I’ve started to feel they’re more appropriate, and now I write them that way too. Strange!
Is “hat” a standard term regularly used in the golang ecosystem for a specific thing and on the list given in the article? If not, it is not relevant to the point in the article.
(And even generalized: if it happens to be an important term for your code base or ecosystem, it probably makes sense to standardize how to spell it, in whatever language and spelling you prefer. I’ve worked on mixed-language codebases, and it would have been helpful if people had consistently used the German domain-specific terms instead of mixing them with various translation attempts, especially if some participants don’t speak the language (well) and have to treat terms as partially opaque.)
I had to solve this once. I maintain a library that converts between HTML/CSS color formats, and one of the formats is a name (and an optional spec to say which set of names to draw from). HTML4, CSS2, and CSS2.1 only had “gray”, but CSS3 added “grey” as another spelling for the same color value, and also added a bunch of other new color names which each have a “gray” and a “grey” variant.
Which raises the question: if I give the library a hex code for one of these and ask it to convert to name, which name should it convert to?
The solution I went with was to always return the “gray” variant, since that was the “original” spelling in earlier HTML and CSS specs.
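A rough sketch of that idea (hypothetical names and only a couple of colors shown; this is not the library’s actual code): when building the reverse hex-to-name mapping, let the “gray” spelling win whenever two names map to the same value.

```python
# Hypothetical sketch: prefer "gray" over "grey" when reversing a name -> hex map.
CSS3_NAMES_TO_HEX = {
    "gray": "#808080",
    "grey": "#808080",
    "lightslategray": "#778899",
    "lightslategrey": "#778899",
    # ... the rest of the CSS3 named colors ...
}

def hex_to_name_map(names_to_hex):
    """Build a hex -> name map; when two names share a value, keep the 'gray' one."""
    reverse = {}
    for name, value in names_to_hex.items():
        current = reverse.get(value)
        # Overwrite only if we have nothing yet, or if the stored name uses the
        # "grey" spelling and the new one uses "gray".
        if current is None or ("grey" in current and "gray" in name):
            reverse[value] = name
    return reverse

print(hex_to_name_map(CSS3_NAMES_TO_HEX)["#808080"])  # -> gray
print(hex_to_name_map(CSS3_NAMES_TO_HEX)["#778899"])  # -> lightslategray
```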
I don’t think it’s really “imperialism”. Firstly, “marshaling” isn’t even the preferred spelling in the US. Secondly, in countries all over the world, job listings stipulate English language skills all the time (even for Arabic-speaking candidates), and the practice is widely accepted because facilitating communication is generally considered to be important. Lastly, while empires certainly have pushed language standardization as a means to stamp out identities, I don’t think it follows that all language standards exist to stamp out identities (particularly when they are optional, as in the case of this post).
“marshaling” isn’t even the preferred spelling in the US
What makes you say that? (Cards on the table, my immediate thought was “Yes, it is.” I had no data for that, but the ngram below suggests that the single l spelling is the (currently) preferred US spelling.)
I wasn’t going to comment, but since others have — initially I wasn’t sure if this was a joke, with “features” like “No package manager to distract you” and “Simple type system … everything is an int”.
I mean, I understand it’s in early development, but the website seems to be overpromising and underdelivering. If it were my language I’d have waited until it was farther along before showing it off.
I do think some “features” on here are rather ridiculous, e.g. all types being ints, but I do consider the lack of a package manager to be a feature. IMO package management should be delegated to my system package manager or something like Guix/Nix.
I think Nix is about as portable between Linux distros as, say, Cargo. AFAIK Nix doesn’t work on Windows, but a new monolingual package manager very well might not either.
Markdown files describing what you need are relatively portable, it’s just that the portability is between humans rather than package managers. Some dislike this for good reason, others don’t.
Parts of the readme are certainly a joke, parts are descriptions of where the implementation is now, parts are things I’d like to do or have done at some point. I certainly wasn’t intending to show it off.
A great way to make smaller and maybe-more-recognizable (or at least more-easily-localized) icons less opaque is tooltips. …Oh right, you can’t have those on touch devices. Welp.
I actually wrote a draft for a blog a couple days ago on tooltips but it isn’t ready to go live yet… the short version is tooltips kinda suck:
you can’t scan them quickly; you have to position the mouse and pause for each one you want to look at.
they sometimes pop up when you don’t want them and cover things up (this drives me totally nuts)
it isn’t always obvious what does and doesn’t have tooltips, so you guess and check, slowly.
Among a few other things. I’d rather have both visible. One alternative for the toolbar is that every toolbar icon should also be visible in the main menu… the icon and label could sit together in the menu, letting you scan it and learn the icons; then the toolbar becomes more usable as a shortcut once you get to know it.
Using standard icons for standard operations is nice too. Old style open, close, save, print, standard icons and behaviors are getting depressingly less common, sigh.
I’ll make an attempt, with the caveat that this list seems so obvious to me that I’m worried I might be missing some nuance (imagine a similar list about cooking utensils with “people think knives can only be used for butter, but in reality they can also be used to cut bread, meat, and even vegetables!!!”).
Sentences in all languages can be templated as easily as in English: {user} is in {location} etc.
Both the substitutions and the surrounding text can depend on each other. The obvious example is languages where nouns have gender, but you might also have cases like Japanese where “in” might be へ, で, or に to indicate relative precision of the location.
Words that are short in English are short in other languages too.
German is the classic example of using lengthy compound words where English would use a shorter single-purpose word, “Rindfleisch” vs “beef” or “Lebensmittel” vs “food” (why yes I haven’t had lunch yet, why do you ask…?).
For any text in any language, its translation into any other language is approximately as long as the original.
See above – English -> German tends to become longer, English -> Chinese tends to become shorter.
For every lower-case character, there is exactly one (language-independent) upper-case character, and vice versa.
Turkish and German are famous counter-examples, with Turkish 'i' / 'I' being different letters, or German ß capitalizing to "SS" (though I think this is now considered somewhat old-fashioned?).
The lower-case/upper-case distinction exists in all languages.
Not true in Chinese, Japanese, Korean.
All languages have words for exactly the same things as English.
Every language has words that don’t exist in any other language. Sometimes because the concept is alien (English has no native word for 寿司), sometimes because a general concept has been subdivided in a different way (English has many words for overcast misty weather that don’t translate easily into languages from drier climates).
Every expression in English, however vague and out-of-context, always has exactly one translation in every other language.
I’m not sure what this means because many expressions in English don’t even have a single explanation in English, but in any case, idioms and double entendres often can’t be translated directly.
All languages follow the subject-verb-object word order.
If one’s English to SVO order is limited, limited too must their knowledge of literature be.
When words are to be converted into Title Case, it is always the first character of the word that needs to be capitalized, in all languages.
Even English doesn’t follow a rule of capitalizing the first character of every word. Title Casing The First Letter Of Every Word Is Bad Style.
Every language has words for yes and no.
One well-known counter-example being languages where agreement is by repeating a verb:
A: “Do you want to eat lunch together?”
B: “Eat.”
In each language, the words for yes and no never change, regardless of which question they are answering.
See above.
There is always only one correct way to spell anything.
Color / colour, aluminum / aluminium
Each language is written in exactly one alphabet.
Not sure exactly what this means – upper-case vs lower-case? Latin vs Cyrillic? 漢字 vs ひらがな カタカナ ? 简化字 vs 繁体字 ? Lots of counter-examples to choose from, Kazakh probably being a good one.
All languages (that use the Latin alphabet) have the same alphabetical sorting order.
Some languages special-case ordering of letter combinations, such as ij in Dutch.
And then there’s the dozens of European languages that have their own letters outside the standard 26. Or diacritics.
All languages are written from left to right.
Arabic, Hebrew.
Even in languages written from right to left, the user interface still “flows” from left to right.
Not sure what “flows” means here, but applications with good RtL support usually flip the entire UI – for example a navigational menu that’s on the right in English would be on the left in Arabic.
Every language puts spaces between words.
Segmenting a sentence into words is as easy as splitting on whitespace (and maybe punctuation).
Chinese, Japanese.
Segmenting a text into sentences is as easy as splitting on end-of-sentence punctuation.
English: "Dear Mr. Smith".
No language puts spaces before question marks and exclamation marks at the end of a sentence.
No language puts spaces after opening quotes and before closing quotes.
French famously has rules that differ from English regarding spacing around punctuation.
All languages use the same characters for opening quotes and closing quotes.
“ ” in English,「 」in Japanese, « » in French, and so on.
Numbers, when written out in digits, are formatted and punctuated the same way in all languages.
European languages that use '.' for thousands separator and ',' for the fractional separator, or languages that group by different sizes (like lakh/crore in Indian languages).
No two languages are so similar that it would ever be difficult to tell them apart.
Many languages are considered distinct for political reasons, even if a purely linguistic analysis would consider them the same language.
Languages that have similar names are similar.
English (as spoken in Pittsburgh), English (as spoken in Melbourne), and English (as spoken in Glasgow).
More seriously, Japanese and Javanese.
Icons that are based on English puns and wordplay are easily understood by speakers of other languages.
Often they’re difficult to understand even for English speakers (I once saw a literal hamburger used to signify a collapsible sidebar).
Geolocation is an accurate way to predict the user’s language.
Nobody who has ever travelled would think this. And yet. AND YET!
C’mon Google, I know that my IP is an airport in Warsaw but I really don’t want the Maps UI to switch to Polish when I’m trying to find a route to my hotel.
Country flags are accurate and appropriate symbols for languages.
You can roughly gauge where you are in the world by whether the local ATMs offer “🇬🇧 English”, “🇺🇸 English”, or “🇦🇺 English”.
Every country has exactly one “national” language.
Belgium, Luxembourg, Switzerland.
Every language is the “national” language of exactly one country.
Turkish and German are famous counter-examples, with Turkish ‘i’ / ‘I’ being different letters, or German ß capitalizing to “SS” (though I think this is now considered somewhat old-fashioned?).
The German ß has history.
The old rule is that ß simply has no uppercase. Capitalizing it as “SS” was the default fallback rule if you had to absolutely capitalize everything and the ß would look bad (such as writing “STRAßE” => “STRASSE”). Using “SZ” was also allowed in some cases.
The new rule is to use the uppercase ß: ẞ. So instead of “STRASSE” you now write “STRAẞE”.
The usage of “SZ” was disallowed in 2006; the East Germans had had an uppercase ß since 1957; the West German rules basically said “Uppercase ß is in development”, and that was dropped in 1984 for the rule to use SS or SZ as the uppercase variant. The new uppercase ß has been in the rules since 2017, and since 2024 the uppercase ß is preferred over SS.
DIN 5008 was also updated in 2020. This means that, depending on when and WHERE the document you’re processing was created, its writing of the uppercase ß may be radically different.
It should also be noted that if you’re in Switzerland, ß is not used at all; ss is substituted even in lower case.
Family names may also have custom capitalization rules, where ß can be replaced by SS, SZ, ẞ, or even HS, so “Großmann” can become “GROHSMANN”. Note that this depends on the person: while Brother Großmann may write “GROHSMANN”, Sister Großmann may write “GROSSMANN” and their mother may use “GROẞMANN”, and these are all valid and equivalent.
Umlauts may also be uppercased by dropping the diacritic and appending an E: ä becomes “AE”. In some cases even lowercase input gets this treatment because older systems can’t handle special characters, though this is not GDPR compliant.
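For what it’s worth, this is also roughly how today’s Unicode-aware string APIs behave (Python shown purely as an illustration; other languages with full Unicode case mapping act similarly): the default uppercase mapping still expands ß to SS, and the capital ẞ exists as its own code point but is never produced automatically.

```python
# Default Unicode case mapping: ß still uppercases to "SS", so the string grows.
print("straße".upper())                    # STRASSE
print(len("ß"), len("ß".upper()))          # 1 2

# The capital sharp S has its own code point (U+1E9E)...
print("\N{LATIN CAPITAL LETTER SHARP S}")  # ẞ
# ...and it lowercases back to ß, but a lower -> upper round trip loses it:
print("ẞ".lower())                         # ß
print("ẞ".lower().upper())                 # SS, not ẞ

# Umlauts keep their diacritic; the "AE" transliteration is a separate,
# legacy/locale convention that no default case mapping applies for you.
print("ä".upper())                         # Ä
```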
No two languages are so similar that it would ever be difficult to tell them apart.
Many languages are considered distinct for political reasons, even if a purely linguistic analysis would consider them the same language.
If you ever want to have fun, the politics and regionality of German dialects could be enough to drive some linguists up the wall.
Bavarian is recognized as a language and a dialect at the same time; it can be subdivided into dozens and dozens of subdialects, which are all similar but whose speakers may struggle to understand each other.
As someone who grew up in Swabian Bavaria, my dialect is a mix of both Swabian and Bavarian. I struggle to understand Northern Bavarian, but I struggle much less with Basel Swiss German (which is distinct from other Swiss German in that it originates from Low Alemannic rather than High Alemannic), which is quite close in a lot of ways.
And the Swiss then double down on making things confusing by sometimes using French language constructs in German words, or straight-up importing French or Italian words.
East Germany added the uppercase ß in 1957 and removed it in 1984. The spelling rules weren’t updated, so despite the presence of an uppercase ß, it would have been wrong to use it in any circumstances. Since Unicode 1.0 is somewhere around 1992, with some early drafts in 1988, it basically missed the uppercase ß being in the dictionary.
The uppercase ß itself has been around since 1905 and we’ve tried to get it into Unicode since roughly 2004.
Every expression in English, however vague and out-of-context, always has exactly one translation in every other language.
I’m not sure what this means because many expressions in English don’t even have a single explanation in English, but in any case, idioms and double entendres often can’t be translated directly.
A good example of this is a CMS I used to work on. The way it implemented translation was to define everything using English[0], then write translations as a mapping from those English snippets to the intended language. This is fundamentally flawed, e.g. by homonyms:
Subject          From    Flags               Actions
----------------------------------------------------------------
Project update   Alice   Unread, Important   [Read] [Delete]
Welcome          HR      Read                [Read] [Delete]
Here, the “Read” flag means “this has been read”, whilst the “Read” button means “I want to read this”. Using the English as a key forces the same translation on both.
[0] We used British English, except for the word “color”, since we felt it was better to match the CSS keywords (e.g. when defining themes, etc.).
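For what it’s worth, the usual fix for this (not what that CMS did) is to key translations on a context plus a message ID rather than on the bare English string, so the two “Read”s can diverge. A minimal sketch with an invented catalog (the context names and Portuguese strings are just illustrative):

```python
# Hypothetical context-scoped catalog: the same English word "Read" gets
# separate entries depending on where it appears.
CATALOG = {
    "pt": {
        ("message-flag", "Read"): "Lido",   # adjective: this message has been read
        ("message-action", "Read"): "Ler",  # verb: open and read this message
    },
}

def translate(lang, context, msgid):
    """Look up (context, msgid) for a language; fall back to the English msgid."""
    return CATALOG.get(lang, {}).get((context, msgid), msgid)

print(translate("pt", "message-flag", "Read"))    # Lido
print(translate("pt", "message-action", "Read"))  # Ler
print(translate("de", "message-action", "Read"))  # Read (no entry -> fallback)
```

GNU gettext exposes the same idea as msgctxt (pgettext), so translators see the disambiguating context alongside the source string.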
One trick is to use a different word on the asset: Reviewed (adj) and Review (v) don’t have the same problem that Read (adj) and Read (v) do. Seen (adj) and See (v); Viewed (adj) and View (v). And so on. Then you can “translate” back to English to actually display Unread/Read/[Read] if you still like that, without confusing the translators, who need to know you want something more like e.g. Lido/Ler or 阅读/显示, and so on.
The number of exceptions caused by the Hebrew calendar makes me shed a tear of joy.
Here’s one falsehood they missed: the length of a year varies by at most one day. True in the Gregorian calendar, apparently true in the Islamic calendar, but not true in the Hebrew calendar: leap years are 30 days longer than regular years.
They sorta cover it in the “days” section, by way of mentioning that the Hebrew calendar has leap months.
They also miss the Byzantine calendar, which is still used by many churches and is related to the Jewish Greek calendar from the Septuagint. It’s of course complicated by the fact that many churches and groups do not agree on what year was the start, so it’s complex to use (but still in fairly common liturgical use).
Here’s a fun (counter)example of (something like) this one from my heritage language:
In each language, the words for yes and no never change, regardless of which question they are answering.
(Context: the word for enjoy/like is the same in the language, so when translating to English, I choose whichever sounds most natural in each given example sentence.)
When someone says, “do you (enjoy/)like it?”, if you want to say “yes, I like it”, that’s fine, but if you want to say you don’t like it, you would say, “I don’t want it”; if you were to say, “I don’t like it” in that situation, it would mean, “I don’t want it”. The same reversal happens if they ask, “do you want it?”, and you want to respond in the negative.
So someone would say, “do you want a chocolate bar?”, and you’d say, “no, I don’t want it”, and that would mean, “no, (I already know) I don’t (usually/habitually) enjoy it (when I have it), (therefore I don’t want it)”, whereas, “no, I don’t enjoy it” would just straightforwardly mean, “I don’t want it”.
(You can also respond with just, “no!” instead of using a verb in the answer.)
This only happens in the habitual present form. Someone might ask, “do you like chocolate?” before they offer you some, and you can say, “no, I don’t want it”, but if they already gave you a chocolate bar to try, they may ask, “did you like it?” in the past tense, and you’d have to respond with, “I didn’t like it” instead of, “I didn’t want it”. And, “do you want chocolate?” would be met with, “no, I don’t like it”, but “did you want chocolate?” would be met with, “no, I didn’t want it”, and that second one would just mean what it straightforwardly sounds like in English.
(Strictly speaking, it doesn’t need to be a response to a question, I’m just putting it into a context to show that the verb used in the answer isn’t just a negative form of the same verb used in the question.)
(It’s hard to explain because if you were to translate this literalistically to English, it wouldn’t even be noticed, since saying, “no, I don’t like it” in response to, “do you want it?” is quite natural, but does literally just mean, “I don’t like it”, in the sense of, “no, (I already know) I don’t (usually/habitually) enjoy it (when I have it), (therefore I don’t want it)”. Even, “no, I don’t want it“ in response to, “do you like it?” is fairly natural in English, if a little presumptive-sounding.)
In Polish, when someone asks you “Czy chcesz cukru do kawy?” (“Do you want sugar in your coffee?”), you can respond with “Dziękuję”, which can mean two opposite things: “Yes, please” or “No, thank you”.
The first one gets a pass because it was the first one, and even then, I think it’s better to link people to one of the many explainers people have written about it.
The other thing this doesn’t mention is that if you have LSP set to auto-enable when you visit a file, you cannot even safely look at a file from an untrusted project in your text editor, which presents some obvious difficulties given that deciding whether a project is trusted typically involves opening it in your text editor.
I think this is better than nothing, but if it’s a prompt that happens every time you open a new repo, it’s very likely that users will get trained to click OK without reading the message, to the point where the prompt doesn’t really accomplish much more than covering the ass of the editor maintainers.
That’s why I have Vim for code editing and ripgrep+less for inspecting untrusted code. Clean separation between the programmable text editor with plugins and the tools for inspecting untrusted stuff.
Is there a way to make the native dialog boxes (typically an OK / Cancel sort of thing) keyboardable? That’s always what drove me nuts about MacOS. Just give me tab select and enter to trigger! I don’t understand why they’ve never adopted that.
There is an accessibility toggle somewhere that makes them (mostly) keyboard accessible. I can’t remember where it is exactly, but knowing that it exists is half the battle! ;)
I was excited to try this out, but discovered that it prevents the space bar from working in kitty. I’m sure there is a way to fix that, but I wanted to leave a note here in case anyone else might try this out and then an hour later wonder why their space bar isn’t working in their terminal.
You can click the little ( i ) icon in the top right and change some of the bindings. I switched “activate” from space bar to what Apple’s always called “Return” (although I thought The Mac is not a Typewriter).
But I ended up turning it all off anyway, because the UX was so terrible and got in the way of everything. I only wanted keys for standard buttons on standard dialog boxes! I wish you better luck than I.
I still don’t understand why pledge(2)/unveil(2) have not been ported to other kernels. They have proven to work very well for almost a decade now, as they were introduced in 2015.
We’re doing a capstone project with one of my students where we’re porting pledge to FreeBSD. We’re not doing unveil because I don’t know its internals as well as pledge’s, but hopefully something good comes out of this experiment.
You might want to reach out to some of the HardenedBSD community members. We have a port of pledge that is nearing completion and will likely be merged into HardenedBSD inside of a year, quicker if we have help. :-)
The Linux approach, for better or worse, is to provide more generic/complex kernel APIs, the two most similar to pledge/unveil being respectively seccomp-bpf and landlock. Given those, you can port pledge/unveil in userspace. But that obviously results in less uptake than an API blessed/mandated by the OS developers in the OpenBSD style.
For example, let’s say you want to do something on Linux like control whether or not some program you downloaded from the web is allowed to have telemetry.
This seems pretty easy on Linux with systemd, am I missing something? The program itself doesn’t even have to know about any control mechanisms like pledge or whatever, we can enforce it from outside.
pledge and unveil are specifically designed as tools for the programmer. They aren’t meant to be imposed from the outside as a sandbox. Theo has repeatedly rejected the idea of adding a utility to OpenBSD that runs an arbitrary command line under a particular pledge, for example. The intended use is for the developer (or knowledgeable porter) to pledge/unveil down to the resources that they know the program needs. It’s a tool for defensive programming: if your pledged process gets owned, it has as little ambient authority as possible.
I agree. The programmer knows better than the user what resources a program needs.
Besides, the external approach imposes a static set of restrictions that stays the same for the entire runtime, which is a real limitation: many programs require more privileges during initialization, which can then be dropped later.
The programmer knows better than the user what resources a program needs.
I agree, but OTOH the user has the much greater incentive to care about sandboxing programs than their programmers (when the user and the programmer are not the same).
OK but my point is that the linked post says that this functionality is extremely difficult in Linux, but if we shift our thinking slightly to work with Linux’s execution resource controls, it seems to be quite easy.
It…depends. Linux sandboxing solutions are not particularly unified and depend on stitching a lot of pieces together, sometimes with confusing or bizarre edge cases. It’s not awful, but it’s still a fair bit of effort, with a surprising amount of room for error.
(I will say that Landlock helps a lot here by being a relatively high-level set of access controls for things that are otherwise more difficult to sandbox.)
No, it really looks quite simple. Check the link I posted earlier. It’s as simple as adding a couple of lines to the systemd unit file or a couple of flags to the systemd-run command line.
There’s a significant structural difference between the two. I honestly don’t know why the Internet keeps lumping them together. Yes, both apply restrictions, but they’re completely different mechanisms. It’s like lumping email and keyloggers together because both ultimately boil down to delivering a series of keypresses to a third-party.
pledge and unveil are things that the program does, which has two major implications:
It’s generally expected that the program is written in such a manner that pledging a particular restriction is a supported mode of operation.
Breaching a pledge kills the program with SIGABRT and you get a stack trace pointing to wherever the promise was broken.
You can pledge or unveil whenever you want during a program’s execution (but there are restrictions to what you can “unpledge” or “un-unveil”). So you can e.g. start with full access rights to the user’s home path in order to read configuration, and you can drop filesystem access privileges entirely once you’ve read configuration data.
“External” sandboxing solutions allow you to apply the same restrictions, but:
They’re going to apply regardless of whether the program supports them or not. E.g. you can IPAddressDeny= an address, but if sending telemetry to that address is a blocking operation, you’ll just get a blocked program. That’s obviously not on the sandboxing method itself, but…
…breaching a restriction doesn’t necessarily give you any useful debugging context. The failure mechanism for breaching a restriction is usually operation-dependent. E.g. with IPAddressDeny=, packets just get dropped.
Those restrictions apply throughout a program’s execution lifecycle. So if you need higher privileges at start-up you’re out of luck, since external tools have no insight into what a program is doing.
The whole point of pledge and unveil is that you write your program so that you make certain promises about how it’s going to work, and sandboxing works as protection against accidentally breaking those promises, either straight as a bug, or through a bug getting exploited. They’re design-by-contract tools, more or less.
I’m a bit confused, I wouldn’t expect “Linux’s execution resource controls” to include systemd-specific APIs? Tbc these toggles do not cleanly map to OS primitives at all; you can’t just go “ah yes I want systemd’s IP address restrictions myself” without writing your own BPF filtering.
If you did just mean systemd specifically, note that it’s quite a bit trickier to, say, sandbox things in a flexible way with that. In particular, to sandbox something that’s not a service with systemd-run has the overhead of spawning the child under the service manager and piping the output, and you lose the ability to easily control and kill the child (because you’re just killing the proxying process, not the actual child).
I didn’t say it’s a “flat out replacement”; I said it achieves the same goals that were mentioned in the linked post, i.e. preventing a piece of software I downloaded off the internet from “phoning home” and uploading telemetry.
Okay, I understand now. The way I see it, the example in that post was a one-off example, and a pretty bad one, because I think from this you’ve misunderstood the purpose of pledge. While there may be some overlap in the functionality of BPF filters, they are far, far from 1:1.
To help clear this up:
pledge is not for sandboxing existing/binary (i.e. already compiled) software, nor for limiting network connections in software (the latter would preferably be done with network controls like firewalls).
pledge must be added to the source code of the software itself (either by a dev or maintainer).
on a violation of “promises” the program is killed
you can limit network connections, but it’s simply “all or nothing”; your software gets all network, or no network.
So in this scenario of telemetry:
if devs/maintainers have source access to add pledge then they also have source access to just rip out that telemetry code right?
There would be no point in using pledge here.
in using pledge, as soon as the software tries to do any networking, the program is killed! The connection is blocked – only because your software is now not running at all!
Yeah exactly, the overlap is what I’m talking about. This is what I was referring to earlier–if you shift your mindset a bit, it becomes pretty easy to externally impose similar security restrictions using systemd. If you don’t mind that the exact way it’s done is a bit different, you can achieve very similar goals.
Part of the pledge/unveil model is that you can progressively drop privileges during program execution. The observation is that programs may need a lot of privileges in order to start themselves up, and then hardly any for the remainder of runtime. For example, consider the recommendations for using pledge when opening handles to sndiod in OpenBSD:
If the sndio library is used in combination with pledge(2), then the sio_open() function needs the stdio, rpath, wpath, cpath, inet, unix, dns, and audio pledge(2) promises.
However:
Once no further calls to sio_open() will be made, all these pledge(2) promises may be dropped, except for the audio promise.
And in fact aucat(1) does exactly this, using a bunch of privileges at startup time (but dropping other very sensitive promises like proc and exec right away) and then dropping most of them when it’s time to start playback or recording from the sound device handle.
The purpose of pledge is for the programmer to mitigate the privileges available to exploits, not for the user to defend against code they chose to run. As far as I know, these goals, which are the main point of pledge and unveil, do not overlap at all with what you can do by filtering access via systemd.
They overlap with the goal that was stated in the linked post. That was what I was referring to. I agree that dropping complex combinations of privileges in Linux at different points within an application’s lifecycle doesn’t seem easy.
Yeah but the linked post misunderstands what pledge is actually for; “sandboxing untrusted binaries” is not an actual design goal of pledge and if it’s useful for that goal it’s only useful by accident.
Yeah so this is kinda weird: it makes it easy to manage the child as a single entity and then interactively control it. But systemd-run is, ofc, just a proxy, spawning the child as a transient unit and monitoring it until exit.
This effectively means that systemd-run does not itself control the spawned process’s lifetime. When you run it with --pty, Ctrl-C will kill the child normally… because it’s going through the tty. But if you’re running it detached from the tty (e.g. because you need to control stdout/stdin from your program), or you just signal the process directly, the actual service will remain running in the background. The only way of directly controlling that lifecycle is by grabbing the PID or (preferably) the transient unit name and then targeting it directly or controlling it via the D-Bus interface.
But it’s a transient unit, it shows up in the output of systemctl list-units, right? That means you can operate on it with eg systemctl stop the-unit just like other systemd units?
It is, yeah, which is why I had mentioned the “it makes it easy to […] interactively control it”. It’s just a lot harder in turn to automatically do things like “oh the user did Ctrl-C, let me stop all my processes”.
From my admittedly passing acquaintance with both of them: Capsicum is much finer grained.
Pledge/unveil solves the 80% use case, Capsicum aims to solve the 100% use-case.
I.e. it would be very difficult, perhaps impossible, to implement Capsicum on top of pledge/unveil, but easy enough to implement pledge/unveil on top of Capsicum.
That said, there is probably more pledge/unveil actually implemented in binaries out in the wild. I think all of the base OpenBSD system is done now. Last I checked only some of FreeBSD has Capsicum implemented in base.
Chicken-and-egg problem now. Who would want to use a QR code encoding that won’t work with the majority of QR code readers for only a very small gain, and how many reader implementations are actively maintained and will add support for something nobody uses yet?
The “byte” encoding works well for that. Don’t forget, URLs can contain the full range of Unicode, so a restricted set is never going to please everyone. Given that QR codes can contain ~4k symbols, I’m not sure there’s much need for an entirely new encoding.
Yes, although… there’s some benefit to making QR codes smaller even when nowhere near the limits. Smaller ones scan faster and more reliably. I find it difficult to get a 1kB QR code to scan at all with a phone.
QR codes also let you configure the amount of error correction. For small amounts of data, you often turn up the error correction which makes them possible to scan with a very poor image, so they can often scan while the camera is still trying to focus.
URIs can contain the full Unicode range, but the average URL does not. Given that URLs have become a major use case for QR codes, it’s definitely a shame there isn’t a better mode for them: binary mode needs a full byte per character while alnum mode only needs 5.5 bits.
All non-ASCII characters in URLs can be %-encoded, so Unicode isn’t a problem. A 6-bit code has room for lower case, digits, and all the URL punctuation, plus room to spare for space and an upper-case shift and a few more.
So ideally a QR encoder should ask the user whether the text is a URL, and should check which encoding is the smallest (alnum with %-encoding, or binary mode). Of course this assumes that any paths in the URL are also case-insensitive (which depends on the server).
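To put rough numbers on the “check which encoding is the smallest” idea: alphanumeric mode packs two characters into 11 bits (6 bits for a trailing odd character) but only covers digits, upper-case letters, and “ $%*+-./:”, while byte mode spends 8 bits per byte. A back-of-the-envelope sketch follows (mode-indicator and length-field overhead ignored; the example URL is made up, and note that query-string delimiters like “?” and “=” aren’t in the alphanumeric set and can’t be %-encoded without changing the URL’s meaning, so this mostly helps for path-only URLs):

```python
# Characters allowed in QR alphanumeric mode (per the QR spec).
QR_ALNUM = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")

def to_qr_alnum(url: str) -> str:
    """Uppercase the URL and %-encode any byte outside the QR alphanumeric set."""
    out = []
    for byte in url.upper().encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in QR_ALNUM else "%{:02X}".format(byte))
    return "".join(out)

def alnum_bits(n_chars: int) -> int:
    # Alphanumeric mode: 11 bits per pair of characters, 6 bits for a leftover one.
    return 11 * (n_chars // 2) + 6 * (n_chars % 2)

def byte_bits(n_bytes: int) -> int:
    # Byte mode: 8 bits per byte.
    return 8 * n_bytes

url = "https://example.com/tickets/münchen"
candidate = to_qr_alnum(url)
print(candidate)                      # HTTPS://EXAMPLE.COM/TICKETS/M%C3%9CNCHEN
print(byte_bits(len(url.encode())))   # bits needed in byte mode
print(alnum_bits(len(candidate)))     # bits needed in alphanumeric mode (smaller here)
```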
Btw. can all HTTP servers correctly handle requests with uppercase domain names? I’m thinking about SNI, maybe certificate entries…? Or do browsers already “normalize” the domain name part of a URL into lowercase?
The spec says host names are case-insensitive. In practice I believe all (?) browsers normalize to lowercase so I’m not sure if all servers would handle it correctly but a lot certainly do. I just checked and curl does not normalize, so it would be easy to test a particular server that way.
Host name yes, but not path. So if you are making a URL that includes a path, the path should be upper case (or the server should treat it case-insensitively so that it can be encoded as upper case for the QR code).
No, a URI consists of ASCII characters only. A particular URI scheme may define how non-ASCII characters are encoded as ASCII, e.g. via percent-encoding their UTF-8 bytes.
Byte encoding is fun in practice, as I recently discovered, because Android’s default QR code API returns the result as a Java String. But aren’t Java Strings UTF-16? Why yes, and so the byte data is interpreted as UTF-8, then converted to UTF-16, and then provided as a string.
The workaround, apparently, if you want raw byte data, is to use an undocumented setting to tell the API that the data is in an 8-bit code page that can be safely round-tripped through Unicode, and then extract the data from the string by exporting it in that encoding.
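The round-trip principle, stripped of the Android specifics (Python shown purely as an illustration of the idea, not the Android API): ISO-8859-1 maps byte values 0–255 one-to-one onto the first 256 code points, so arbitrary bytes survive a bytes → string → bytes trip even when they aren’t valid UTF-8.

```python
raw = bytes([0x00, 0xFF, 0xC0, 0x41])        # arbitrary bytes, not valid UTF-8

# Decode as ISO-8859-1 (Latin-1): every byte maps to exactly one code point,
# so nothing is lost or reinterpreted...
as_text = raw.decode("latin-1")

# ...and encoding back recovers the original bytes exactly.
assert as_text.encode("latin-1") == raw

# Decoding the same bytes as UTF-8 (what the Android API effectively does)
# would mangle them instead:
print(raw.decode("utf-8", errors="replace"))  # contains U+FFFD replacement chars
```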
I read somewhere that the EU’s COVID passes used text mode with base45 encoding because of this tendency for byte mode QR codes to be interpreted as UTF-8.
This causes a bit of a headache for me. I doubled down on ring as the default backend for rustls when releasing ureq 3.0 just a couple of months ago. But this might mean I should switch to aws-lc-rs. Hopefully that doesn’t upset too many ureq users :/
There’s been some positive momentum on the GitHub discussion since you posted. Namely the crates.io ownership has been transferred to the rustls people and 2 of them have explicitly said they’ll maintain it. They need to make a new release to reflect the change and then will be able to resolve the advisory.
My dream for ureq is a Rust-native library without C underpinnings. The very early releases of rustls made noises I interpreted to be that too, even though they never explicitly stated it as a goal (and ring certainly isn’t Rust native like that). rustls picked ring, and ureq 1.x and 2.x used rustls/ring.
As I was working on ureq 3.x, rustls advertised they were switching their default to aws-lc-rs. However the build requirements for aws-lc-rs were terrible – like requiring users on Windows to install nasm (this has since been fixed).
One of ureq’s top priorities has always been to “just work”, especially for users new to Rust. I don’t want new users to face questions about which TLS backend to choose. Hence I stuck with rustls/ring for ureq 3.x.
aws-lc-rs has improved, but it is still the case that ring has a higher chance of compiling on more platforms. RISC-V is the one I keep hearing about.
Wait, does that mean the Rust ecosystem is moving towards relying on Amazon and AWS for its cryptography? That doesn’t sound great. Not that I believe Amazon would add backdoors or anything like that, but I expect them to maintain aws-lc and aws-lc-rs to suit their own needs rather than the needs of the community. It makes me lose some confidence in Rust for these purposes, to be honest.
I expect them to maintain aws-lc and aws-lc-rs to suit their own needs rather than the needs of the community
What do you see as the conflict here, i.e. where would the needs differ for crypto libraries?
I’d expect a corporate funded crypto project to be more likely to get paid audits, do compliance work (FIPS etc), and add performance changes for the hardware they use (AWS graviton processors I guess), but none of that seems necessarily bad to me.
Things like maintaining API stability, keeping around ciphers and cryptographic primitives which AWS happens to not need, accepting contributions that add features AWS doesn’t need or fix bugs that don’t affect AWS, and improving performance on platforms which AWS doesn’t use are all things that I wouldn’t trust Amazon for.
People like the OP often assume that using more energy automatically means more pollution and environmental damage, but that’s because they are thinking about coal plants.
What really matters is how we generate that energy, not how much we use. Take countries that rely almost entirely on hydroelectric power, where turbines harness the natural force of falling water, or those powered by nuclear plants. In these cases, if we don’t use the available energy, it simply goes to waste. The energy will be generated whether we use it or not.
So when we’re discussing the ethics of training LLMs, their energy consumption in itself has nothing to do with being ethical or unethical. The nature of the energy sources is a different topic of conversation, regardless of how much energy they use.
What really matters is how we generate that energy, not how much we use. Take countries that rely almost entirely on hydroelectric power, where turbines harness the natural force of falling water, or those powered by nuclear plants. In these cases, if we don’t use the available energy, it simply goes to waste. The energy will be generated whether we use it or not.
Okay, but (say) the OpenAI data centers are located in the US, and we know the energy mix in the US. For example this webpage of the U.S. Energy Information Administration currently states:
In 2023, utility-scale electric power plants that burned coal, natural gas, or petroleum were the source of about 60% of total annual U.S. utility-scale electricity net generation.
And even climate-friendly alternatives like hydro require damming up and destroying rivers, windmills require paving new roads into the wilderness and introduce constant noise and visual pollution, solar requires land and mined minerals, nuclear … well ok nuclear is probably fine, but it’s expensive. Not that coal is better than all these, but focus on energy consumption is absolutely warranted as long as there is no single climate- and nature-friendly power source that is available everywhere.
(In Norway there have been massive protests against windmills because they destroy so much wild land, and politicians are saying “yes but we must prepare for the AI future, also Teslas, so turning our forests paving-gray is actually the green thing to do, also we got some money from those foreign investors which we can make half a kindergarten out of”. And that in a country where we already export much of our electricity and the rest is mostly used for the aluminum industry, precisely because power is already fairly cheap here due to having dammed most of our rivers long ago.)
I agree with much of this comment, but I have two quibbles:
hydro require damming up and destroying rivers
One can draw off only a fraction of the water in a river and feed it to a turbine with or without a dam, although of course there are trade-offs.
solar requires land and mined minerals, nuclear … well ok nuclear is probably fine
Nuclear power, at least in practice, implies mining minerals too, specifically radioactive ones (on top of the mining of iron, copper, etc. implied by any electric infrastructure).
Article was rather more tolerable than I was expecting, so good on the author.
I will highlight one particular issue: there’s no way to have both an unbiased/non-problematic electric brain and one that respects users’ rights fully. To wit:
it feels self-evident that letting a small group of people control this technology imperils the future of many people.
This logic applies just as much to a minority pushing for (say) trans-friendly norms as it does for a minority pushing for incredibly trans-hostile ones. Replace trans-friendly with degrowth or quiverfull or whatever else strikes your fancy.
Like, we’ve managed to create these little electric brains that can help people “think” thoughts that they’re too stupid, ignorant, or lazy to by themselves (I know this, having used various electric brains in each of these capacities). There is no morally coherent way–in my opinion!–of saying “okay, here’s an electric brain just for you that respects your prejudices and freedom of association BUT ALSO will only follow within norms established by our company/society/enlightened intellectual class.”
The only sane thing to do is to let people have whatever electric brains they want with whatever biases they deem tolerable or desirable, make sure they are aware of alternatives and can access them, and then hold them responsible for how they use what their electric brains help them with in meatspace. Otherwise, we’re back to trying to police what people think with their exocortices and that tends to lose every time.
This is just free speech absolutism dressed up in science fiction. There are different consequences to different kinds of speech, and these aren’t even “brains”: they’re databases with a clever compression and indexing strategy. Nobody is or should be required to keep horrendous speech in their database, or to serve it to other people as a service.
Nobody is or should be required to keep horrendous speech in their database, or to serve it to other people as a service.
Isn’t that exactly the problem @friendlysock is describing? This is already a reality. One has to abide the American Copilot refusing to complete code which mentions anything about sex or gender, and the Chinese Deepseek refusing to say anything about what happened at Tiananmen Square in 1989.
The problem is powerful tech companies (and the governments under which they fall) imposing their morality and worldview on the user. Same is true for social media companies, BTW. You can easily see how awkward this is with the radically changed position of the large tech companies with the new US administration and the difference in values it represents.
It’s not “free speech absolutism” to want to have your own values represented and expressed in the datasets. At least with more distributed systems like Mastodon you get to choose your moderators. Nobody decries this as “free speech absolutism”. It’s actually the opposite - the deal is that you can join a system which shares your values and you will be protected from hearing things you don’t want to hear. Saying it like this, I’m not so sure this is so great, either… you don’t want everyone retreating into their own siloed echo chambers, that’s a recipe for radicalisation and divisiveness.
The problem is not the existence of harmful ideas. The problem is lack of moderation when publishing them.
And yeah, nobody should be required to train models in certain ways. But maybe we should talk about requirements for unchecked outputs? Like when kids ask a chatbot, it shouldn’t try to make them into fascists.
On the other hand, when I ask a chatbot about what’s happening in the US and ask it to compare with e.g. Umberto Eco’s definition of Fascism, it shouldn’t engage in “balanced discussion” just because it’s “political”.
We need authors to have unopinionated tools if we want quality outputs. Imagine your text editor refusing to write certain words.
This is just free speech absolutism dressed up in science fiction.
Ah, I guess? If that bothers you, I think that’s an interesting data point you should reflect on.
Nobody is or should be required to keep horrendous speech in their database, or to serve it to other people as a service.
Sure, but if somebody chooses to do so, they should be permitted. I’m pointing out that the author complains about bias/problematic models, and also complains about centralization of power. The reasonably effective solution (indeed, the democratic one) is to let everybody have their own models–however flawed–and let the marketplace of ideas sort it out.
In case it needs to be explicitly spelled out: there is no way of solving for bias/problematic models that does not also imply the concentration of power in the hands of the few.
Random sidenote: I wish there were standard shortcuts or aliases for frequently typed commands. It’s annoying to type systemctl daemon-reload after editing a unit, e.g. why not systemctl dr? Or when debugging a failed unit, journalctl -xue myunit seems unnecessarily arcane, why not --debug or something friendlier?
Well, I guess fu rolls off the tongue better than uf. But I remember literally looking up whether there isn’t anything like -f and having issues with that. Oh well.
I’m not sure it would be “clever”. At best it would make transactional changes (i.e. changes that span several files) hard, at worst impossible. It would also be a weird editing experience when just saving activates the changes.
I wonder why changes should need to be transactional? In Kubernetes we edit resource specs—which are very similar to systemd units—individually. Eventual consistency obviates transactions. I think the same could have held for systemd, right?
I wonder why changes should need to be transactional
Because the services sd manages are more stateful. If sd restarted every service the moment its on-disk base unit file changed [1], desktop users, database admins, etc. would have a terrible experience.
I think most people misunderstand why SQLite on the server is so important:
The problem SQLite solves is providing a more advanced alternative to using something like partitioning keys in Cassandra.
For context:
At scale, companies often use databases that support partitioning keys to ensure that all data related to a single “entity” (e.g., a chat) is stored together for fast, efficient lookups. In Cassandra, a typical chat message store might look like this:
CREATE TABLE user_chat (
    chat_id    UUID,       -- Partition Key: Groups all messages for a single chat
    sent_at    TIMESTAMP,  -- Clustering Key: Orders messages within the chat
    sender_id  UUID,       -- Sender of the message
    message    TEXT,       -- Message content
    PRIMARY KEY (chat_id, sent_at)
) WITH CLUSTERING ORDER BY (sent_at ASC);
This design ensures that all messages for a given chat are stored sequentially on the same node, enabling fast batch reads. However, it comes with major limitations:
Queries must align with the partitioning schema, making complex filtering or ad-hoc queries difficult.
You can’t easily enforce ACID guarantees across multiple chat_id partitions, making cross-chat operations cumbersome.
With SQLite, you can sidestep these limitations by creating a dedicated database per chat_id. This allows for:
Rich relational modeling – Each chat database can include tables for messages, participants, reactions, and metadata.
Local ACID transactions – SQLite provides full transaction support within the scope of a single chat.
Efficient I/O – SQLite is optimized for small, entity-scoped workloads and can operate with minimal overhead.
This approach turns SQLite into an effective solution for entity-local storage while avoiding the complexity of a full-scale distributed database like Cassandra.
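A minimal sketch of that database-per-entity idea (file layout and schema invented for illustration, not taken from anyone’s production system): each chat_id gets its own SQLite file, and every write for one chat happens inside a single local ACID transaction.

```python
import sqlite3
from pathlib import Path

DATA_DIR = Path("chats")  # hypothetical layout: one SQLite file per chat_id

def open_chat(chat_id: str) -> sqlite3.Connection:
    """Open (and lazily create) the dedicated database for one chat."""
    DATA_DIR.mkdir(exist_ok=True)
    conn = sqlite3.connect(str(DATA_DIR / f"{chat_id}.db"))
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS messages (
            sent_at    TEXT NOT NULL,
            sender_id  TEXT NOT NULL,
            message    TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS participants (
            user_id    TEXT PRIMARY KEY,
            joined_at  TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS idx_messages_sent_at ON messages(sent_at);
    """)
    return conn

def post_message(chat_id: str, sender_id: str, sent_at: str, message: str) -> None:
    conn = open_chat(chat_id)
    with conn:  # one local ACID transaction, scoped to this chat only
        conn.execute(
            "INSERT INTO messages (sent_at, sender_id, message) VALUES (?, ?, ?)",
            (sent_at, sender_id, message),
        )
    conn.close()

post_message("chat-42", "alice", "2024-01-01T12:00:00Z", "hello")
```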
+1 SQLite is a nifty storage engine at a good level of abstraction. It’s much higher level (and slower) than KV stores like RocksDB, but conversely vastly simplifies whatever storage architecture you need. I work on a service that builds secondary indexes as SQLite databases on demand and maintains them via CDC from the Postgres source of truth. The success and expertise from that system have us considering other use cases for SQLite as well.
I should clarify – there are many flavors of SQLite. e.g. Turso + D1, Litestream, rqlite, and mvsqlite are all very different. This article mentions Litestream specifically, which – you’re correct – does not apply to my comment.
I’m talking specifically about Turso & Cloudflare D1 where the place the database lives is abstracted away from you. This is similar to how partitioning keys in Cassandra are mapped to tokens belonging to machines, which is totally abstracted from you.
This is my first post in a series intended to draw programmers from outside the functional programming sphere into dependent typing, by showing off ways that dependent typing can be used to build ergonomic and misuse-resistant APIs in ways that would typically require macros in languages with less powerful type systems.
My target audience is still a bit vague for this series. Currently I’m targeting systems programmers who have decent experience with strongly typed languages (Rust comes to mind, since I’ve worked extensively in it personally) and who have maybe tried Haskell and gotten burnt by it.
Feedback is appreciated, I’m interested in hearing about how I can make posts like this more approachable to people with other backgrounds, and I’m also very interested in seeing examples of macros people would like to see disappear into the type system.
How strong is the compile time / run time phase separation in Idris?
Could this technique work with format strings that are chosen at run time, with the type system ensuring that the strings are compatible? (Assuming format strings come from a limited set that is fixed at compile time.)
For Idris 2, that’s a tricky one to answer extensively; I’ve got several whole posts planned out explaining that. One of Idris 2’s unique features among dependently typed languages, even counting Idris 1, is its integration of linear types, which are very similar to the affine types of Rust’s ownership but are used for the totally different purpose of controlling the spillover between compile time and run time. Idris 2 has a rather expressive sub-language, so to speak, for explicitly encoding the relationship between compile time and runtime.
For the second question, that wouldn’t be difficult, and I’ll probably get around to covering something like that sooner or later. Basically you just need to write a function that “decides” (produces a proof of equality or a proof of inequality at runtime) whether or not the type signatures produced by two format strings are equal; then you can pretty easily write up a procedure that reuses printfFmt to make a function that checks two format strings for compatibility at runtime, and then uses the types from the first to accept arguments for the latter. I may actually end up writing that as an addendum to this post.
I realized that the specific case you were asking about (where the format strings come from a limited set that is fixed at compile time) is actually basically trivial, so I’ve gone ahead and written a little addendum to the post showing that off.
I still think I’ll write another addendum later on that covers the case where the format string is generated at runtime and checked, at runtime, against a template format string for type compatibility, but that’s going to require some slight refactoring to make the implementation short enough to be suitable for tacking onto this blog post.
I like the sound of this, but note that it is disapproved here to have one’s own work be 25% or more of one’s submissions. Try to submit some other things that you find interesting before the next part of this series.
This is a common misconception of the rule. You can’t ignore the purpose of the rule and skip straight to the rule of thumb.
The purpose of the rule is:
Self-promotion: It’s great to have authors participate in the community, but not to exploit it as a write-only tool for product announcements or driving traffic to their work.
Writing high-quality educational content like this isn’t automatically “self-promotion” just because it was submitted by the author. It also doesn’t automatically become self-promotion based on ratios. Self-promotion is a type of spam, where spam is defined as frequent low-quality/unwanted content. The rule of thumb is just a way to detect the symptoms of self-promotion but cannot diagnose it alone.
Huh. For what it’s worth, I commented as I did because I figured someone would say it was against the self-promotion rule and I preferred to have the message delivered in what I thought would be a supportive, constructive way by a fellow admirer of dependent typing (me) rather than maybe in a more discouraging way by someone who disdains dependent typing.
“you now need, at the very least, backups” This is true of every database; how is this a criticism of sqlite? “If you want to run the service across multiple machines you can use LiteFS” It seems like the core use case for sqlite is situations where your data and worker all fit on one machine. I’m unfamiliar with LiteFS besides this blog’s description, but its description argues in favor of sqlite because it describes LiteFS like it’s either a solution to that limitation or a bridging strategy while you migrate to another database.
“Migrations are not great in SQLite.” This sounds like a real concern but “not great” is vague and it suggests you “search online” to understand the scope of the problem. Maybe this is a compelling limitation but this is an empty two sentences.
“Decoupling storage from compute is the default architecture” This is point 1 again.
“Migrating from SQLite is not incredibly hard, but it’s not easy, and it’s still pointless work.” This is point 1 again. It’s also wrong: if any dependency or strategy stops being the correct decision, it’s not at all pointless to rework it. “SQLite, by default, is not very strict with your data, like enforcing types or foreign key constraints” This is a useful criticism; it seems like every time sqlite comes up I see a link to another list of settings and I’d say reviewing the options is mandatory where the defaults on mariadb/postgresql are fine for most apps. And “very different latency profiles” is useful warning about that migration path; every app ends up depending on a db-specific feature but 1 + n queries is a pretty big one to rework.
I’m not well-informed enough to agree or disagree with this post, just frustrated that it doesn’t make the best case it could.
Re: migrations: SQLite doesn’t support some commonplace ALTER TABLE behavior. Instead, you have to:
Start a transaction
Create a new table with your NOT NULL column without default, a constraint, whatever
Copy data from the old table to the new table
Disable foreign key enforcement
Drop the old table
Rename the new table to the old name
Re-enable foreign key enforcement
For large tables this will lock the database for a while as the data copies over.
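For what it’s worth, the dance above looks roughly like this when driven from Python’s sqlite3 module (table and column names are invented). One wrinkle: PRAGMA foreign_keys is a no-op inside an open transaction, so it has to be toggled outside of it:

```python
import sqlite3

con = sqlite3.connect("app.db")
con.isolation_level = None  # manage transactions explicitly

# PRAGMA foreign_keys cannot be changed inside a transaction, so toggle it first.
con.execute("PRAGMA foreign_keys=OFF")
con.execute("BEGIN")
try:
    con.execute("""
        CREATE TABLE users_new (
            id    INTEGER PRIMARY KEY,
            email TEXT NOT NULL,              -- the new constraint we couldn't ALTER in
            org   INTEGER REFERENCES orgs(id)
        )
    """)
    con.execute("INSERT INTO users_new (id, email, org) SELECT id, email, org FROM users")
    con.execute("DROP TABLE users")
    con.execute("ALTER TABLE users_new RENAME TO users")
    con.execute("PRAGMA foreign_key_check")   # verify nothing dangles before committing
    con.execute("COMMIT")
except Exception:
    con.execute("ROLLBACK")
    raise
finally:
    con.execute("PRAGMA foreign_keys=ON")
```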
Additionally, this is an error-prone process. Do this enough times and eventually you might get the steps wrong. Even if you don’t, I had an ORM delete data on a small project when it executed statements in an unintuitive, undocumented, and nondeterministic order, running the DROP before running the statement disabling foreign keys. The ORM’s built-in schema commands weren’t susceptible to this issue, but SQLite required me to use a lower-level approach.
To me this does not sound much less nutso than mysql online schema change machinery or vitess or whatever. Postgres is better about this, sure. But sqlite is not bad: all the DDL is fully supported in transactions and it has a handy slot for storing your schema version number. If you’re using an ORM that is shredding your data, like, don’t use that software, but it’s not SQLite’s fault.
Yeah I agree this post does not make a compelling case. I have my own reasons I don’t love SQLite (really limited builtin methods and a too-flexible grammar) and I work for a Postgres company but from reading this article alone I can’t see a coherent reason not to use SQLite on the server.
I can’t see a coherent reason not to use SQLite on the server.
Did you miss the part about needing to share a disk if you want more than one application server? Granted, not all applications need more than one copy running…
I guess the other option is that each application server has their own sqlite and you use RAFT or something to drive consensus between them… which maybe works if you can have at least 3 of these servers, at the expense of extreme complexity. :)
Granted, not all applications need more than one copy running…
Surely that’s the important point? SQLite is great when everything fits on a single machine, which is really the case for the vast majority of projects [citation needed].
When something is big enough to need multiple machines then it’s time to switch to a separate storage layer running PostgreSQL.
Isn’t this all just a case of picking the right tool for your situation?
This is the point of LiteFS - it replicates writes to other boxes - no NFS needed (and iirc locks on NFS were unreliable, so you probably don’t want to run the database file on NFS).
You DO need to take care to send writes to the leader, but Fly (where the author works) makes this pretty easy. Litestream (same author) is similar but is more of a continuous backup tool.
Right. A thing you can do is make your architecture more complicated by introducing LiteFS, or LiteStream, but it’s no longer “just sqlite” at that point.
I’ve been running Litestream for a while now and it’s VERY straight-forward. You start the process and point it at a SQLite database, it inexpensively streams WAL blocks to S3 or similar, you can use those for recovery.
It’s so simple, robust and cheap that I see it as a selling point for SQLite as a whole - much less complex/expensive than achieving point-in-time replica backups with PostgreSQL or MySQL.
Yeah but again, these work with only a single instance at a time. If that’s your setup it’s great, but many people need multi-instance setups and mounting a shared SQLite file on NFS in some cloud provider is not a great idea.
But it isn’t an NFS in the cloud: it replicates the WAL, which is then made available, yes, via a FUSE file system, but it’s not NFS. It’s not really any different than read replicas for Postgres or MySQL, and it seems to be a lot easier to get going. You DO have HA on your DB, don’t you?
If you don’t think it works for you, that’s fine, but I think you should understand how it works before making that decision.
Yeah but the point is that a cloud vendor DB gives me read replicas out of the box, I don’t need to set it up manually. Added to the other points, this makes it kinda hard to argue against PostgreSQL for a typical backend app.
The same is as true with sqlite as with other dbs. With Turso/libsql or litefs you can have a read replica materialized on the local nvme ssd of every application server. I’m not arguing SQLite is a better option for every app like everything in engineering the answer is “it depends”. But it feels unfair to dismiss SQLite based architecture because the sqlite3.c source file doesn’t directly automate some need and dismiss the sqlite vendors, and then compare that to Postgres + unlimited ancillary vendor support & automation.
Turso is one vendor. Cloudflare is another but its SQLite offering has a fairly low max database size. Meanwhile, most cloud providers have mature support for MySQL/Postgres. If the cloud provider situation changes for SQLite in the future and it is offered more widely, then the analysis changes, but until then I believe it’s fair. And so do the folks who actually make SQLite.
Sure… if you’re using AWS, Azure or Google… and yeah, even fly.io (though theirs is NOT managed) does too, but really, it’s just running a single executable in the background - replaying or routing writes to the master is more work of course, but it’s still pretty easy. And given the stance of the US govt lately towards my country, I am likely NOT to use any of those, preferring to host domestically (sorry to bring politics into this - but everything is a trade-off for making technical decisions, including that).
I once set up HA Postgres on Linode (floating IP, NBD and the watchdog) and it worked, but it was cargo-culted and seemed very fragile in its complexity. LiteFS in comparison seems clean and elegant, would be very robust against SPOFs, and as a benefit moves the db reads to the edge, nearer to my hypothetical users.
But like I said, I am not running anything using it at the moment. The one project I AM considering doing would use Postgres for other reasons.
every app ends up depending on a db-specific feature but 1 + n queries is a pretty big one to rework
I think there might be a few cases in which SQLite forces one to use multiple queries where PostgreSQL wouldn’t, but it doesn’t usually, does it?
From that linked page:
So, SQLite is able to do one or two large and complex queries, or it can do many smaller and simpler queries. Both are efficient. An application can use either or both techniques, depending on what works best for the situation at hand.
“the situation at hand” could include “I want to leave open the option of replacing the RDBMS”.
“you now need, at the very least, backups” This is true of every database; how is this a criticism of sqlite?
Well, the article is pretty clear in what it’s trying to say, quote:
The value of SQLite is that it’s infrastructure-less. You don’t have to run anything additional to use it.
Which is saying, many choose to use sqlite because they don’t need to run anything extra - they just need to link their code with sqlite and it works. And trying to use sqlite on server-side negates that benefit. So, if that’s not why you want to use sqlite to begin with, this point doesn’t affect you at all.
Do you know how they plan to make it sustainable? Moving a bunch of content to something else that will just go away or need to enshittify is not very appealing… it’d be more compelling if they outlined a plan for how they plan to support themselves.
I wondered that myself and saw nothing in briefly looking around, but in December they had a donation campaign progress bar that (to my surprise) at least came close to filling.
I briefly thought this was a thing I read about a few months ago called Weird Gloop which is an effort to rescue wikis from the horrible Fandom farm, but no, Miraheze is much bigger and much older.
I would imagine the main reason Fandom.com remains more popular than Miraheze (if it does; I don’t know) is that Fandom.com is more visible. But I also think Miraheze expects a significantly higher level of commitment from users: Miraheze wikis are required to have their own administrators to enforce Miraheze’s rules, whereas Fandom.com doesn’t seem to care much if a wiki has no admins, and Miraheze threatens to delete wikis that aren’t actively edited (although in practice they seem to grant many exemptions for wikis that are considered “complete” or that pay them), whereas Fandom.com doesn’t seem to care if a wiki is inactive. I don’t mean to object to Miraheze’s policies (I find it understandable that they would have higher standards and ask more of users than Fandom.com does), but I also find it understandable if someone who wants to start a wiki and write up some stuff without taking on much responsibility would prefer Fandom.com.
However, you said “Why aren’t more wikis moving to it from Fandom?”, which maybe implies large, established wikis that would have no problem meeting Miraheze’s requirements, so maybe this is irrelevant to your actual question.
Edit: Also, Fandom.com’s restrictions on discussions of leaving Fandom.com might be effective to some extent.
Miraheze sounds like this typical bad-at-marketing, FLOSS-adjacent product.
The only big non-Wikipedia wiki I was ever part of was gaming-related and (afaik) not known by programmers, so the people just “look elsewhere”, and in this case ownership changed a couple times as well. And I don’t know if Miraheze was around 20y ago…
I understand the displeasure with Jimbo’s face (although it’s been some years since that), or the question of whether donations are financially necessary (unsure for Wikimedia), but a yearly, dismissible donation banner to me still seems much preferable to constant daily ads.
For sure. If all the fandom / wikia communities would pick up and move here, that would be a better world, and they’re likely to stay afloat with only low-pressure non-intrusive solicitation.
But, for myself, if I’m starting a wiki, I’d rather self-host it. I don’t trust all these experts and their infra.
I disagree with this, only because it’s imperialism. I’m British, in British English I write marshalling (with two of the letter l), sanitising (-sing instead of -zing except for words ending in a z), and -ise instead of -ize, among other things. You wouldn’t demand an Arabic developer to write all his comments in English for your sake for the sake of being idiomatic, would you?
I’ve worked for a few companies in Germany now, about half of them with their operating language being in German. All of them had comments being written exclusively in English. I don’t know how that is in other countries, but I get the impression from Europeans that this is pretty standard.
That said, my own preference is for American English for code (i.e. variable names, class names, etc), but British English for comments, commit messages, pull requests, etc. That’s because the names are part of the shared codebase and therefore standardised, but the comments and commit messages are specifically from me. As long as everyone can understand my British English, then I don’t think there’s much of a problem.
EDIT: That said, most of these suggestions feel more on the pedantic end of the spectrum as far as advice goes, and I would take some of this with a pinch of salt. In particular, when style suggestions like “I tend to write xyz” become “do this”, then I start to raise eyebrows at the usefulness of a particular style guide.
Developers in China seem to prefer Chinese to English. When ECharts was first open-sourced by Baidu most of the inline comments (and the entire README) were in Chinese:
In Japan I feel like the tech industry is associated with English, and corporate codebases seem to use mostly English in documentation. However, many people’s personal projects have all the comments/docs in Japanese.
If someone wants to force everyone to spell something the same within a language they should make sure it’s spelled wrong in all varieties, like with HTTP’s ‘referer’.
The Go core developers feel so strongly about their speling that they’re wiling to change the names of constants from other APIs.
The gRPC protocol contains a status code enum (https://grpc.io/docs/guides/status-codes/), one of which is CANCELLED. Every gRPC library uses that spelling except for go-grpc, which spells it Canceled.
Idiosyncratic positions and an absolute refusal to concede to common practice is part and parcel of working with certain kinds of people.
We’re drifting off-topic, but I have to ask: gRPC is a Google product; Go is a Google product; and Google is a US company. How did gRPC end up with CANCELLED in the first place?!
When you use a lot of staff on H-1B and E-3 visas, you get a lot of people who write in English rather than American!
Wait until you hear about the HTTP ‘Referer’ header. The HTTP folks have been refusing to conform to common practice for more than 30 years!
If this is something other than a private pet project of a person who has no ambition of ever working with people outside of his country? Yes, yes I would.
I believe the advice is still applicable to non-native speakers. In all companies I worked for in France, developers write code in English, including comments, sometimes even internal docs. There are a lot of inconsistencies (typically mixing US English and GB English, sometimes in the same sentence.)
In my experience (LatAm) the problem with that is people tend to have pretty poor English writing skills. You end up with badly written comments and commit messages, full of grammatical errors. People were aware of this so they avoided writing long texts in order to limit their mistakes, so we had one-line PR descriptions, very sparse commenting, no docs to speak of, etc.
Once I had the policy changed for the native language (Portuguese) in PRs and docs they were more comfortable with it and documentation quality improved.
In Europe people are much more likely to have a strong English proficiency even as a second or third language. You have to know your audience, basically.
While I like to write paragraphs of explanation in-between code, my actual comments are rather ungrammatical, with a bit of git style verb-first, removing all articles and other things. Proper English feels wrong in these contexts. Some examples from my currently opened file:
Those comments would most likely look weird to a person unfamiliar with your particular dialect.
In a small comment it’s fine to cut some corners, similar to titles in newspapers, but we can’t go overboard: the point of these things is to communicate, we don’t want to make it even more difficult for whoever is reading them. Proper grammar helps.
For clarification, this is not my dialect/way of speaking. But I see so many short interline comments like this that I started thinking they feel more appropriate, and now I make them too. Strange!
“If you use standard terms, spell them in a standard way” is not the same as “use only one language ever”.
Is “chapéu” or “hat” the standard way of spelling hat in Golang? If it’s “hat”, your standard is “only use American English ever”.
Is “hat” a standard term regularly used in the golang ecosystem for a specific thing and on the list given in the article? If not, it is not relevant to the point in the article.
(And even generalized: if it happens to be an important term for your code base or ecosystem, it probably makes sense to standardize on how to spell it, in whatever language and spelling you prefer. I’ve worked on mixed-language codebases, and it would have been helpful if people consistently used the German domain-specific terms instead of mixing them with various translation attempts. Especially if some participants don’t speak the language (well) and have to treat terms as partially opaque.)
What? England had the word “hat” long before the USA existed.
I had to solve this once. I maintain a library that converts between HTML/CSS color formats, and one of the formats is a name (and optional spec to say which set of names to draw from). HTML4, CSS2, and CSS2.1 only had “gray”, but CSS3 added “grey” as another spelling for the same color value, and also added a bunch of other new color names which each have a “gray” and a “grey” variant.
Which raises the question: if I give the library a hex code for one of these and ask it to convert to name, which name should it convert to?
The solution I went with was to always return the “gray” variant since that was the “original” spelling in earlier HTML and CSS specs:
https://webcolors.readthedocs.io/en/latest/faq.html#why-does-webcolors-prefer-american-spellings
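The upshot looks roughly like this (going from memory on the exact function names, so treat them as approximate):

```python
import webcolors

# Both spellings resolve to the same value...
print(webcolors.name_to_hex("gray"))     # -> "#808080"
print(webcolors.name_to_hex("grey"))     # -> "#808080"

# ...but the reverse mapping has to pick one, and it picks the older spelling.
print(webcolors.hex_to_name("#808080"))  # -> "gray"
```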
I thought you guys loved imperialism?
Imperialism is like kids, you like your own brand.
I don’t think it’s really “imperialism”—firstly, “marshaling” isn’t even the preferred spelling in the US. Secondly in countries all over the world job listings stipulate English language skills all the time (even Arabic candidates) and the practice is widely accepted because facilitating communication is generally considered to be important. Lastly, while empires certainly have pushed language standardization as a means to stamp out identities, I don’t think it follows that all language standards exist to stamp out identities (particularly when they are optional, as in the case of this post).
What makes you say that? (Cards on the table, my immediate thought was “Yes, it is.” I had no data for that, but the ngram below suggests that the single l spelling is the (currently) preferred US spelling.)
https://books.google.com/ngrams/graph?content=marshaling%2Cmarshalling&year_start=1800&year_end=2022&corpus=en-US&smoothing=3&case_insensitive=true
It’s imperialist to use social and technical pressure to “encourage” people to use American English so their own codebases are “idiomatic”.
I disagree. I don’t see how it is imperialism in any meaningful sense. Also “pressure” is fairly absurd here.
I wasn’t going to comment, but since others have — initially I wasn’t sure if this was a joke, with “features” like “No package manager to distract you” and “Simple type system … everything is an int”.
I mean, I understand it’s in early development, but the website seems to be overpromising and underdelivering. If it were my language I’d have waited until it was farther along before showing it off.
I do think some “features” on here are rather ridiculous e.g. all types are ints, but I do consider the lack of a package manager to be a feature. IMO package management should be delegated to my system package manager or something like Guix/Nix
Coming from a C++ background, I can say a package manager is much, much, much to be desired. A system package manager is not the same thing at all.
Err it’s probably dependent on the preferred workflow. Not sure how the C++ ecosystem works but I vastly prefer just using apk (sometimes guix) in C
This means that your project will be unportable not only between operating systems, but between distros.
I think Nix is about as portable between Linux distros as, say, Cargo. AFAIK Nix doesn’t work on Windows, but a new monolingual package manager very well might not either.
Markdown files describing what you need are relatively portable, just that it’s between humans rather than package managers. Some dislike this for good reason, others don’t.
Parts of the readme are certainly a joke, parts are descriptions of where the implementation is now, parts are things I’d like to do or have done at some point. I certainly wasn’t intending to show it off.
A great way to make smaller and maybe-more-recognizable (or at least more-easily-localized) icons less opaque is tooltips. …Oh right, you can’t have those on touch devices. Welp.
I actually wrote a draft for a blog a couple days ago on tooltips but it isn’t ready to go live yet… the short version is tooltips kinda suck:
Among a few other things. I’d rather have both visible. One alternative for the toolbar is that every toolbar icon should also be visible in the main menu… The icon and label could be together in the menu, allowing you to scan it and learn the icons; then the shortcut on the toolbar is more usable as you get to know it.
Using standard icons for standard operations is nice too. Old style open, close, save, print, standard icons and behaviors are getting depressingly less common, sigh.
Why not? The Google apps on my Android phone have tooltips, triggered by pressing and holding a button.
A good “falsehoods” list needs to include specific examples of every falsehood.
Yours doesn’t! And I maintain that it’s still a good boy list.
That doesn’t look to me like it’s meant to be an example of a good falsehoods list.
My point stands. Dogs.
In addition, it’s worth knowing that dogs up to 2 years of age exhibit the halting problem.
I’ll make an attempt, with the caveat that this list seems so obvious to me that I’m worried I might be missing some nuance (imagine a similar list about cooking utensils with “people think knives can only be used for butter, but in reality they can also be used to cut bread, meat, and even vegetables!!!”).
Both the substitutions and the surrounding text can depend on each other. The obvious example is languages where nouns have gender, but you might also have cases like Japanese where “in” might be へ, で, or に to indicate relative precision of the location.
German is the classic example of using lengthy compound words where English would use a shorter single-purpose word, “Rindfleisch” vs “beef” or “Lebensmittel” vs “food” (why yes I haven’t had lunch yet, why do you ask…?).
See above – English -> German tends to become longer, English -> Chinese tends to become shorter.
Turkish and German are famous counter-examples, with Turkish ‘i’/‘I’ being different letters, or German ß capitalizing to “SS” (though I think this is now considered somewhat old-fashioned?).
Not true in Chinese, Japanese, Korean.
Every language has words that don’t exist in any other language. Sometimes because the concept is alien (English has no native word for 寿司), sometimes because a general concept has been subdivided in a different way (English has many words for overcast misty weather that don’t translate easily into languages from drier climates).
I’m not sure what this means because many expressions in English don’t even have a single explanation in English, but in any case, idioms and double entendres often can’t be translated directly.
If one’s English to SVO order is limited, limited too must their knowledge of literature be.
Even English doesn’t follow a rule of capitalizing the first character of every word. Title Casing The First Letter Of Every Word Is Bad Style.
One well-known counter-example being languages where agreement is by repeating a verb:
A: “Do you want to eat lunch together?” B: “Eat.”
See above.
Color / colour, aluminum / aluminium
Not sure exactly what this means – upper-case vs lower-case? Latin vs Cyrillic? 漢字 vs ひらがな カタカナ ? 简化字 vs 繁体字 ? Lots of counter-examples to choose from, Kazakh probably being a good one.
Lithuanian sorts ‘y’ between ‘i’ and ‘j’: https://stackoverflow.com/questions/14458314/letter-y-comes-after-i-when-sorting-alphabetically
Some languages special-case ordering of letter combinations, such as ij in Dutch.
And then there’s the dozens of European languages that have their own letters outside the standard 26. Or diacritics.
Arabic, Hebrew.
Not sure what “flows” means here, but applications with good RtL support usually flip the entire UI – for example a navigational menu that’s on the right in English would be on the left in Arabic.
Chinese, Japanese.
English: “Dear Mr. Smith”.
French famously has rules that differ from English regarding spacing around punctuation.
“ ” in English,「 」in Japanese, « » in French,
European languages that use ‘.’ for the thousands separator and ‘,’ for the fractional separator, or languages that group by different sizes (like lakh/crore in Indian languages).
Many languages are considered distinct for political reasons, even if a purely linguistic analysis would consider them the same language.
English (as spoken in Pittsburgh), English (as spoken in Melbourne), and English (as spoken in Glasgow).
More seriously, Japanese and Javanese.
Often they’re difficult to understand even for English speakers (I once saw a literal hamburger used to signify a collapsible sidebar).
Nobody who has ever travelled would think this. And yet. AND YET!
C’mon Google, I know that my IP is an airport in Warsaw but I really don’t want the Maps UI to switch to Polish when I’m trying to find a route to my hotel.
You can roughly gauge where you are in the world by whether the local ATMs offer “🇬🇧 English”, “🇺🇸 English”, or “🇦🇺 English”.
Belgium, Luxembourg, Switzerland.
English, again.
The German ß has history.
The old rule is that ß simply has no uppercase. Capitalizing it as “SS” was the default fallback rule if you had to absolutely capitalize everything and the ß would look bad (such as writing “STRAßE” => “STRASSE”). Using “SZ” was also allowed in some cases.
The new rule is to use the uppercase ß: ẞ. So instead of “STRASSE” you now write “STRAẞE”.
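You can see the old fallback baked into most software’s case mappings today; Python, just as an example:

```python
# Most Unicode case mappings still apply the pre-2017 fallback: ß uppercases to "SS".
print("straße".upper())      # -> STRASSE
print("straße".casefold())   # -> strasse

# The dedicated capital letter does exist: U+1E9E LATIN CAPITAL LETTER SHARP S.
print("STRA\u1E9EE")         # -> STRAẞE, the spelling allowed since 2017
print("\u1E9E".lower())      # -> ß
```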
The usage of “SZ” was disallowed in 2006, the East Germans had an uppercase ß since 1957, and the West German rules basically said “Uppercase ß is in development”; that was dropped in 1984 in favour of the rule to use SS or SZ as the uppercase variant. The new uppercase ß is in the rules since 2017. And since 2024 the uppercase ß is now preferred over SS.
The ISO DIN 5008 was updated in 2020,
This means depending on what document you’re processing, based on when it was created and WHERE it was created, its writing of the uppercase ß may be radically different.
It should also be noted that if you’re in Switzerland, ß is not used at all; there the ss substitute is used even in lower case.
Family names may also have custom capitalization rules, where ß can be replaced by SS, SZ, ẞ or even HS, so “Großmann” can become “GROHSMANN”. Note that this depends on the person: while Brother Großmann may write “GROHSMANN”, Sister Großmann may write “GROSSMANN” and their mother may use “GROẞMANN”, and these are all valid and equivalent.
Umlauts may also be uppercased without the diacritic umlaut and with an E suffix; ä becomes “AE”. In some cases even lowercase input does the translation because older systems can’t handle special characters, though this is not GDPR compliant.
If you ever want to have fun, the politics and regionality of German dialects could be enough to drive some linguists up the wall.
Bavarian is recognized as a language and a dialect at the same time; it can be subdivided into dozens and dozens of subdialects, which are all similar but whose speakers may struggle to understand each other.
As someone who grew up in Swabian Bavaria, my dialect is a mix of both Swabian and Bavarian. I struggle to understand Northern Bavarian, but I struggle much less with Basel Swiss German (which is distinct from other Swiss German in that it originates from Low Alemannic instead of High Alemannic), which is quite close in a lot of ways.
And the Swiss then double down on making things confusing by sometimes using French language constructs in German words, or straight up importing French or Italian words.
What should I read to learn more about this? Why wasn’t the character in Unicode 1.0, then?
East Germany added the uppercase ß in 1957 and removed it in 1984. The spelling rules weren’t updated, so despite the presence of an uppercase ß, it would have been wrong to use it in any circumstances. Since Unicode 1.0 is somewhere around 1992, with some early drafts in 1988, it basically missed the uppercase ß being in the dictionary.
The uppercase ß itself has been around since 1905 and we’ve tried to get it into Unicode since roughly 2004.
Is this more like there being an attested occurrence in a particular dictionary in East Germany in 1957 rather than common usage in East Germany?
A good example of this is a CMS I used to work on. The way it implemented translation was to define everything using English[0], then write translations as a mapping from those English snippets to the intended language. This is fundamentally flawed, e.g. by homonyms:
Here, the “Read” flag means “this has been read”, whilst the “Read” button means “I want to read this”. Using the English as a key forces the same translation on both.
[0] We used British English, except for the word “color”, since we felt it was better to match the CSS keywords (e.g. when defining themes, etc.).
One trick is to use a different word on the asset: Reviewed (adj.) and Review (v.) don’t have the same problem that Read (adj.) and Read (v.) do. Seen (adj.) and See (v.); Viewed (adj.) and View (v.). And so on. Then you can “translate” to English to actually use Unread/Read/[Read] if you still like it, without confusing the translator, who needs to know you want something more like e.g. Lido/Ler or 阅读/显示 and so on.
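The other common escape hatch is to key messages on identifiers rather than on English surface text at all; a toy sketch with invented keys and strings:

```python
# Hypothetical message catalogue keyed on semantic identifiers, so the
# "has been read" flag and the "read this" action can diverge per language.
MESSAGES = {
    "en": {"article.read_status": "Read", "article.read_action": "Read"},
    "pt": {"article.read_status": "Lido", "article.read_action": "Ler"},
}

def t(lang: str, key: str) -> str:
    return MESSAGES[lang][key]

print(t("pt", "article.read_status"), "/", t("pt", "article.read_action"))  # Lido / Ler
```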
Much better than the original article. Also love how many of the counter examples come from English.
My bar for these lists is https://yourcalendricalfallacyis.com/ and most “falsehoods programmers believe” lists don’t meet it.
The number of exceptions caused by the Hebrew calendar makes me shed a tear of joy.
Here’s one falsehood they missed: the length of a year varies by at most one day. True in Gregorian calendar, apparently true in the Islamic calendar, but not true in the Hebrew calendar: leap years are 30 days longer than regular years.
They sorta cover it on the “days” section, by way of mentioning that the Hebrew calendar has leap months.
They also miss Byzantine calendars which are still used by many churches, related to the Jewish Greek calendar from the Septuagint. It’s of course complicated by the fact that many churches & groups do not agree on what year was the start, so it’s complex to use (but still in somewhat fairly common liturgical use).
Wow, 30? I need to read more about this.
Here’s a fun (counter)example of (something like) this one from my heritage language:
(Context: the word for enjoy/like is the same in the language, so when translating to English, I choose whichever sounds most natural in each given example sentence.)
When someone says, “do you (enjoy/)like it?”, if you want to say “yes, I like it”, that’s fine, but if you want to say you don’t like it, you would say, “I don’t want it”; if you were to say, “I don’t like it” in that situation, it would mean, “I don’t want it”. The same reversal happens if they ask, “do you want it?”, and you want to respond in the negative.
So someone would say, “do you want a chocolate bar?”, and you’d say, “no, I don’t want it”, and that would mean, “no, (I already know) I don’t (usually/habitually) enjoy it (when I have it), (therefore I don’t want it)”, whereas, “no, I don’t enjoy it” would just straightforwardly mean, “I don’t want it”.
(You can also respond with just, “no!” instead of using a verb in the answer.)
This only happens in the habitual present form. Someone might ask, “do you like chocolate?” before they offer you some, and you can say, “no, I don’t want it”, but if they already gave you a chocolate bar to try, they may ask, “did you like it?” in the past tense, and you’d have to respond with, “I didn’t like it” instead of, “I didn’t want it”. And, “do you want chocolate?” would be met with, “no, I don’t like it”, but “did you want chocolate?” would be met with, “no, I didn’t want it”, and that second one would just mean what it straightforwardly sounds like in English.
(Strictly speaking, it doesn’t need to be a response to a question, I’m just putting it into a context to show that the verb used in the answer isn’t just a negative form of the same verb used in the question.)
(It’s hard to explain because if you were to translate this literalistically to English, it wouldn’t even be noticed, since saying, “no, I don’t like it” in response to, “do you want it?” is quite natural, but does literally just mean, “I don’t like it”, in the sense of, “no, (I already know) I don’t (usually/habitually) enjoy it (when I have it), (therefore I don’t want it)”. Even, “no, I don’t want it“ in response to, “do you like it?” is fairly natural in English, if a little presumptive-sounding.)
In Polish when someone asks you “Czy chcesz cukru do kawy?” (“Do you want sugar in your coffee?”) you can respond with “Dziękuję”, which can mean two opposite things: “Yes, please” or “No, thank you”.
The original ones, like “…Names”, don’t; part of what I find fun about them is trying to think of counterexamples.
I think if you want them to be useful they need to include counterexamples. If it’s just a vent post then it’s fine to leave them.
The first one gets a pass because it was the first one, and even then, I think it’s better to link people one of the many explainers people wrote about it.
The other thing this doesn’t mention is that if you have LSP set to auto-enable when you visit a file, you cannot even look at the file in your text editor for an untrusted project, which presents some obvious difficulties given that the means by which you decide if a project is trusted typically involves opening it in your text editor.
Some editors have started asking if a particular repository is trusted before running such commands due to this issue.
I think this is better than nothing, but if it’s a prompt that happens every time you open a new repo, it’s very likely that users will get trained to click OK without reading the message, to the point where the prompt doesn’t really accomplish much more than covering the ass of the editor maintainers.
That’s why I have Vim for code editing and ripgrep+less for inspecting untrusted code. Clean separation between the programmable text editor with plugins and the stuff for inspecting untrusted stuff.
I would trust a programming text editor more than less(1) not to be tricked by things like Trojan Source.
GNU Emacs core tries to be careful about not making simply opening an untrusted file an attack vector, but I’m sure not all plugins are as careful.
In the past year, I have moved to nearly-entirely mouseless macOS.
Is there a way to make the native dialog boxes (typically an OK / Cancel sort of thing) keyboardable? That’s always what drove me nuts about MacOS. Just give me tab select and enter to trigger! I don’t understand why they’ve never adopted that.
My Mac experience doesn’t extend past Mac OS X, but that has/had Enter for OK and I think Space for Cancel. Do those no longer work?
There is an accessibility toggle somewhere that makes them (mostly) keyboard accessible. I can’t remember where it is exactly, but knowing that it exists is half the battle! ;)
Found it! System Settings -> Accessibility -> Keyboard -> Full Keyboard Access
Thank you, this is great. It’s even surprisingly configurable.
Always the first thing I go turn on on new macs!
Yes, this is huge. Forgot it above as I’ve been doing it long before any of those other tools (except maybe LaunchBar, since 2003 probably).
I was excited to try this out, but discovered that it prevents the space bar from working in kitty. I’m sure there is a way to fix that, but I wanted to leave a note here in case anyone else might try this out and then an hour later wonder why their space bar isn’t working in their terminal.
You can click the little ( i ) icon in the top right and change some of the bindings. I switched “activate” from space bar to what Apple’s always called “Return” (although I thought The Mac is not a Typewriter).
But I ended up turning it all off anyway, because the UX was so terrible and got in the way of everything. I only wanted keys for standard buttons on standard dialog boxes! I wish you better luck than I.
I still don’t understand why pledge(2)/unveil(2) have not been ported to other kernels. They have proven to work very well for almost a decade now, as they were introduced in 2015.
SerenityOS [1] [2] has ported pledge+unveil. Reply thread here mentions NanoVMs [3] too.
But I agree. It’s a really easy-to-use, simple protection that seems like a no-brainer for pretty much everything to have now.
1: https://man.serenityos.org/man2/pledge.html
2: https://awesomekling.github.io/pledge-and-unveil-in-SerenityOS/
3: https://nanovms.com/dev/tutorials/applying-sandbox-security-node-js-unikernels-openbsd-pledge-unveil
We’re doing a capstone project with one of my students where we’re porting pledge to FreeBSD. We’re not doing unveil because I don’t know its internals as well as pledge’s, but hopefully something good comes out of this experiment.
You might want to reach out to some of the HardenedBSD community members. We have a port of pledge that is nearing completion and will likely be merged into HardenedBSD on the inside of a year, quicker if we have help. :-)
The Linux approach, for better or worse, is to provide more generic/complex kernel APIs, the two most similar to pledge/unveil being respectively seccomp-bpf and landlock. Given those, you can port pledge/unveil in userspace. But that obviously results in less uptake than an API blessed/mandated by the OS developers in the OpenBSD style.
edit: Although see previous discussion for more caveats.
From the porting link:
This seems pretty easy on Linux with systemd, am I missing something? The program itself doesn’t even have to know about any control mechanisms like pledge or whatever, we can enforce it from outside.
pledge and unveil are specifically designed as tools for the programmer. They aren’t meant to be imposed from the outside as a sandbox. Theo has repeatedly rejected the idea of adding a utility to OpenBSD that runs an arbitrary command line under a particular pledge, for example. The intended use is for the developer (or knowledgeable porter) to pledge/unveil down to the resources that they know the program needs. It’s a tool for defensive programming: if your pledged process gets owned, it has as little ambient authority as possible.
I agree. The programmer knows better than the user what resources a program needs.
Besides, the external approach imposes a static set that stays the same during the entire runtime, which is a significant limitation, mostly because many programs require more privileges during initialization which can then be dropped later.
I agree, but OTOH the user has the much greater incentive to care about sandboxing programs than their programmers (when the user and the programmer are not the same).
OK but my point is that the linked post says that this functionality is extremely difficult in Linux, but if we shift our thinking slightly to work with Linux’s execution resource controls, it seems to be quite easy.
It…depends. Linux sandboxing solutions are not particularly unified and depend on stitching a lot of pieces together, sometimes with confusing or bizarre edge cases. It’s not awful, but it’s still a fair bit of effort and a surprising amount of room for error.
(I will say that Landlock helps a lot here by being a relatively high-level set of access controls for things that are otherwise more difficult to sandbox.)
No, it really looks quite simple. Check the link I posted earlier. It’s as simple as adding a couple of lines to the systemd unit file or a couple of flags to the systemd-run command line.
There’s a significant structural difference between the two. I honestly don’t know why the Internet keeps lumping them together. Yes, both apply restrictions, but they’re completely different mechanisms. It’s like lumping email and keyloggers together because both ultimately boil down to delivering a series of keypresses to a third party.
pledge and unveil are things that the program does, which has two major implications:
Pledging a particular restriction is a supported mode of operation.
Breaking a pledge kills the program with SIGABRT and you get a stack trace pointing to wherever the promise was broken.
You can pledge or unveil whenever you want during a program’s execution (but there are restrictions to what you can “unpledge” or “un-unveil”). So you can e.g. start with full access rights to the user’s home path in order to read configuration, and you can drop filesystem access privileges entirely once you’ve read configuration data.
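For concreteness, that start-broad-then-drop pattern looks something like this when called from Python via ctypes on OpenBSD; the promise strings and config path are just an example, not a recommendation:

```python
import ctypes

libc = ctypes.CDLL(None, use_errno=True)  # OpenBSD-only: pledge(2) lives in libc

def pledge(promises: str, execpromises: str | None = None) -> None:
    # int pledge(const char *promises, const char *execpromises)
    res = libc.pledge(promises.encode(),
                      execpromises.encode() if execpromises else None)
    if res != 0:
        raise OSError(ctypes.get_errno(), "pledge failed")

# Broad promises while reading configuration from disk...
pledge("stdio rpath")
config = open("/etc/example.conf").read()

# ...then drop filesystem access entirely for the rest of the run.
pledge("stdio")
```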
“External” sandboxing solutions allow you to apply the same restrictions, but:
You can IPAddressDeny= an address, but if sending telemetry to that address is a blocking operation, you’ll just get a blocked program. That’s obviously not on the sandboxing method itself, but…
With IPAddressDeny= packets just get dropped.
The whole point of pledge and unveil is that you write your program so that you make certain promises about how it’s going to work, and sandboxing works as protection against accidentally breaking those promises, either straight as a bug, or through a bug getting exploited. They’re design-by-contract tools, more or less.
I’m a bit confused, I wouldn’t expect “Linux’s execution resource controls” to include systemd-specific APIs? To be clear, these toggles do not cleanly map to OS primitives at all; you can’t just go “ah yes I want systemd’s IP address restrictions myself” without writing your own BPF filtering.
If you did just mean systemd specifically, note that it’s quite a bit trickier to, say, sandbox things in a flexible way with that. In particular, to sandbox something that’s not a service with systemd-run has the overhead of spawning the child under the service manager and piping the output, and you lose the ability to easily control and kill the child (because you’re just killing the proxying process, not the actual child).
Yes, that’s what I meant.
But the whole point of systemd is that it makes it easy to control and kill the child processes that it runs…
Yes but these things are happening on different levels and mechanisms.
One is baked into the software by design, the other is not.
systemd is just flat out not a replacement for pledge+unveil, end of story. Completely different mechanisms.
I didn’t say it’s a ‘flat out replacement’, I said it achieves the same goals that were mentioned in the linked post ie preventing a piece of software I downloaded off the internet from ‘phoning home’ and uploading telemetry.
Okay, I understand now. The way I see it, the example in that post was a one-off, and a pretty bad one, because I think from this you’ve misunderstood the purpose of pledge. While there may be some overlap with the functionality of BPF filters, they are far, far from 1:1.
To help clear this up:
So in this scenario of telemetry:
There would be no point in using pledge here.
Yeah exactly, the overlap is what I’m talking about. This is what I was referring to earlier–if you shift your mindset a bit, it becomes pretty easy to externally impose similar security restrictions using systemd. If you don’t mind that the exact way it’s done is a bit different, you can achieve very similar goals.
Part of the pledge/unveil model is that you can progressively drop privileges during program execution. The observation is that programs may need a lot of privileges in order to start themselves up, and then hardly any for the remainder of runtime. For example, consider the recommendations for using pledge when opening handles to sndiod in OpenBSD:
However:
And in fact aucat(1) does exactly this, using a bunch of privileges at startup time (but dropping other very sensitive promises like proc and exec right away) and then dropping most of them when it’s time to start playback or recording from the sound device handle.
The purpose of pledge is for the programmer to mitigate the privileges available to exploits, not for the user to defend against code they chose to run. As far as I know, these goals, which are the main point of pledge and unveil, do not overlap at all with what you can do by filtering access via systemd.
They overlap with the goal that was stated in the linked post. That was what I was referring to. I agree that dropping complex combinations of privileges in Linux at different points within an application’s lifecycle doesn’t seem easy.
I encourage you to read the rest of my messages in this thread to get the fuller context. It’s better than responding to one message at a time: https://lobste.rs/s/ny2s9f/openbsd_innovations#c_bds2yk
Yeah but the linked post misunderstands what pledge is actually for; “sandboxing untrusted binaries” is not an actual design goal of pledge and if it’s useful for that goal it’s only useful by accident.
Yeah so this is kinda weird: it makes it easy to manage the child as a single entity and then interactively control it. But systemd-run is, ofc, just a proxy, spawning the child as a transient unit and monitoring it until exit.
This effectively means that systemd-run does not itself control the spawned process’s lifetime. When you run it with --pty, Ctrl-C will kill the child normally… because it’s going through the tty. But if you’re running it detached from the tty (e.g. because you need to control stdout/stdin from your program), or you just signal the process directly, the actual service will remain running in the background. The only way of directly controlling that lifecycle is by grabbing the PID or (preferably) transient unit name and then targeting it directly or controlling it via the D-Bus interface.
But it’s a transient unit, it shows up in the output of systemctl list-units, right? That means you can operate on it with e.g. systemctl stop the-unit just like other systemd units?
It is, yeah, which is why I had mentioned the “it makes it easy to […] interactively control it”. It’s just a lot harder in turn to automatically do things like “oh the user did Ctrl-C, let me stop all my processes”.
How does pledge/unveil compare to Capsicum?
From my admittedly passing acquaintance with both of them: Capsicum is much finer grained.
Pledge/unveil solves the 80% use case, Capsicum aims to solve the 100% use-case.
I.e. it would be very difficult, perhaps impossible, to implement Capsicum on top of pledge/unveil, but easy enough to implement pledge/unveil on top of Capsicum.
That said, there is probably more pledge/unveil actually implemented in binaries out in the wild. I think all of the base OpenBSD system is done now. Last I checked, only some of FreeBSD has Capsicum implemented in base.
We ended up porting this to Nanos: https://nanovms.com/dev/tutorials/applying-sandbox-security-node-js-unikernels-openbsd-pledge-unveil .
If encoding of typical URLs doesn’t work really well, shouldn’t they make a new encoding version that is specialized for alphanumeric, / : ? & etc?
chicken-egg problem now. Who would want to use a QR code encoding that won’t work with the majority of QR code readers for only a very small gain, and how many reader implementations are actively maintained and will add support for something nobody uses yet?
The encoding could be invented for internal use by a very large company, kinda like UPS’s MaxiCode: https://en.m.wikipedia.org/wiki/MaxiCode
What you’re describing is the problem for all new standards. How do they ever work? ;-)
Better in environments that are not as fragmented and/or can provide backwards compatibility? ;)
The “byte” encoding works well for that. Don’t forget, URLs can contain the full range of Unicode, so a restricted set is never going to please everyone. Given that QR codes can contain ~4k symbols, I’m not sure there’s much need for an entirely new encoding.
Yes, although… there’s some benefit to making QR codes smaller even when nowhere near the limits. Smaller ones scan faster and more reliably. I find it difficult to get a 1kB QR code to scan at all with a phone.
QR codes also let you configure the amount of error correction. For small amounts of data, you often turn up the error correction which makes them possible to scan with a very poor image, so they can often scan while the camera is still trying to focus.
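For example, with the common third-party Python qrcode package (API from memory, so roughly):

```python
import qrcode  # the widely used "qrcode" package

data = "HTTPS://EXAMPLE.COM/TICKET/12345"
for name, level in [("L (~7% recovery)", qrcode.constants.ERROR_CORRECT_L),
                    ("H (~30% recovery)", qrcode.constants.ERROR_CORRECT_H)]:
    qr = qrcode.QRCode(error_correction=level)
    qr.add_data(data)
    qr.make(fit=True)
    # Higher error correction -> higher version -> more modules per side (17 + 4 * version).
    print(name, "-> version", qr.version)
```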
IME small QR codes scan very fast even with the FEC at minimum.
URIs can contain the full Unicode range, but the average URL does not. Given that’s become a major use case for QR codes, it’s definitely a shame there isn’t a better mode for them: binary mode needs 8 bits per character while alphanumeric mode only needs 5.5.
All non-ascii characters in URLs can be %-encoded so unicode isn’t a problem. A 6-bit code has room for lower case, digits, and all the URL punctuation, plus room to spare for space and an upper-case shift and a few more.
So ideally a QR encoder should ask the user whether the text is a URL, and should check which encoding is the smallest (alnum with %-encoding, or binary mode). Of course this assumes that any paths in the URL are also case-insensitive (which depends on the server).
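To put rough numbers on the alphanumeric-vs-byte difference, ignoring mode indicators and length fields, and assuming a URL that is already uppercased and contains nothing outside the alphanumeric set:

```python
# QR alphanumeric mode: 45-character alphabet, 11 bits per pair of characters,
# 6 bits for a trailing odd character. Byte mode: 8 bits per byte.
ALNUM = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

def alnum_bits(s: str) -> int:
    pairs, odd = divmod(len(s), 2)
    return pairs * 11 + odd * 6

def byte_bits(s: str) -> int:
    return len(s.encode("utf-8")) * 8

url = "HTTPS://EXAMPLE.COM/SOME/PATH"   # 29 chars, all inside the alphanumeric set
assert all(c in ALNUM for c in url)
print(byte_bits(url), alnum_bits(url))  # 232 vs 160 bits of payload
```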
Btw. can all HTTP servers correctly handle requests with uppercase domain names? I’m thinking about SNI, maybe certificate entries…? Or do browsers already “normalize” the domain name part of a URL into lowercase?
The spec says host names are case-insensitive. In practice I believe all (?) browsers normalize to lowercase so I’m not sure if all servers would handle it correctly but a lot certainly do. I just checked and curl does not normalize, so it would be easy to test a particular server that way.
Host name yes, but not path. So if you are making a URL that includes a path the path should be upper case (or case insensitive but encoded as upper case for the QR code).
No, a URI consists of ASCII characters only. A particular URI scheme may define how non-ASCII characters are encoded as ASCII, e.g. via percent-encoding their UTF-8 bytes.
Ok? You see how that makes the binary/byte encoding even worse right?
Furthermore, the thing that’s like a URI but not limited to ASCII is an IRI (Internationalized Resource Identifier).
You’re right (oh and I should know that URLs can contain anything…. Let’s blame it on Sunday :-))
Byte encoding is fun in practice, as I recently discovered, because Android’s default QR code API returns the result as a Java String. But aren’t Java Strings UTF-16? Why yes, and so the byte data is interpreted as UTF-8, then converted to UTF-16, and then provided as a string.
The work around, apparently, if you want raw byte data, is to use an undocumented setting to tell the API that the data is in an 8-bit code page that can be safely round tripped through Unicode, and then extract the data from the string by exporting it in that encoding.
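The reason that round trip works is that ISO-8859-1 maps bytes one-to-one onto the first 256 code points; in Python terms, just to illustrate the round trip (not the Android API):

```python
# ISO-8859-1 maps bytes 0x00-0xFF one-to-one onto U+0000-U+00FF, so an arbitrary
# byte payload survives a trip through a text string if both hops use it.
raw = bytes(range(256))
as_string = raw.decode("latin-1")        # what the API would hand you as "text"
recovered = as_string.encode("latin-1")  # export in the same encoding to get bytes back
assert recovered == raw

# By contrast, the UTF-8 interpretation described above is lossy for arbitrary bytes:
# raw.decode("utf-8") raises UnicodeDecodeError for most binary payloads.
```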
I read somewhere that the EU’s COVID passes used text mode with base45 encoding because of this tendency for byte mode QR codes to be interpreted as UTF-8.
Do you really mean base45 or was that a typo for base64? I’m confused because base64 fits fine into utf8’s one-byte-per-character subset. :)
There’s an RFC: https://datatracker.ietf.org/doc/rfc9285/ and yes, base45, which is numbers and uppercase and symbols that matches the QR code “alphanumeric” set.
ahhh ty
This causes a bit of headache for me. I doubled down on ring as the default backend for rustls when releasing ureq 3.0 just a couple of months ago. But this might mean I should switch to aws-lc-rs. Hopefully that doesn’t upset too many ureq users :/
There’s been some positive momentum on the GitHub discussion since you posted. Namely the crates.io ownership has been transferred to the rustls people and 2 of them have explicitly said they’ll maintain it. They need to make a new release to reflect the change and then will be able to resolve the advisory.
That does buy some time. It’s the same people stepping up who are writing/maintaining rustls, which makes me happy.
Out of interest, what factors did you consider when choosing between aws-lc-rs and ring?
My dream for ureq is a Rust-native library without C underpinnings. The very early releases of rustls made noises I interpreted as aiming for that too, even though they never explicitly stated it as a goal (and ring certainly isn’t Rust-native like that). rustls picked ring, and ureq 1.x and 2.x used rustls/ring.
As I was working on ureq 3.x, rustls advertised they were switching their default to aws-lc-rs. However the build requirements for aws-lc-rs were terrible – like requiring users on Windows to install nasm (this has since been fixed).
One of ureq’s top priorities has always been to “just work”, especially for users new to Rust. I don’t want new users to face questions about which TLS backend to choose. Hence I stuck with rustls/ring for ureq 3.x.
aws-lc-rs has improved, but it is still the case that ring has a higher chance to compile on more platforms. RISCV is the one I keep hearing about.
Wait, does that mean the Rust ecosystem is moving towards relying on Amazon and AWS for its cryptography? That doesn’t sound great. Not that I believe Amazon would add backdoors or anything like that, but I expect them to maintain aws-lc and aws-lc-rs to suit their own needs rather than the needs of the community. It makes me lose some confidence in Rust for these purposes, to be honest.
What do you see as the conflict here, i.e. where would the needs differ for crypto libraries?
I’d expect a corporate funded crypto project to be more likely to get paid audits, do compliance work (FIPS etc), and add performance changes for the hardware they use (AWS graviton processors I guess), but none of that seems necessarily bad to me.
Things like maintaining API stability, keeping around ciphers and cryptographic primitives which AWS happens to not need, accepting contributions to add features which AWS doesn’t need or fix bugs that doesn’t affect AWS, and improving performance on platforms which AWS doesn’t use are all things that I wouldn’t trust Amazon for.
Yeah. To AWS’s advantage are quantum-resistant crypto and FIPS. In a comment below I found there is another initiative, “graviola”, that seems promising.
There is also the RustCrypto set of projects.
There is, but AIUI AWS-LC uses assembly code and thus can provide timing-safety, whereas RustCrypto is “written in pure Rust”.
The focus on energy consumption misses the point.
People like the OP often assume that using more energy automatically means more pollution and environmental damage, but that’s because they are thinking about coal plants.
What really matters is how we generate that energy, not how much we use. Take countries that rely almost entirely on hydroelectric power, where turbines harness the natural force of falling water, or those powered by nuclear plants. In these cases, if we don’t use the available energy, it simply goes to waste. The energy will be generated whether we use it or not.
So when we’re discussing the ethics of training LLMs, their energy consumption has nothing to do with being ethical or unethical. The nature of the energy sources is a different topic of conversation, regardless of how much energy they use.
Okay, but (say) the OpenAI data centers are located in the US, and we know the energy mix in the US. For example this webpage of the U.S. Energy Information Administration currently states:
And even climate-friendly alternatives like hydro require damming up and destroying rivers, windmills require paving new roads into the wilderness and introduce constant noise and visual pollution, solar requires land and mined minerals, nuclear … well ok nuclear is probably fine, but it’s expensive. Not that coal is better than all these, but focus on energy consumption is absolutely warranted as long as there is no single climate- and nature-friendly power source that is available everywhere.
(In Norway there has been massive protests against windmills because they destroy so much wild land, and politicians are saying “yes but we must prepare for the AI future, also Teslas, so turning our forests paving-gray is actually the green thing to do also we got some money from those foreign investors which we can make half a kindergarten out of”. And that in a country where we already export much of our electricity and the rest is mostly used for the aluminum industry precisely because power is already fairly cheap here due to having dammed most of our rivers long ago.)
I agree with much of this comment, but I have two quibbles:
One can draw off only a fraction of the water in a river and feed it to a turbine with or without a dam, although of course there are trade-offs.
Nuclear power, at least in practice, implies mining minerals too, specifically radioactive ones (on top of the mining of iron, copper, etc. implied by any electric infrastructure).
In theory, yes. In practice, small hydro has had a much worse impact on the environment than large hydro compared to kwh: https://www-nrk-no.translate.goog/klima/xl/unik-kartlegging_-sma-kraftverk-legger-mer-km-elv-i-ror-enn-store-for-a-lage-like-mye-strom-1.16982097?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=no&_x_tr_pto=wapp
tl;dr actual small hydro plants in Norway have used 5x as much river as large ones per kwh. Though sure in theory it could be better.
Article was rather more tolerable than I was expecting, so good on the author.
I will highlight one particular issue: there’s no way to have both an unbiased/non-problematic electric brain and one that respects users’ rights fully. To wit:
This logic applies just as much to a minority pushing for (say) trans-friendly norms as it does for a minority pushing for incredibly trans-hostile ones. Replace trans-friendly with degrowth or quiverfull or whatever else strikes your fancy.
Like, we’ve managed to create these little electric brains that can help people “think” thoughts that they’re too stupid, ignorant, or lazy to by themselves (I know this, having used various electric brains in each of these capacities). There is no morally coherent way–in my opinion!–of saying “okay, here’s an electric brain just for you that respects your prejudices and freedom of association BUT ALSO will only follow within norms established by our company/society/enlightened intellectual class.”
The only sane thing to do is to let people have whatever electric brains they want with whatever biases they deem tolerable or desirable, make sure they are aware of alternatives and can access them, and then hold them responsible for how they use what their electric brains help them with in meatspace. Otherwise, we’re back to trying to police what people think with their exocortices and that tends to lose every time.
This is just free speech absolutism dressed up in science fiction. There are different consequences to different kinds of speech, and these aren’t even “brains”: they’re databases with a clever compression and indexing strategy. Nobody is or should be required to keep horrendous speech in their database, or to serve it to other people as a service.
Isn’t that exactly the problem @friendlysock is describing? This is already a reality. One has to put up with the American Copilot refusing to complete code which mentions anything about sex or gender and the Chinese DeepSeek refusing to say anything about what happened at Tiananmen Square in 1989.
The problem is powerful tech companies (and the governments under which they fall) imposing their morality and worldview on the user. Same is true for social media companies, BTW. You can easily see how awkward this is with the radically changed position of the large tech companies with the new US administration and the difference in values it represents.
It’s not “free speech absolutism” to want to have your own values represented and expressed in the datasets. At least with more distributed systems like Mastodon you get to choose your moderators. Nobody decries this as “free speech absolutism”. It’s actually the opposite - the deal is that you can join a system which shares your values and you will be protected from hearing things you don’t want to hear. Saying it like this, I’m not so sure this is so great, either… you don’t want everyone retreating into their own siloed echo chambers, that’s a recipe for radicalisation and divisiveness.
What if you want to write a movie villain?
The problem is not the existence of harmful ideas. The problem is lack of moderation when publishing them.
And yeah, nobody should be required to train models in certain ways. But maybe we should talk about requirements for unchecked outputs? Like when kids ask a chatbot, it shouldn’t try to make them into fascists.
On the other hand, when I ask a chatbot about what’s happening in the US and ask it to compare with e.g. Umberto Eco’s definition of Fascism, it shouldn’t engage in “balanced discussion” just because it’s “political”.
We need authors to have unopinionated tools if we want quality outputs. Imagine your text editor refusing to write certain words.
Ah, I guess? If that bothers you, I think that’s an interesting data point you should reflect on.
Sure, but if somebody chooses to do so, they should be permitted. I’m pointing out that the author complains about bias/problematic models, and also complains about centralization of power. The reasonably effective solution (indeed, the democratic one) is to let everybody have their own models–however flawed–and let the marketplace of ideas sort it out.
In case it needs to be explicitly spelled out: there is no way of solving for bias/problematic models that does not also imply the concentration of power in the hands of the few.
I’m not claiming this is feasible at this point, but is “delete all the models and research and stop all R&D in ML” a counterexample to this claim?
s/encryption/compression/, right? (I use ZFS; it supports both, but I think “compression” was meant.)
Correct. Fixed. Thank you for pointing it out.
Random sidenote: I wish there were standard shortcuts or aliases for frequently typed commands. It’s annoying to type systemctl daemon-reload after editing a unit, e.g. why not systemctl dr? Or debugging a failed unit: journalctl -xue myunit seems unnecessarily arcane, why not --debug or friendlier?

I’m using these:
this is shorter to type, completion still works and I get my less options
Typing this for me looks like sy<tab><tab> d<tab> - doesn’t your shell have systemd completions?
It does but what you describe doesn’t work for me.
what doesn’t work? in any modern shell, when you are here and type tab twice you will get to daemon-reload. ex: https://streamable.com/jdedh6
Your shell doesn’t show a tab-movable highlight when such a prompt appears? If so, try that out. It’s a very nice feature.
journalctl -u <service> --follow is equally annoying.

journalctl -fu
My favorite command in all of Linux. Some daemon is not working. F U Mr. Daemon!
so this does exist - I could swear I tried that before and it didn’t work
I wasn’t sure whether to read it as short args or a message directed at journalctl.
Thankfully it can be both! :)
You gotta use -fu not -uf, nothing makes you madder than having to follow some service logs :rage:
That’s standard getopt behaviour.
Well I guess fu rolls better off the tongue than uf. But I remember literally looking up if there isn’t anything like -f and having issues with that. Oh well.
Would it be “too clever” for systemd to wait for unit files to change and reload the affected unit automagically when it changed?

I’m not sure it would be “clever”. At best it would make transactional changes (i.e. changes that span several files) hard, at worst impossible. It would also be a weird editing experience when just saving activates the changes.
I wonder why changes should need to be transactional? In Kubernetes we edit resource specs (which are very similar to systemd units) individually. Eventual consistency obviates transactions. I think the same could have held for systemd, right?
Because the services sd manages are more stateful. If sd restarted every service the moment their on-disk base unit file changes [1], desktop users, database admins, etc. would have a terrible experience.
[1] say during a routine distro upgrade.
Shorter commands would be easier to type accidentally. I approve of something as powerful as systemctl not being that way.
Does tab completion not work for you, though?
I think most people misunderstand why SQLite on the server is so important:
The problem SQLite solves is providing a more advanced alternative to using something like partitioning keys in Cassandra.
For context:
At scale, companies often use databases that support partitioning keys to ensure that all data related to a single “entity” (e.g., a chat) is stored together for fast, efficient lookups. In Cassandra, a typical chat message store might look like this:
This design ensures that all messages for a given chat are stored sequentially on the same node, enabling fast batch reads. However, it comes with major limitations:
With SQLite, you can sidestep these limitations by creating a dedicated database per chat_id. This approach turns SQLite into an effective solution for entity-local storage while avoiding the complexity of a full-scale distributed database like Cassandra.
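As a minimal sketch of that per-entity pattern (using Python’s built-in sqlite3; the file layout and schema are invented for illustration, not anyone’s production design):

```python
import os
import sqlite3

def chat_db(chat_id: str) -> sqlite3.Connection:
    """Open (or create) the dedicated database file for one chat."""
    os.makedirs("chats", exist_ok=True)
    conn = sqlite3.connect(f"chats/{chat_id}.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " message_id INTEGER PRIMARY KEY,"   # per-chat ordering, no partition key needed
        " sender TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " sent_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn

db = chat_db("chat-42")
db.execute("INSERT INTO messages (sender, body) VALUES (?, ?)", ("alice", "hi"))
db.commit()
# Reading the whole conversation is one sequential scan of one small local file.
rows = db.execute("SELECT sender, body FROM messages ORDER BY message_id").fetchall()
```

All reads and writes for a chat touch exactly one file, which is the same locality guarantee the Cassandra partition key was buying.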
+1 SQLite is a nifty storage engine at a good level of abstraction. It’s much higher level (and slower) than KV stores like RocksDB, but conversely vastly simplifies whatever storage architecture you need. I work on a service that builds secondary indexes as SQLite databases on-demand & maintains them via CDC from the Postgres source of truth. The success and expertise from that system has us considering other use cases for SQLite as well.
That’s fascinating! Just poked around, this sounds a bit different than what you described.
I would love to read more about this if you can convince the powers that be to let you write a technical post!
Cleaned up some of my thoughts and put it in to a blog post: https://lobste.rs/s/gwjvkg/sqlite_on_server_is_misunderstood_better
You’re gonna have to distribute the sqlite files somehow….
I should clarify – there are many flavors of SQLite. e.g. Turso + D1, Litestream, rqlite, and mvsqlite are all very different. This article mentions Litestream specifically, which – you’re correct – does not apply to my comment.
I’m talking specifically about Turso & Cloudflare D1 where the place the database lives is abstracted away from you. This is similar to how partitioning keys in Cassandra are mapped to tokens belonging to machines, which is totally abstracted from you.
Those aren’t “flavors” of sqlite. Those are database systems that happen to use sqlite as a data backend.
“Flavors” as in patterns of how SQLite is used :)
Is it not a problem that “The maximum number of attached databases cannot be increased above 125”?
This is my first post in a series intended to draw programmers from outside the functional programming sphere into dependent typing, by showing off ways that dependent typing can be used to build ergonomic and misuse-resistant APIs in ways that would typically require macros in languages with less powerful type systems.
My target audience is still a bit vague for this series; currently I’m targeting systems programmers who have decent experience with strongly typed languages (Rust comes to mind, since I’ve worked extensively in it personally) and have maybe tried Haskell and gotten burnt by it.
Feedback is appreciated, I’m interested in hearing about how I can make posts like this more approachable to people with other backgrounds, and I’m also very interested in seeing examples of macros people would like to see disappear into the type system.
How strong is the compile time / run time phase separation in Idris?
Could this technique work with format strings that are chosen at run time, with the type system ensuring that the strings are compatible? (Assuming format strings come from a limited set that is fixed at compile time.)
For Idris 2, that’s a tricky one to answer extensively; I’ve got a whole several posts planned out explaining that. One of 2’s unique features among dependently typed languages, even counting Idris 1, is its integration of linear types, which are very similar to the affine types of Rust’s ownership, but used for a totally different purpose of controlling the spill-over between compile time and run time. Idris 2 has a rather expressive sub-language, so to speak, for explicitly encoding the relationship between compile time and runtime.
For the second question, that wouldn’t be difficult, and I’ll probably get around to covering something like that sooner or later, but basically you just need to write a function that “decides” (produces a proof of equality or a proof of inequality at runtime) whether or not the type signatures produced by two format strings are equal, and then you can pretty easily write up a procedure that reuses printfFmt to make a function that checks two format strings for compatibility at runtime, and then uses the types from the first to accept arguments for the latter. I may actually end up writing that as an addendum to this post.

That linear type machinery sounds interesting! I’ll look out for your future articles, thanks!
I realized that the specific case you were asking about (where the format strings come from a limited set that is fixed at compile time) is actually basically trivial, so I’ve gone ahead and written a little addendum to the post showing that off.
I still think I’ll write another addendum later on that covers the case where the format string is generated at runtime and checked, at runtime, against a template format string for type compatibility, but that’s going to require some slight refactoring to make the implementation short enough to be suitable for tacking onto this blog post.
I like the sound of this, but note that it is disapproved here to have one’s own work be 25% or more of one’s submissions. Try to submit some other things that you find interesting before the next part of this series.
This is their first ever submission. You can’t extrapolate a trend of only posting their own things from a single submission.
This is a common misconception of the rule. You can’t ignore the purpose of the rule and skip straight to the rule of thumb.
The purpose of the rule is:
Writing high-quality educational content like this isn’t automatically “self-promotion” just because it was submitted by the author. It also doesn’t automatically become self-promotion based on ratios. Self-promotion is a type of spam, where spam is defined as frequent low-quality/unwanted content. The rule of thumb is just a way to detect the symptoms of self-promotion but cannot diagnose it alone.
Huh. For what it’s worth, I commented as I did because I figured someone would say it was against the self-promotion rule and I preferred to have the message delivered in what I thought would be a supportive, constructive way by a fellow admirer of dependent typing (me) rather than maybe in a more discouraging way by someone who disdains dependent typing.
This post caught my eye because we’re considering sqlite for lobsters, but this is not really a compelling set of points.
I’m not well-informed enough to agree or disagree with this post, just frustrated that it doesn’t make the best case it could.
Re: migrations: SQLite doesn’t support some commonplace ALTER TABLE behavior (adding a NOT NULL column without a default, a constraint, whatever). Instead, you have to: create a new table with the desired schema, copy the data over, drop the old table, and rename the new one into place. For large tables this will lock the database for a while as the data copies over.
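A rough sketch of that rebuild dance, with a made-up users table (this follows the general procedure described in the SQLite documentation, but the names and the exact PRAGMA handling here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Pretend this is the existing schema we need to migrate.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

conn.execute("PRAGMA foreign_keys = OFF")  # avoid firing FK actions while tables are swapped
# 1. Create a new table with the schema you actually want (here: a NOT NULL
#    column that ALTER TABLE ... ADD COLUMN can't add without a default).
conn.execute(
    "CREATE TABLE users_new ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL,"
    " display_name TEXT NOT NULL)"
)
# 2. Copy every row over, backfilling the new column. This is the slow, locking part.
conn.execute("INSERT INTO users_new SELECT id, email, '' FROM users")
# 3. Drop the old table and move the new one into its place.
conn.execute("DROP TABLE users")
conn.execute("ALTER TABLE users_new RENAME TO users")
conn.commit()
conn.execute("PRAGMA foreign_keys = ON")
```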
Additionally, this is an error-prone process. Do this enough times and eventually you might get the steps wrong. Even if you don’t, I had an ORM delete data on a small project when it executed statements in an unintuitive, undocumented, and nondeterministic order, running the
DROP before running the statement disabling foreign keys. The ORM’s built-in schema commands weren’t susceptible to this issue, but SQLite required me to use a lower-level approach.

I’m very happy with the solution I built for this problem, which I’ve been using for several years: https://sqlite-utils.datasette.io/en/stable/cli.html#transforming-tables
Just so happens that there’s a “migrations” focused post also on Lobsters today.
To me this does not sound much less nutso than mysql online schema change machinery or vitess or whatever. Postgres is better about this, sure. But SQLite is not bad, all the DDL is fully supported in transactions and it has a handy slot for storing your schema version number. If you’re using an ORM that is shredding your data, like, don’t use that software, but it’s not SQLite’s fault.
Yeah I agree this post does not make a compelling case. I have my own reasons I don’t love SQLite (really limited builtin methods and a too-flexible grammar) and I work for a Postgres company but from reading this article alone I can’t see a coherent reason not to use SQLite on the server.
Did you miss the part about needing to share a disk if you want more than one application server? Granted, not all applications need more than one copy running…
I guess the other option is that each application server has their own sqlite and you use RAFT or something to drive consensus between them… which maybe works if you can have at least 3 of these servers, at the expense of extreme complexity. :)
Surely that’s the important point? SQLite is great when everything fits on a single machine, which is really the case for the vast majority of projects [citation needed].
When something is big enough to need multiple machines then it’s time to switch to a separate storage layer running PostgreSQL.
Isn’t this all just a case of picking the right tool for your situation?
This is the point of LiteFS - it replicates writes to other boxes - no NFS needed (and iirc locks on nfs were unreliable so you probably dont want to run the database file on nfs).
You DO need to take care to send writes to the leader but Fly (where the author works), makes this pretty easy. LiteStream (same author) is similar but is more of a continuous back up tool.
Right. A thing you can do is make your architecture more complicated by introducing LiteFS, or LiteStream, but it’s no longer “just sqlite” at that point.
I’ve been running Litestream for a while now and it’s VERY straight-forward. You start the process and point it at a SQLite database, it inexpensively streams WAL blocks to S3 or similar, you can use those for recovery.
It’s so simple, robust and cheap that I see it as a selling point for SQLite as a whole - much less complex/expensive than achieving point-in-time replica backups with PostgreSQL or MySQL.
Yeah but again, these work with only a single instance at a time. If that’s your setup it’s great, but many people need multi-instance setups and mounting a shared SQLite file on NFS in some cloud provider is not a great idea.
But it isn’t NFS in the cloud; it replicates the WAL, which is then made available, yes, via a FUSE file system, but it’s not NFS. It’s not really any different than read replicas for Postgres or MySQL and it seems to be a lot easier to get going. You DO have HA on your DB, don’t you?
If you don’t think it works for you, that’s fine, but I think you should understand how it works before making that decision.
Yeah but the point is that a cloud vendor DB gives me read replicas out of the box, I don’t need to set it up manually. Added to the other points, this makes it kinda hard to argue against PostgreSQL for a typical backend app.
The same is true with sqlite as with other dbs. With Turso/libsql or litefs you can have a read replica materialized on the local nvme ssd of every application server. I’m not arguing SQLite is a better option for every app; like everything in engineering, the answer is “it depends”. But it feels unfair to dismiss a SQLite-based architecture because the sqlite3.c source file doesn’t directly automate some need, to dismiss the sqlite vendors too, and then compare that to Postgres + unlimited ancillary vendor support & automation.
Turso is one vendor. Cloudflare is another but its SQLite offering has a fairly low max database size. Meanwhile, most cloud providers have mature support for MySQL/Postgres. If the cloud provider situation changes for SQLite in the future and it is offered more widely, then the analysis changes, but until then I believe it’s fair. And so do the folks who actually make SQLite.
Sure… if you’re using AWS, Azure or Google… and yeah, even fly.io (though theirs is NOT managed) does too, but really, it’s running a single executable in the background - relaying or routing writes to the master is more work of course, but it’s still pretty easy. And given the stance of the US govt lately towards my country, I am likely NOT to use any of those, preferring to host domestically (sorry to bring politics into this - but everything is a trade-off when making technical decisions, including that).
I once set up HA Postgres on Linode (floating IP, NBD and the watchdog) and it worked, but it was cargo-culted and seemed very fragile in its complexity. LiteFS in comparison seems clean and elegant, would be very robust against SPOFs, and as a benefit moves the db reads to the edge, nearer to my hypothetical users.
But like I said, I am not running anything using it at the moment. The one project I AM considering doing would use Postgres for other reasons.
Right, if you want multi-instance setups you need LiteFS, not Litestream.
To be clear. I’ve not used it but from all appearances, it looks like it adds very little overhead or complexity.
I think there might be a few cases in which SQLite forces one to use multiple queries where PostgreSQL wouldn’t, but it doesn’t usually, does it?
From that linked page:
“the situation at hand” could include “I want to leave open the option of replacing the RDBMS”.
Well, the article is pretty clear in what it’s trying to say, quote:
Which is saying, many choose to use sqlite because they don’t need to run anything extra - they just need to link their code with sqlite and it works. And trying to use sqlite on server-side negates that benefit. So, if that’s not why you want to use sqlite to begin with, this point doesn’t affect you at all.
See also previous discussions of Cosmopolitan on Lobsters
Do you know how they plan to make it sustainable? Moving a bunch of content to something else that will just go away or need to enshittify is not very appealing… it’d be more compelling if they outlined how they plan to support themselves.
I wondered that myself and saw nothing in briefly looking around, but in December they had a donation campaign progress bar that (to my surprise) at least came close to filling.
I briefly thought this was a thing I read about a few months ago called Weird Gloop which is an effort to rescue wikis from the horrible Fandom farm, but no, Miraheze is much bigger and much older.
I was also surprised to see how long Miraheze has been around! Why aren’t more wikis moving to it from Fandom?
I would imagine the main reason Fandom.com remains more popular than Miraheze (if it does; I don’t know) is that Fandom.com is more visible. But I also think Miraheze expects a significantly higher level of commitment from users: I think Miraheze wikis are required to have their own administrators to enforce Miraheze’s rules, whereas Fandom.com doesn’t seem to care much if a wiki has no admins. Miraheze also threatens to delete wikis that aren’t actively edited, although in practice they seem to grant many exemptions for wikis that are considered “complete” or that pay them, whereas Fandom.com doesn’t seem to care if a wiki is inactive. I don’t mean to object to Miraheze’s policies; I find it understandable that they would have higher standards and ask more of users than Fandom.com does, but I also find it understandable if someone who wants to start a wiki and write up some stuff without taking on much responsibility would prefer Fandom.com.
However, you said “Why aren’t more wikis moving to it from Fandom?”, which maybe implies large, established wikis that would have no problem meeting Miraheze’s requirements, so maybe this is irrelevant to your actual question.
Edit: Also, Fandom.com’s restrictions on discussions of leaving Fandom.com might be effective to some extent.
Miraheze sounds like this typical bad-at-marketing, FLOSS-adjacent product.
The only big non-Wikipedia wiki I was ever part of was gaming-related and (afaik) not known by programmers, so the people just “look elsewhere”, and in this case ownership changed a couple times as well. And I don’t know if Miraheze was around 20y ago…
Hmm… a MediaWiki farm. Donation funded: https://meta.miraheze.org/wiki/FAQ#Business_model
I do hope that when they say “no ads” they aren’t making an exception for Wikipedia-style seasonal solicitation banners.
I understand the displeasure at Jimbo’s face (although it’s been some years since that), or the question of whether donations are financially necessary (unsure for Wikimedia), but a yearly, dismissable donation banner to me still seems much preferable to constant daily ads.
For sure. If all the fandom / wikia communities would pick up and move here, that would be a better world, and they’re likely to stay afloat with only low-pressure non-intrusive solicitation.
But, for myself, if I’m starting a wiki, I’d rather self-host it. I don’t trust all these experts and their infra.
They have a year-end donation solicitation banner but it’s not obnoxious and arguably deceptive like the Wikimedia Foundation’s.