Hey OP, I like this series and thus would like you to continue being on lobsters. Because of that please review the rules on self-promotion for lobsters.
As a rule of thumb, self-promo should be less than a quarter of one’s stories and comments.
That seems extremely unlikely to me, as it would be extremely divisive and expensive for LF, and LF decides what Linux is. Fuchsia and other OSes are much more likely to take market share (Android, ChromeOS, smart devices, etc.) than for Linux itself to fundamentally change. Various LF funders are more likely to just sit and wait for those, or to mitigate many of these issues with other projects - e.g. it’s way cheaper to build a secure VM than to build a new kernel, so the major LF companies all have their own VMs, mitigating a lot of kernel issues.
I don’t see it happening right now either. But something will outlive the current Linux in 50 years. And that is either a Linux which cleaned up all the issues - with or without Rust - or something that surpassed Linux because no one bothered to clean up the legacy and people potentially even moved on from projects with millions of lines of C (which again doesn’t have to be Rust). Which in turn seems to be a question of what big corpo invests their money in (Android, VMs + bare metal, IoT, ChromeOS) - and what legislation requires them to achieve in terms of security.
The majority of bugs (quantity, not quality/severity) we have are due to the stupid little corner cases in C that are totally gone in Rust. Things like simple overwrites of memory (not that rust can catch all of these by far), error path cleanups, forgetting to check error values, and use-after-free mistakes. That’s why I’m wanting to see Rust get into the kernel, these types of issues just go away, allowing developers and maintainers more time to focus on the REAL bugs that happen (i.e. logic issues, race conditions, etc.)
This is an extremely strong statement.
I think a few things are also interesting:
I think people are realizing how low quality the Linux kernel code is, how haphazard development is, how much burnout and misery is involved, etc.
I think people are realizing how insanely not in the open kernel dev is, how much is private conversations that a few are privy to, how much is politics, etc.
I think people are realizing how insanely not in the open kernel dev is, how much is private conversations that a few are privy to, how much is politics, etc.
The Hellwig/Ojeda part of the thread is just frustrating to read because it almost feels like pleading. “We went over this in private” “we discussed this already, why are you bringing it up again?” “Linus said (in private so there’s no record)”, etc., etc.
Dragging discussions out in front of an audience is a pretty decent tactic for dealing with obstinate maintainers. They don’t like to explain their shoddy reasoning in front of people, and would prefer it remain hidden. It isn’t the first tool in the toolbelt but at a certain point there is no convincing people directly.
Dragging discussions out in front of an audience is a pretty decent tactic for dealing with
With quite a few things actually. A friend of mine contributes to a non-profit, which until recently had this very toxic member (they’d even attempted a felony). They were driven out of the non-profit very soon after members talked in a thread that was accessible to all members. Obscurity is often one key component of abuse, be it mere stubbornness or criminal behaviour. Shine light, and it often goes away.
IIRC Hintjens noted this quite explicitly as a tactic of bad actors in his works.
It’s amazing how quick people are to recognize folks trying to subvert an org piecemeal via one-off private conversations once everybody can compare notes. It’s equally amazing to see how much the same people beforehand will swear up and down “oh no, that’s a conspiracy theory, such things can’t happen here” until they’ve been burned at least once.
This is an active, unpatched attack vector in most communities.
I’ve found the humblest example of this is even meeting minutes at work. I’ve observed that people tend to act more collaboratively and seek the common good if there are public minutes, as opposed to trying to “privately” win people over to their desires.
I think people are realizing how low quality the Linux kernel code is, how haphazard development is, how much burnout and misery is involved, etc.
Something I’ve noticed is true in virtually everything I’ve looked deeply at is the majority of work is poor to mediocre and most people are not especially great at their jobs. So it wouldn’t surprise me if Linux is the same. (…and also wouldn’t surprise me if the wonderful Rust rewrite also ends up poor to mediocre.)
yet at the same time, another thing that astonishes me is how much stuff actually does get done and how well things manage to work anyway. And Linux also does a lot and works pretty well. Mediocre over the years can end up pretty good.
After tangentially following the kernel news, I think a lot of churning and death spiraling is happening. I would much rather have a rust-first kernel that isn’t crippled by the old guard of C developers reluctant to adopt new tech.
Take all of this energy into RedoxOS and let Linux stay in antiquity.
I’ve seen some of the R4L people talk on Mastodon, and they all seem to hate this argument.
They want to contribute to Linux because they use it, want to use it, and want to improve the lives of everyone who uses it. The fact that it’s out there and deployed and not a toy is a huge part of the reason why they want to improve it.
Hopping off into their own little projects which may or may not be useful to someone in 5-10 years’ time is not interesting to them. If it was, they’d already be working on Redox.
The most effective thing that could happen is for the Linux Foundation, and Linus himself, to formally endorse and run a Rust-based kernel. They can adopt an existing one or make a concerted effort to replace large chunks of Linux’s C with Rust.
IMO the Linux project needs to figure out something pretty quickly because it seems to be bleeding maintainers and Linus isn’t getting any younger.
They may be misunderstanding the idea that others are not necessarily incentivized to do things just because it’s interesting for them (the Mastodon posters).
Redox does carry the burden of trying to do new OS things. An ABI-compatible Rust rewrite of the Linux kernel might get further along than expected, even if it only ran in virtual contexts at first, with hardware support coming later.
Linux developers want to work on Linux, they don’t want to make a new OS. Linux is incredibly important, and companies already have Rust-only drivers for their hardware.
Basically, sure, a new OS project would be neat, but it’s really just completely off topic in the sense that it’s not a solution for Rust for Linux. Because the “Linux” part in that matters.
I read a 25+ year old article [1] from a former Netscape developer that I think applies in part.
The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they’ve been fixed. There’s nothing wrong with it. It doesn’t acquire bugs just by sitting around on your hard drive. Au contraire, baby! Is software supposed to be like an old Dodge Dart, that rusts just sitting in the garage? Is software like a teddy bear that’s kind of gross if it’s not made out of all new material?
Adopting a “rust-first” kernel is throwing the baby out with the bathwater. Linux has been beaten into submission for over 30 years for a reason. It’s the largest collaborative project in human history and over 30 million lines of code. Throwing it out and starting new would be an absolutely herculean effort that would likely take years, if it ever got off the ground.
The idea that old code is better than new code is patently absurd. Old code has stagnated. It was built using substandard, out of date methodologies. No one remembers what’s a bug and what’s a feature, and everyone is too scared to fix anything because of it. It doesn’t acquire new bugs because no one is willing to work on that weird ass bespoke shit you did with your C preprocessor. Au contraire, baby! Is software supposed to never learn? Are we never to adopt new tools? Can we never look at something we’ve built in an old way and wonder if new methodologies would produce something better?
This is what it looks like to say nothing, to beg the question. Numerous empirical claims, where is the justification?
It’s also self defeating on its face. I take an old codebase, I fix a bug, the codebase is now new. Which one is better?
Like most things in life, the truth is somewhere in the middle. There is a reason the semiconductor industry has the concept of a “mature node”. They accept that new is needed for each node, but also that the new thing takes time to iron out the kinks and bugs. This is the primary reason you see Apple take new nodes first, before Nvidia, for example: Nvidia requires much larger die sizes, and therefore needs fewer defects per square mm.
You can see this sometimes in software, for example X11 vs Wayland, where adoption is slow but most definitely progressing, and nowadays most people can see that Wayland is, or is going to become, the dominant tech in the space.
I don’t think this would qualify as dialectic; it lacks any internal debate and it leans heavily on appeals by analogy and intuition/emotion. The post itself makes a ton of empirical claims without justification, even beyond the quoted bit.
That means we can probably keep a lot of the old trusty Linux code around while making more of the new code safe by writing it in Rust in the first place.
I don’t think that’s a fair assessment of Spolsky’s argument or of CursedSilicon’s application of it to the Linux kernel.
Firstly, someone has already pointed out the research that suggests that existing code has fewer bugs in it than new code (and that the older code is, the less likely it is to be buggy).
Secondly, this discussion is mainly around entire codebases, not just existing code. Codebases usually have an entire infrastructure around them for verifying that the behaviour of the codebase has not changed. This is often made up of tests, but it’s also made up of the users who try out a release of a codebase and determine whether it’s working for them. The difference between making a change to an existing codebase and releasing a new project largely comes down to whether this verification (both in terms of automated tests and in terms of users’ ability to use the new release) works for the new code.
Given this difference, if I want to (say) write a new OS completely in Rust, I need to choose: Do I want to make it completely compatible with Linux, and therefore take on the significant challenge of making sure everything behaves truly the same? Or do I make significant breaking changes, write my own OS, and therefore force potential adopters to rebuild their entire Linux workflows in my new OS?
The point is not that either of these options are bad, it is that they represent significant risks to a project. Added to the general risk that is writing new code, this produces a total level of risk that might be considered the baseline risk of doing a rewrite. Now risk is not bad per se! If the benefits of being able to write an OS in a language like Rust outweigh the potential risks, then it still makes sense to perform the rewrite. Or maybe the existing Linux kernel is so difficult to maintain that a new codebase really would be the better option. But the point that CursedSilicon was making by linking the Spolsky piece was, I believe, that the risks for a project like the Linux kernel are very high. There is a lot of existing, old code. And there is a very large ecosystem where either breaking or maintaining compatibility would each come with significant challenges.
Unfortunately, it’s very difficult to measure the risks and benefits here in a quantitative, comparable way, so I think where you fall on the “rewrite vs continuity” spectrum will depend mostly on what sort of examples you’ve seen, and how close you think this case is to those examples. I don’t think there’s any objective way to say whether it makes more sense to have something like R4L, or something like RedoxOS.
Firstly, someone has already pointed out the research that suggests that existing code has fewer bugs in it than new code (and that the older code is, the less likely it is to be buggy).
I haven’t read it yet, but I haven’t made an argument about that; I just created a parody of the argument as presented. I’ll be candid: I doubt that the research is going to compel me to believe that newer code is inherently buggier. It may compel me to confirm my existing belief that testing software in the field is one good method to find some classes of bugs.
Secondly, this discussion is mainly around entire codebases, not just existing code.
I guess so, it’s a bit dependent on where we say the discussion starts - three things are relevant: RFL (which is not a wholesale rewrite), a wholesale rewrite of the Linux kernel, and Netscape. RFL is not about replacing the entire Linux kernel, although perhaps “codebase” here refers to some sort of unit, like a driver. Netscape wanted a wholesale rewrite, based on the linked post, so perhaps that’s what’s really “the single worst strategic mistake that any software company can make”, but I wonder where the boundary is? Also, the article immediately mentions that Microsoft tried to do this with Word but it failed, but that Word didn’t suffer from this because it was still actively developed - I wonder if it really “failed” just because Pyramid didn’t become the new Word? Did Microsoft have some lessons learned, or incorporate some of that code? Dunno.
I think I’m entirely justified when I say that the post is all emotional/intuitive appeals and rhetoric, and that it makes empirical claims without justification.
There’s a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming:
This is rhetoric. These are unsubstantiated empirical claims. The article is all of this. It’s fine as an interesting, thought provoking read that gets to the root of our intuitions, but I think anyone can dismiss it pretty easily since it doesn’t really provide much in the form of an argument.
It’s important to remember that when you start from scratch there is absolutely no reason to believe that you are going to do a better job than you did the first time.
Again, totally unsubstantiated. I have MANY reasons to believe that, it is simply question begging to say otherwise.
That’s all this post is: over and over again making empirical claims with no evidence and question begging.
We can discuss the risks and benefits, I’d advocate for that. This article posted doesn’t advocate for that. It’s rhetoric.
existing code has fewer bugs in it than new code (and that the older code is, the less likely it is to be buggy).
This is a truism. It is survival bias. If the code was buggy, it would eventually be found and fixed. So all things being equal, newer code is riskier than old code. But it’s also been empirically shown that using Rust for new code is not “all things being equal”. Google showed that new code in Rust is as reliable as old code in C. Which is good news: you can use old C code from new Rust projects without the risk that comes from new C code.
But it’s also been empirically shown that using Rust for new code is not “all things being equal”.
Yeah, this is what I’ve been saying (not sure if you’d meant to respond to me or the parent, since we agree) - the issue isn’t “new” vs “old” it’s things like “reviewed vs unreviewed” or “released vs unreleased” or “tested well vs not tested well” or “class of bugs is trivial to express vs class of bugs is difficult to express” etc.
I don’t disagree that the rewards can outweigh the risks, and in this case I think there’s a lot of evidence that suggests that memory safety as a default is really important for all sorts of reasons. Let alone the many other PL developments that make Rust a much more suitable language to develop in than C.
It’s a Ship of Theseus—at no point can you call it a “new” codebase, but after a period of time, it could be completely different code. I have a C program I’ve been using and modifying for 25 years. At any given point, it would have been hard to say “this is now a new codebase,” yet not one line of code in the project is the same as when I started (even though it does the same thing as it always has).
I don’t see the point in your question. It’s going to depend on the codebase, and on the nature of the changes; it’s going to be nuanced, and subjective at least to some degree. But the fact that it’s prone to subjectivity doesn’t mean that you get to call an old codebase with a single fixed bug a new codebase, without some heavy qualification which was lacking.
What’s old and new is poorly defined and yet there’s an argument being made that “old” and “new” are good indicators of something. If they’re so poorly defined that we have to bring in all sorts of additional context like the nature of the changes, not just when they happened or the number of lines changed, etc, then it seems to me that we would be just as well served to throw away the “old” and “new” and focus on that context.
I feel like enough people would agree more-or-less on what was an “old” or “new” codebase (i.e. they would agree given particular context) that they remain useful terms in a discussion. The general context used here is apparent (at least to me) given by the discussion so far: an older codebase has been around for a while, has been maintained, has had kinks ironed out.
There’s a really important distinction here though. The point is to argue that new projects will be less stable than old ones, but you’re intuitively (and correctly) bringing in far more important context - maintenance, testing, battle testing, etc. If a new implementation has a higher degree of those properties then it being “new” stops being relevant.
It’s also self defeating on its face. I take an old codebase, I fix a bug, the codebase is now new. Which one is better?
My point was that this statement requires a definition of “new codebase” that nobody would agree with, at least in the context of the discussion we’re in. Maybe you are attacking the base proposition without applying the surrounding context, which might be valid if this were a formal argument and not a free-for-all discussion.
If a new implementation has a higher degree of those properties
I think that it would be considered no longer new if it had had significant battle-testing, for example.
FWIW the important thing in my view is that every new codebase is a potential old codebase (given time and care), and a rewrite necessarily involves a step backwards. The question should probably not be, which is immediately better?, but, which is better in the longer term (and by how much)? However your point that “new codebase” is not automatically worse is certainly valid. There are other factors than age and “time in the field” that determine quality.
Methodologies don’t matter for quality of code. They could be useful for estimates, cost control, figuring out whom you shall fire etc. But not for the quality of code.
I’ve never observed a programmer become better or worse by switching methodology. Dijkstra would not have become better if you made him do daily standups or go through code reviews.
There are ways to improve your programming by choosing a different approach, but these are very individual. Methodology is mostly a beancounting tool.
When I say “methodology” I’m speaking very broadly - simply “the approach one takes”. This isn’t necessarily saying that any methodology is better than any other. The way I approach a task today is better, I think, than the way I would have approached that task a decade ago - my methodology has changed, the way I think has changed. Perhaps that might mean I write more tests, or I test earlier, but it may mean exactly the opposite, and my methods may only work best for me.
I’m not advocating for “process” or ubiquity, only that the approach one takes may improve over time, which I suspect we would agree on.
It’s the largest collaborative project in human history and over 30 million lines of code.
How many of those lines are part of the core? My understanding was that the overwhelming majority was driver code. There may not be that much core subsystem code to rewrite.
For a previous project, we included a minimal Linux build. It was around 300 KLoC, which included networking and the storage stack, along with virtio drivers.
That’s around the size a single person could manage and quite easy with a motivated team.
If you started with DPDK and SPDK then you’d already have filesystems and a copy of the FreeBSD network stack to run in isolated environments.
Once many drivers share common Rust wrappers over core subsystems, you could flip it and write the subsystem in Rust, then expose a C interface for the rest.
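As a rough sketch of what that flip could look like (all names here are hypothetical, not real kernel symbols): the subsystem’s logic lives in safe Rust, and a thin `extern "C"` shim keeps the existing C callers working.

```rust
use std::os::raw::c_int;

/// Hypothetical device state shared with C (illustrative only).
#[repr(C)]
pub struct ExampleDev {
    pub id: u64,
    pub flags: u32,
}

/// The subsystem core, written in safe Rust.
fn probe_impl(dev: &mut ExampleDev) -> Result<(), c_int> {
    if dev.flags & 0x1 != 0 {
        return Err(-16); // -EBUSY: device already claimed
    }
    dev.flags |= 0x1;
    Ok(())
}

/// C-callable wrapper: C code sees `int example_probe(struct example_dev *dev);`.
#[no_mangle]
pub extern "C" fn example_probe(dev: *mut ExampleDev) -> c_int {
    // Translate the raw pointer once, at the boundary; keep the core logic safe.
    let Some(dev) = (unsafe { dev.as_mut() }) else {
        return -22; // -EINVAL: NULL pointer from the C side
    };
    match probe_impl(dev) {
        Ok(()) => 0,
        Err(errno) => errno,
    }
}
```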
I see that Drew proposes a new OS in that linked article, but I think a better proposal in the same vein is a fork. You get to keep Linux, but you can start porting logic to Rust unimpeded, and it’s a manageable amount of work to keep porting upstream changes.
Remember when libav forked from ffmpeg? Michael Niedermayer single-handedly ported every single libav commit back into ffmpeg, and eventually, ffmpeg won.
At first there will be an extremely high C percentage and a low Rust percentage, so porting is trivial: just git merge and there will be no conflicts. As the fork ports more and more C code to Rust, however, you start to have to do porting work by inspecting the C code and determining whether the fixes apply to the corresponding Rust code. However, at that point, it means you should start seeing productivity gains, community gains, and feature gains from using a better language than C. At this point the community growth should be able to keep up with the extra porting work required. And this is when distros will start sniffing around, at first offering variants of the distro that use the forked kernel, and if they like what they taste, they might even drop the original.
I genuinely think it’s a strong idea, given the momentum and the potential amount of labor the Rust community has at its disposal.
I think the competition would be great, especially in the domain of making it more contributor friendly to improve the kernel(s) that we use daily.
I certainly don’t think this is impossible, for sure. But the point ultimately still stands: Linux kernel devs don’t want a fork. They want Linux. These folks aren’t interested in competing, they’re interested in making the project they work on better. We’ll see if some others choose the fork route, but it’s still ultimately not the point of this project.
Linux developers want to work on Linux, they don’t want to make a new OS.
While I don’t personally want to make a new OS, I’m not sure I actually want to work on Linux. Most of the time I strive for portability, and so abstract myself from the OS whenever I can get away with it. And when I can’t, I have to say Linux’s API isn’t always that great, compared to what the BSDs have to offer (epoll vs kqueue comes to mind). Most annoying though is the lack of documentation for the less used APIs: I’ve recently worked with Netlink sockets, and for the proc stuff so far the best documentation I found was the freaking source code of a third party monitoring program.
I was shocked. Complete documentation of the public API is the minimum bar for a project as serious as the Linux kernel. I can live with an API I don’t like, but lack of documentation is a deal breaker.
While I don’t personally want to make a new OS, I’m not sure I actually want to work on Linux.
I think they mean that Linux kernel devs want to work on the Linux kernel. Most (all?) R4L devs are long time Linux kernel devs. Though, maybe some of the people resigning over LKML toxicity will go work on Redox or something…
Re-implementing the kernel ABI would be a ton of work for little gain if all they wanted was to upstream all the work on new hardware drivers that is already done - and then eventually start re-implementing bits that need to be revised anyway.
If the singular required Rust toolchain didn’t feel like such a ridiculous-to-bootstrap, 500-ton LLVM clown car, I would agree with this statement without reservation.
Zig is easier to implement (and I personally like it as a language) but doesn’t have the same safety guarantees and strong type system that Rust does. It’s a give and take. I actually really like Rust and would like to see a proliferation of toolchain options, such as what’s in progress in GCC land. Overall, it would just be really nice to have an easily bootstrapped toolchain that a normal person can compile from scratch locally, although I don’t think it necessarily needs to be the default, or that using LLVM generally is an issue. However, it might be possible that no matter how you architect it, Rust might just be complicated enough that any sufficiently useful toolchain for the language could just end up being a 500 ton clown car of some kind anyways.
Depends on which parts of GP’s statement you care about: LLVM or bootstrap. Zig still depends on LLVM (for now), but it is no longer bootstrappable in a limited number of steps (because they switched from a bootstrap C++ implementation of the compiler to keeping a compressed WASM build of the compiler as a blob).
Yep, although I would also add it’s unfair to judge Zig in any case on this matter now given it’s such a young project that clearly is going to evolve a lot before the dust begins to settle (Rust is also young, but not nearly as young as Zig). In ten to twenty years, so long as we’re all still typing away on our keyboards, we might have a dozen Zig 1.0 and a half dozen Zig 2.0 implementations!
Yeah, the absurdly low code quality and toxic environment make me think that Linux is ripe for disruption. Not like anyone can produce a production kernel overnight, but maybe a few years of sustained work might see a functional, production-ready Rust kernel for some niche applications and from there it could be expanded gradually. While it would have a lot of catching up to do with respect to Linux, I would expect it to mature much faster because of Rust, because of a lack of cruft/backwards-compatibility promises, and most importantly because it could avoid the pointless drama and toxicity that burn people out and prevent people from contributing in the first place.
From the thread in OP, if you expand the messages, there is wide agreement among the maintainers that all sorts of really badly designed and almost impossible to use (safely) APIs ended up in the kernel over the years because the developers were inexperienced and kind of learning kernel development as they went. In retrospect they would have designed many of the APIs very differently.
It’s based on my forays into the Linux kernel source code. I don’t doubt there’s some quality code lurking around somewhere, but the stuff I’ve come across (largely filesystem and filesystem adjacent) is baffling.
Seeing how many people are confidently incorrect about Linux maintainers only caring about their job security and keeping code bad to make it a barrier to entry, if nothing else taught me how online discussions are a huge game of Chinese whispers where most participants don’t have a clue of what they are talking about.
I doubt that maintainers are “only caring about their job security and keeping back code” but with all due respect: You’re also just taking arguments out of thin air right now. What I do believe is what we have seen: Pretty toxic responses from some people and a whole lot of issues trying to move forward.
Seeing how many people are confidently incorrect about Linux maintainers only caring about their job security and keeping code bad to make it a barrier to entry
Huh, I’m not seeing any claim to this end from the GP, or did I not look hard enough? At face value, saying that something has an “absurdly low code quality” does not imply anything about nefarious motives.
Still, in GP’s case the Chinese whispers have reduced “the safety of this API is hard to formalize and you pretty much have to use it the way everybody does it” to “absurdly low quality”. To which I ask: what is more likely? 1) That 30 million lines of code contain various levels of technical debt of which maintainers are aware; and that said maintainers are worried even about code where the technical debt is real but not causing substantial issues in practice? Or 2) that a piece of software gets to run on literally billions of devices of all sizes and prices just because it’s free and in spite of its “absurdly low quality”?
Linux is not perfect, neither technically nor socially. But it sure takes a lot of entitlement and self-righteousness to declare it “of absurdly low quality” with a straight face.
GP here: I probably should have said “shockingly” rather than “absurdly”. I didn’t really expect to get lawyered over that one word, but yeah, the idea was that for a software that runs on billions of devices, the code quality is shockingly low.
Of course, this is plainly subjective. If your code quality standards are a lot lower than mine then you might disagree with my assessment.
That said, I suspect adoption is a poor proxy for code quality. Internet Explorer was widely adopted and yet it’s broadly understood to have been poorly written.
But it sure takes a lot of entitlement and self-righteousness to declare it “of absurdly low quality” with a straight face
I’m sure self-righteousness could get you to the same place, but in my case I arrived by way of experience. You can relax, I wasn’t attacking Linux—I like Linux—it just has a lot of opportunity for improvement.
I guess I’ve seen the internals of too much proprietary software now to be shocked by anything about Linux per se. I might even argue that the quality of Linux is surprisingly good, considering its origins and development model.
I think I’d lawyer you a tiny bit differently: some of the bugs in the kernel shock me when I consider how many devices run that code and fulfill their purposes despite those bugs.
FWIW, I was not making a dig at open source software, and yes plenty of corporate software is worse. I guess my expectations for Linux are higher because of how often it is touted as exemplary in some form or another. I don’t even dislike Linux, I think it’s the best thing out there for a huge swath of use cases—I just see some pretty big opportunities for improvement.
But it sure takes a lot of entitlement and self-righteousness to declare it “of absurdly low quality” with a straight face.
Or actual benchmarks: the performance the Linux kernel leaves on the table in some cases is absurd. And sure it’s just one example, but I wouldn’t be surprised if it was representative of a good portion of the kernel.
Well not quite but still “considered broken beyond repair by many people related to life time management” - which is definitely worse than “hard to formalize” when “the way ever[y]body does it” seems to vary between each user.
I love Rust but still, we’re talking of a language which (for good reasons!) considers doubly linked lists unsafe. Take an API that gets a 4 on Rusty Russell’s API design scale (“Follow common convention and you’ll get it right”), but which was designed for a completely different programming language if not paradigm, and it’s not surprising that it can’t easily be transformed into a 9 (“The compiler/linker won’t let you get it wrong”). But at the same time there are a dozen ways in which, according to the same scale, things could actually be worse!
What I dislike is that people are seeing “awareness of complexity” and the message they spread is “absurdly low quality”.
Note that doubly linked lists are not a special case at all in Rust. All the other common data structures like Vec, HashMap etc. also need unsafe code in their implementation.
Implementing these data structures in Rust, and writing unsafe code in general, is indeed roughly a 4. But these are all already implemented in the standard library, with an API that actually is at a 9. And std::collections::LinkedList is constructive proof that you can have a safe Rust abstraction for doubly linked lists.
Yes, the implementation could have bugs, thus making the abstraction leaky. But that’s the case for literally everything, down to the hardware that your code runs on.
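To make the point concrete, here is a minimal toy sketch (nothing like the real std types, and not kernel code): the implementation needs `unsafe` internally, but the API it exposes is bounds-checked and can’t be misused from safe code.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Toy fixed-size heap buffer of u64s: unsafe inside, safe API outside.
pub struct Buffer {
    ptr: *mut u64,
    len: usize,
}

impl Buffer {
    pub fn new(len: usize) -> Buffer {
        assert!(len > 0, "toy example: non-empty buffers only");
        let layout = Layout::array::<u64>(len).expect("size overflow");
        // SAFETY: layout is non-zero-sized because len > 0.
        let ptr = unsafe { alloc(layout) } as *mut u64;
        assert!(!ptr.is_null(), "allocation failed");
        for i in 0..len {
            // SAFETY: i < len, so the write stays inside the allocation.
            unsafe { ptr.add(i).write(0) };
        }
        Buffer { ptr, len }
    }

    /// Bounds-checked read: safe callers cannot reach out-of-bounds memory.
    pub fn get(&self, i: usize) -> Option<u64> {
        if i < self.len {
            // SAFETY: index checked against len above.
            Some(unsafe { self.ptr.add(i).read() })
        } else {
            None
        }
    }
}

impl Drop for Buffer {
    fn drop(&mut self) {
        // SAFETY: ptr was allocated in `new` with this exact layout.
        let layout = Layout::array::<u64>(self.len).unwrap();
        unsafe { dealloc(self.ptr as *mut u8, layout) };
    }
}
```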
You’re absolutely right that you can build abstractions with enough effort.
My point is that if a doubly linked list is (again, for good reasons) hard to make into a 9, a 20-year-old API may very well be even harder. In fact, std::collections::LinkedList is safe but still not great (for example the cursor API is still unstable); and being in std, it was designed/reviewed by some of the most knowledgeable Rust developers, sort of by definition. That’s the conundrum that maintainers face and, if they realize that, it’s a good thing. I would be scared if maintainers handwaved that away.
Yes, the implementation could have bugs, thus making the abstraction leaky.
Bugs happen, but if the abstraction is downright wrong then that’s something I wouldn’t underestimate. A lot of the appeal of Rust in Linux lies exactly in documenting/formalizing these unwritten rules, and wrong documentation can be worse than no documentation (cue the negative parts of the API design scale!); even more so if your documentation is a formal model like a set of Rust types and functions.
That said, the same thing can happen in a Rust-first kernel, which will also have a lot of unsafe code. And it would be much harder to fix it in a Rust-first kernel than in Linux at a time when it’s just testing the waters.
In fact, std::collections::LinkedList is safe but still not great (for example the cursor API is still unstable); and being in std, it was designed/reviewed by some of the most knowledgeable Rust developers, sort of by definition.
At the same time, it was included almost as like, half a joke, and nobody uses it, so there’s not a lot of pressure to actually finish off the cursor API.
It’s also not the kind of linked list the kernel would use, as they’d want an intrusive one.
And yet, safe to use doubly linked lists written in Rust exist. That the implementation needs unsafe is not a real problem. That’s how we should look at wrapping C code in safe Rust abstractions.
The whole comment you replied to, after the one sentence about linked lists, is about abstractions. And abstractions are rarely going to be easy, and sometimes could be hardly possible.
That’s just a fact. Confusing this fact for something as hyperbolic as “absurdly low quality” is a stunning example of the Dunning-Kruger effect, and frankly insulting as well.
I personally would call Linux low quality because many parts of it are buggy as sin. My GPU stops working properly literally every other time I upgrade Linux.
No one is saying that Linux is low quality because it’s hard or impossible to abstract some subsystems in Rust, they’re saying it’s low quality because a lot of it barely works! I would say that your “Chinese whispers” misrepresents the situation and what people here are actually saying. “the safety of this API is hard to formalize and you pretty much have to use it the way everybody does it” doesn’t apply if no one can tell you how to use an API, and everyone does it differently.
Actually, the NT kernel of all things seems to have a pretty good reputation, and I wouldn’t dismiss the BSD kernels out of hand. I don’t know which kernel is better, but it seems you do. If you could explain how you came to this conclusion that would be most helpful.
*nod* I haven’t been a Windows person since shortly after the release of Windows XP (i.e. the first online activation DRM’d Windows) but, whenever I see glimpses of what’s going on inside the NT kernel in places like Project Zero: The Definitive Guide on Win32 to NT Path Conversion, it really makes me want to know more.
I found the first reply on LKML to be very interesting.
To quote:
for lots of the in-kernel APIs, compile-time constraints enforcement to prevent misuse doesn’t matter, because those APIs don’t provide any way to be used safely. Looking at the two subsystems I know the best, V4L2 and DRM, handling the life time of objects safely in drivers isn’t just hard, it’s often plain impossible
And
in order to provide API that are possible to use correctly, we have many areas deep in kernel code that will require a complete redesign
[..]
I would be very surprised if I was working in the only area in the kernel that is considered broken beyond repair by many people related to life time management
Which feels to me like there is a strong chicken-and-egg problem: to actually add any Rust bindings for certain kernel parts, you would first need to rewrite them, because there is apparently no actual defined way to call them safely.
Which means it’s not about adding Rust, it’s about Rust being the reason to poke where it hurts. Potentially requiring a rewrite of hundreds of thousands of LOC to even start seeing any benefits. In a state where I wouldn’t blame any maintainer who told me they don’t actually know how that part of the code truly works.
Yeah. Part of the drama has been the R4L folks trying to get subsystem maintainers in these areas to document the “right ways” to use the APIs so the Rust API can incorporate those rules, and some maintainers saying “just do it like that other filesystem and stop harassing us, you said you’d do all the work”. (At least that’s how they’re perceived.) But it’s not like they would let the R4L folks go in and rewrite that stuff, either.
I recall Asahi Lina’s comments on drm_sched. Choice quotes:
But the scheduler also keeps track of jobs, which reference their completion fences, so we have a lifetime loop. That loop is broken at certain points in the job lifecycle, but the fact it exists makes it very difficult to reason about the lifetimes of any of this stuff, and also makes it impossible to implement the requirements imposed by drm_sched via straight refcounting. If you try to refcount the scheduler and have the hw fence hold a reference to it, then the whole thing deadlocks, because the job completion fence might have its final reference dropped by the scheduler itself (when a job is cleaned up after completion), which would lead to trying to free the scheduler from the scheduler workqueue itself.
So now your driver needs to implement some kind of deferred cleanup workqueue to free schedulers possibly forever in the future. And also your driver module might be blocked from unloading from the kernel forever, because if any buffers hold on to job completion fences, that means your driver can’t unload due to the dependency.
I fixed it so that tearing down the scheduler gracefully aborts all jobs and detaches the hardware callbacks (it can’t abort the underlying hardware jobs, but it can decouple them from the scheduler side). In my driver’s case, that all works beautifully because my driver internals are basically reference counted everywhere, so while the scheduler and high-level queue can be destroyed, any currently running jobs continue to run to completion or failure and their underlying driver resources get cleaned up then, asynchronously.
The maintainer rejected the patch, and said it was the driver’s job to ensure that the scheduler outlives job execution.
But the scheduler owns the jobs lifetime-wise after you submit them, so how would that work? It doesn’t. If you try to introduce a job->scheduler reference, you’re creating a loop again, and the scheduler deadlocks when it frees a job and tries to tear itself down from within.
So now we’re back at having to introduce an asynchronous cleanup workqueue or similar, just to deal with the DRM scheduler’s incredibly poor lifetime design choices.
If I remember correctly, most C drivers that use drm_sched do not get this right, but it doesn’t come up much because most people aren’t trying to shut down their GPUs other than when they’re shutting off their computers, unless they’re using an eGPU (and eGPUs are notoriously semi-broken on Linux). Lina’s M1 GPU driver uses a scheduler per GPU context (/per application), hence schedulers are torn down whenever graphical applications are closed, so her driver couldn’t just ignore the complexity like most other drivers appear to do.
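For what it’s worth, the shape of the problem can be sketched in plain userspace Rust terms (names made up, with `Rc` standing in for the kernel’s refcounting): a strong job→scheduler back-reference closes a cycle, which is why something like a weak reference, or Lina’s explicit detach-on-teardown, is needed.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Scheduler {
    jobs: RefCell<Vec<Rc<Job>>>, // the scheduler owns its jobs
}

struct Job {
    // A strong Rc<Scheduler> here would complete the cycle (scheduler -> job ->
    // scheduler) and neither side would ever be freed. Weak breaks the loop.
    sched: Weak<Scheduler>,
}

fn main() {
    let sched = Rc::new(Scheduler { jobs: RefCell::new(Vec::new()) });
    let job = Rc::new(Job { sched: Rc::downgrade(&sched) });
    sched.jobs.borrow_mut().push(Rc::clone(&job));

    // Tear the scheduler down while a job is still alive (the per-queue case).
    drop(sched);

    // The surviving job sees "scheduler gone" instead of a dangling pointer.
    assert!(job.sched.upgrade().is_none());
    println!("scheduler freed; job handled it gracefully");
}
```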
Those statements just come across to me as “we built something unmaintainable and now I don’t want to maintain it”, i.e., a way to avoid doing the hard work.
So we’ll have these bindings creep everywhere like a cancer and are very quickly moving from a software project that allows for and strives for global changes that improve the overall project to increasing compartmentalization [2].
Because the cancer metaphor worked so well for Hellwig the last time he used it…
I think you’re underestimating how many years it would take to replace some of this code, let alone verify it actually works on the real hardware without random crashes (as we’ve seen in other reports about new CPU architectures playing Heisenbug). Sure you would want to do that eventually - but I don’t want to be the one telling everyone I’m gonna freeze features until this is done, with potentially more bugs when it’s finished.
It’s one thing to say “I don’t have the time to fix this”, it’s another to reject a proposed fix (see the drm_sched comment above) or to prevent other people from working on fixes elsewhere in the tree (Hellwig). You don’t have to freeze your feature work when other people are working on fixes and refactorings.
How long are you willing to wait for an updated Linux kernel? It may not be “we are unwilling to do maintenance” and more “this is a lot of major work where intermediate steps might not be usable.”
If they’re a paid maintainer, then it’s their job to do just that.
For my personal projects, I can “pay myself” to address technical debt. And I have, because I’m the only user of my code and thus, I have final say in what and how it works. At my previous job, any attempt to address technical debt of the project (that I had been working on for over a decade, pretty much from the start of it) would have been shut down immediately as being too risky, despite the 17,000+ tests [1].
Where do the incentives come in to address technical debt in the Linux kernel? Is that a better way to ask the question?
[1] Thanks to new management. At one point, my new manager reverted the code I rewrote to address some minor technical debt back to the original code plus the minimum to get it working, because the rewrite was deemed “too risky”.
At my previous job, any attempt to address technical debt of the project (that I had been working on for over a decade, pretty much from the start of it) would have been shut down immediately as being too risky, despite the 17,000+ tests
But go explain to your boss who just saw a working prototype, that you need a couple more days to design an alternate implementation, that may or may not be included in the final product. That you still need a couple more automated tests just to make sure. That you’ll take this slow approach now and forever, pinkie promise that’s how we’ll ship sooner.
[…] So unless I have cause to believe my boss understands those things, I just decide such menial considerations are below their pay grade, and tell them I am not done yet.
But if I can’t even get around the morons up top, I’m out pretty quick, one way or another.
The kernel has happily managed major API rewrites before, either merging the changes bit by bit or maintaining both versions in tree until the old one is ripe for deletion. And thru the magic of git and community effort, none of that has to delay the release of new kernels.
I’m trying to see what you were getting at. What did you mean by “just that”?
Doing the design work to create safe-to-use APIs with lifetimes considered is part of the work of the maintainer in my view because they should have the best perspective to do so. They got it into that state, they can get it out of that state. Whining that it’s hard work shouldn’t be acceptable as a reason to not do the work.
Doing the design work to create safe-to-use APIs with lifetimes considered is part of the work of the maintainer in my view because they should have the best perspective to do so. They got it into that state, they can get it out of that state.
I’m not aware of any precedent for something like this so maybe there’s a way in which you’re right. But there seems to be a contradiction on whether you think we should defer to their judgement.
Whining that it’s hard work shouldn’t be acceptable as a reason to not do the work.
I don’t agree with that. I accept the RfL side’s refusal to build and test their own OS or Linux fork, for example.
I think their judgment is separate from their capability. I don’t think any of these maintainers are fundamentally incompetent people. I’m not sure if they need mentorship on building APIs with regards to lifetimes because they should be aware memory has lifetimes everywhere, implicitly, in their code already.
Because if you trust them to define “lifetimes,” doesn’t that mean you trust them to estimate the amount of time before such a point when the costs of changing the API outweigh the benefits? Yet you don’t trust their estimation of the costs imposed by the practice and the amount of extra work it would take before it yields benefits?
Which means it’s not about adding Rust, it’s about Rust being the reason to poke where it hurts.
Good point! This is something I noticed in a previous job as well, where we introduced computer assistance to existing manual workflows. Apparently the real reason for resistance from the workers was that in the course of this computerization, their “traditional” workflows would be documented and maybe even evaluated before they could be encoded in a computer program. But IIRC this reason was never said out loud by anyone – some developers realized this reason on their own and adjusted their approach, but some didn’t realize this and wondered about the constant pushback.
And maybe to the managers of those workers the computerization was not even the real goal, but the real improvement was supposed to come from the “inventorization” of existing workflows. In a similar way, while the Rust devs want Rust to enter the kernel, maybe some progressive Linux devs see Rust “just” as a vehicle to make Linux internals more strict and more understandable, and introducing Rust is maybe just a happy side effect of this.
maybe some progressive Linux devs see Rust “just” as a vehicle to make Linux internals more strict and more understandable, and introducing Rust is maybe just a happy side effect of this.
This is the entire situation from the start. Like this is what Rust for Linux is.
Absolutely, and also people do not recognize how unforgiving Rust is to even normal APIs that you would use in C (insert joke on linked lists), and the constraints that the word “safely” carries when applied to Rust.
Not being able to devise a “safe” Rust abstraction doesn’t mean that the API must be a source of insecurity. Certainly it isn’t a great start, I will grant that, but generally in C you will find that most code ends up doing the same thing that works. The maintainers however recognize that this is not the way to introduce a semi formal definition of how the API operates, and are worried that it may not be possible at all. This is being aware of the environment and the complexity that comes from 20-30 years of development in C; it’s not wanting to “avoid doing the hard work”.
The maintainer rejected the fix because (according to him) the API was not poorly designed, but simply not supposed to be used like that (literal quote: “this functionality here isn’t made for your use case”). Which makes sense and is consistent with what I wrote above: the maintainer is conscious of the limits of C and does not want the API to be used in ways that were not anticipated, whereas the Rust developer is more confident because of the more powerful compile-time checks.
Not knowing the code I cannot understand the tradeoffs involved in the fix. I can’t say whether the maintainer was too cautious, and obviously the failure mode (use after free) is anything but great. My point is that, as you look more in depth, you can see that people actually do put thought in their decisions, but clashes can and will happen if they evaluate the tradeoffs differently.
As an aside: drm_sched is utility code, not a core part of the graphics stack, so for now the solution to Lina’s issue is going to be a different scheduler that is written in (safe) Rust. Since it appears that there’s going to be multiple Rust graphics drivers soon, they might be able to use Lina’s scheduler and there will be more data points to compare “reuse C code as much as possible” vs “selectively duplicate infrastructure”; see also https://fosstodon.org/@airlied/113052975389174835. Remember that abstracting C to Rust is neither little code nor easy code, therefore it’s not unexpected that in some cases duplication will be easier.
The maintainer rejected the fix because (according to him) the API was not poorly designed, but simply not supposed to be used like that (literal quote: “this functionality here isn’t made for your use case”). Which makes sense and is consistent with what I wrote above: the maintainer is conscious of the limits of C and does not want the API to be used in ways that were not anticipated, whereas the Rust developer is more confident because of the more powerful compile-time checks.
This is not a good summary of the situation, though to be fair, some details are buried on Reddit. First of all, what she was doing was the correct approach to dealing with that hardware, according to multiple other DRM maintainers. Lina’s patch actually would have made existing drivers written in C less buggy, because that maintainer was in fact not conscious of the limits of C. drm_sched has very annoying and complex lifetime requirements that are easy to mess up in C, and her patch would’ve simplified them.
Relevant excerpts from what Lina said on Reddit:
The only thing I proposed was making it valid to destroy a scheduler with jobs having not completed. I just added cleanup code to handle an additional case. It was impossible for that change to affect any existing driver that followed the existing implied undocumented rule that you have to wait for all jobs to complete before destroying the scheduler.
The scheduler is in charge of job lifetimes by design. So this change makes perfect sense. Enforcing that all jobs complete before scheduler destruction would require tracking job lifetimes in duplicate outside the scheduler, it makes no sense. And you can’t fix it by having a simple job->scheduler ref either, because then the scheduler deadlocks on the last job when it tries to free itself from within.
The only reason this doesn’t crash all the time for other GPU drivers is because they use a global scheduler, while mine uses a per-queue scheduler (because Apple’s GPU uses firmware scheduling, and this is the correct approach for that, as discussed with multiple DRM folks). A global scheduler only gets torn down when you unplug the GPU (ask eGPU users how often their systems crash when they do that… it’s a mess). A per-queue scheduler gets torn down any time a process using the GPU shuts down, so all the time. So I can’t afford that codepath to be broken.
Here it should be noted that it was not really a case of using something in a new way. The only difference is how many users are affected. Most people don’t unplug their GPU, so the fact that GPU unplugging is broken with many drivers is easy to sweep under the rug. But since it affects every user trying to use Lina’s driver, the problem can’t be swept under the rug and just be ignored.
And again, my scheduler change absolutely did not change the behavior for existing drivers at all (unless they were already broken, and then it could strictly improve things). That was provable.
I consulted with multiple DRM folks about how to design this, including actual video meetings, and was told this was the correct approach.
The generally accepted definition of “hit piece” includes an attempt to sway public opinion by publishing false information. Leaving aside the fact that the user who linked this story did not publish it, and deferring the discussion of who may or may not pay them to post, that is a significant claim that requires significant evidence.
So, please share your evidence… what’s the false information here, and how exactly is @freddyb attempting to sway public opinion? To what end? Be very specific, please.
That’s a fair point. I should have said “false or misleading.”
So I’ll amend my question, which I doubt will get answered at any rate:
@ecksdee: So, please share your evidence… what’s the false or misleading information here, and how exactly is @freddyb attempting to sway public opinion? To what end? Be very specific, please.
If you look at the history of soatok’s blog on Lobsters, it is pretty obvious that sooner or later someone from this community would post this entry.
Now you have to show me how Mozilla is related to Signal in any positive or negative way. You yourself seem to have strong feelings towards Mozilla, at least.
Random sidenote: I wish there were standard shortcuts or aliases for frequently typed commands. It’s annoying to type systemctl daemon-reload after editing a unit; e.g. why not systemctl dr? Or debugging a failed unit: journalctl -xue myunit seems unnecessarily arcane, why not --debug or something friendlier?
Well, I guess fu rolls off the tongue better than uf. But I remember literally looking up whether there wasn’t anything like -f and having issues with that. Oh well.
I’m not sure it would be “clever”. At best it would make transactional changes (i.e. changes that span several files) hard, at worst impossible. It would also be a weird editing experience when just saving activates the changes.
I wonder why changes should need to be transactional? In Kubernetes we edit resource specs—which are very similar to systemd units—individually. Eventual consistency obviates transactions. I think the same could have held for systemd, right?
I wonder why changes should need to be transactional
Because the services sd manages are more stateful. If sd restarted every service the moment its on-disk base unit file changed [1], desktop users, database admins, etc. would have a terrible experience.
By default, SQLite starts transactions in DEFERRED mode: they are considered read-only. They are upgraded to a write transaction, which requires a database lock, in-flight when a query containing a write/update/delete statement is issued.
The problem is that by upgrading a transaction after it has started, SQLite will immediately return a SQLITE_BUSY error without respecting the busy_timeout previously mentioned, if the database is already locked by another connection.
One of the most common issues faced when using SQLite in a Rails application are the occasional ActiveRecord::StatementInvalid (SQLite3::BusyException: database is locked) exceptions. These occur when a DEFERRED transaction attempts to acquire the SQLite database lock in the middle of a transaction once hitting a write query while another connection holds the database lock. Since this occurs in the middle of a transaction, SQLite does not attempt to retry the transaction by calling the set busy_handler/busy_timeout callback, but instead immediately errors with a busy exception.
TLDR: if you’re seeing SQLITE_BUSY errors, try using BEGIN IMMEDIATE for transactions that you know are going to be a write.
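For example, with the rusqlite crate this looks roughly like the following (my sketch; check the crate docs for the exact API): set a busy_timeout, and open transactions you know will write as IMMEDIATE, so the write lock is taken at BEGIN rather than mid-transaction.

```rust
use std::time::Duration;
use rusqlite::{Connection, TransactionBehavior};

fn main() -> rusqlite::Result<()> {
    let mut conn = Connection::open("app.db")?;
    // Wait up to 5s for a competing writer instead of failing immediately.
    conn.busy_timeout(Duration::from_millis(5000))?;
    conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, body TEXT);",
    )?;

    // This transaction will write, so take the lock up front (BEGIN IMMEDIATE).
    let tx = conn.transaction_with_behavior(TransactionBehavior::Immediate)?;
    tx.execute("INSERT INTO events (body) VALUES (?1)", rusqlite::params!["hello"])?;
    tx.commit()?;
    Ok(())
}
```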
This. Another thing is that SQLite locks are per database. So even if your two transactions operate on two totally different tables, it will still contend on the write lock.
Which is the reason not to use SQLite. You don’t want your logins to fail just because a background worker performs a DB transaction on another table that takes a little longer. Potentially something you put in the background on purpose, which suddenly interacts with other parts of the application. I’ve changed transactions to individual updates because the total IO delay would be multiple seconds.
This can for sure be avoided with clever thinking and tricks - but that isn't always the tradeoff you want to make.
For me personally, the reason was SQLite completely disregarding your column types and doing some heinous dynamic typing instead. That caused some real problems. But fortunately, this has been fixed (by virtue of “STRICT” tables), and so I’d upgrade SQLite to “if you don’t need any amount of meaningful r/w concurrency, by all means use it, it’s great then”.
I think not using transactions to batch writes is another large performance mistake people make with SQLite. Many times when I hear about poor insert performance with SQLite, it's because the inserts are bare statements with implicit transactions. You get much better insert throughput with a wrapping transaction.
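A sketch of the difference (again with rusqlite, and an invented samples table): without the explicit transaction, every INSERT below would be its own implicit transaction and pay for its own commit.

```rust
use rusqlite::{params, Connection};

fn insert_batch(conn: &mut Connection, rows: &[(i64, &str)]) -> rusqlite::Result<()> {
    // One explicit transaction around the whole batch: a single commit (and
    // fsync) instead of one per INSERT statement.
    let tx = conn.transaction()?;
    {
        let mut stmt = tx.prepare("INSERT INTO samples (id, body) VALUES (?1, ?2)")?;
        for (id, body) in rows {
            stmt.execute(params![id, body])?;
        }
    } // the prepared statement is dropped here, releasing its borrow of `tx`
    tx.commit()?;
    Ok(())
}
```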
I am working on a project that is using SQLite on the server. A lot of these things are issues because of assumed scale, and that's reasonable, but in my case I know my app is an internal tool that will have at most 10 or 15 users total. It's all just running on one VM. In this case, I'm choosing SQLite for sort of the inverse of a lot of these reasons: it is meaningfully simplifying things as compared to Postgres.
Can you elaborate on the simplification? I just started a prototype on a VM using Postgres and all I had to do was apt install postgresql and set up the user/role. I’m not very familiar with SQLite, but you still have to install it and then explicitly opt into the constraint enforcement stuff (at a minimum) so it seems to be a pretty comparable amount of work? And then if/when you need a database migration, Postgres has much better support for things like ALTER TABLE than SQLite. What am I missing?
Running Postgres and backing up its data is considerably more difficult. With SQLite, backing up means copying the file somewhere; as for running it: there's nothing to run.
At the scale mentioned above you’ll also not be dealing with that many database migrations I think once you’re stable, so why bother?
Is backing up a directory rather than a file really "considerably more difficult"?
so why bother?
For me, because I know Postgres better and I dislike having to remember which options I need to turn on to make SQLite enforce constraints. It’s also painful when I do run into issues with migrations, or when I need some other feature it doesn’t have, or when that app that “will only ever run on one machine” needs to eventually run on multiple machines.
Even on a single machine, Postgres installs with one command and it supports constraints by default and backups are just copying the data directory. Maybe I’m missing something but it seems easier at any scale?
There’s no additional process running for the database. It’s just a file.
The ALTER TABLE thing is annoying but I haven’t actually needed to do that yet. It’s also not any migration, just ones where you want to change a column type. Not a super common thing for me.
Helix’s relative lack of obscurity has been a pleasant surprise. VSCode was the first IDE that got me off of pure vim (for the most part), and then trying Helix I just stayed with it since it turned out all I really missed from VSCode was the language servers, and I was happy with everything out of the box and generally prefer terminal-based workflows.
Then I got vendor-locked into Helix’s keybindings way faster than I expected. Knowing vim was nice since so many things speak it as a portable editing language, and I feared I’d be stuck in my cottage with my hipster terminal editor for the foreseeable future, but seeing the effort in Zed for instance increases the range of tooling I’ll feel fully fluent in at this point.
I think I heard about Helix due to it being on some list of cool new things made in Rust, but I started using it because I ran out of the needed stamina to fix my nvim config, and I don’t like VS Code. A modal terminal editor with good LSP support just seemed like the natural choice.
I got stuck in VSCode, and thus there is a high barrier to entry to get productive in Helix with things that would just work in VSCode - if not the usability, then the ecosystem of plugins.
I read your comment and had to see for myself, and… it’s OK for my monitor, eyes, and room lighting. A little somber, that’s all.
But if it weren’t, I could just use the reader view on my browser. Pretty sure every browser has something like that feature nowadays, or there are plugins.
Sadly, the default dark mode colors on iOS Safari don’t have sufficient contrast (dark blue on black for links), so default styles won’t work for a nontrivial fraction of visitors.
It looks dim to me too, but it actually passes WCAG 2.0 AA (#808080 on #000 is 5.31:1, greater than 4.5:1). Interestingly you get the same ratio on #fff with #6b6b6b, but that seems way more readable to me.
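If anyone wants to check those numbers, here's a quick sketch of the WCAG relative-luminance and contrast-ratio formulas (the function names are mine); it gives roughly 5.3:1 for both pairs.

```rust
// WCAG 2.x relative luminance and contrast ratio, enough to sanity-check
// the figures above (#808080 on #000 and #6b6b6b on #fff).
fn linearize(c: u8) -> f64 {
    let c = c as f64 / 255.0;
    if c <= 0.03928 { c / 12.92 } else { ((c + 0.055) / 1.055).powf(2.4) }
}

fn luminance((r, g, b): (u8, u8, u8)) -> f64 {
    0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)
}

fn contrast(a: (u8, u8, u8), b: (u8, u8, u8)) -> f64 {
    let (la, lb) = (luminance(a), luminance(b));
    let (hi, lo) = if la > lb { (la, lb) } else { (lb, la) };
    (hi + 0.05) / (lo + 0.05)
}

fn main() {
    println!("{:.2}", contrast((0x80, 0x80, 0x80), (0x00, 0x00, 0x00))); // ~5.32
    println!("{:.2}", contrast((0x6b, 0x6b, 0x6b), (0xff, 0xff, 0xff))); // ~5.33
}
```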
i am begging people to make their text a legible size on small screens. my eyes are good enough to read 5-point font for now but they won’t be forever lmao
It sounds like many things are changing for the better. But at the same time it doesn’t look like we’re already in a working state for people that don’t want to tinker around. I’d rather not have random issues and quirks while actual recording support is so-so depending on the compositor. Especially when every problem is either “you’re using the wrong distro” or “you’re using the wrong compositor, works on mine”.
I also would rather not have random issues and quirks, and honestly, that means I don’t want X11. X is the king of random issues and quirks in my experience.
I mean, if the problem is that they’ve finally got a good solid solution but the software you’re using hasn’t gotten around to implementing it yet, or the software you’re using has implemented it upstream but your distro hasn’t pulled that version in yet, what other response do you really expect? You can use a system where it already works, or you can wait for the necessary changes to make their way to your setup, or you can pitch in to make it happen faster.
On a technical level I agree with you. But on a consumer level it sounds like you're in for a long period of frustration, during which you'd probably rather move off Linux to a Mac (there, I said it; you either die a hero…). After which there is even more friction for migrating back to Linux in x years, when the whole ecosystem and LTS train has arrived at a state of Wayland that is actually usable - without resorting to hacks that make you sound like someone trying to regedit Copilot out of Win11.
One of the main obstacles in using Nix for development environments is mastering the language itself. It takes time to become proficient writing Nix.
How about using AI to generate it instead?
This just sounds like a really bad idea. If the language is unapproachable, change the language or help people learn it. Requiring an LLM for generating configuration will just make the problem worse over time.
Let me rephrase: If the path of least resistance is generating configuration with an LLM, most people will follow this path, and this path doesn’t aid in learning in any way.
Also, it will cover up the language's complexity problems, potentially making them worse over time.
The learning path is never followed, and the complexity isn’t tackled. Hence: The LLM becomes a de-facto requirement.
It helps with generating configuration without thinking about it or understanding it. The configuration becomes something obscure and assumed to work well that only gets updated by LLMs and no one else. There’s no incentive for the average person to understand what they’re doing.
But this is already a problem, right? I can ask any LLM right now to generate a shell.nix for X project, and it will generate it. While I don't like the idea of auto-generating code via LLM, I can understand how having something to scaffold code can be nice in some cases.
Heck, even before LLMs people would have things like snippets to generate code, and we also had things like rails generate to generate boilerplate code for you. This only goes one step further.
Yes, and we don’t want to make it worse or pretend that it’s acceptable.
snippets to generate code, and […] rails generate to generate boilerplate code for you. This only goes one step further.
I have the opinion that boilerplate generators (LLM or not) are a symptom of a problem, not a proper solution. Ignoring that, at least a regular generator:
Can impose limits. Meaning: Providing something very basic that should always work and is easy to understand, requiring the user to learn a bit in order to go further.
The output of the generator is deterministic and controlled by the framework authors, so you can impose those limits and make sure that the output is reasonably safe.
LLMs are not good learning tools because they cannot say "no". You need to be experienced enough to ask reasonable questions in order to get reasonable answers. Portraying an LLM as an alternative to learning for newcomers is counter-productive.
This is an optional feature though, you can use it or not. If your argument is that “this makes people lazy”, well, they can already be lazy by opening ChatGPT or any other LLM and do the same.
Portraying an LLM as an alternative to learning for newcomers is counter-productive.
While the post seems to suggest this is for newcomers, that is not necessarily true. I could see myself using it, considering that in the past I have copied and pasted my Nix configuration from some random project to start a new one.
They added it to their CLI. They published a blog post about it. They set up a dedicated marketing website. They made sure it’s literally the first CTA you see on their home page.
They set up a dedicated marketing website. They made sure it’s literally the first CTA you see on their home page.
I just went to their homepage and I see no mention of this feature. But even if it did mention it, as long as it sits beside the manual way I wouldn't say it is encouraging anything; it is an alternative.
Encouragement would be if they removed all mentions of manual methods or buried them deep in the documentation. That is not what is happening here; if I go to their documentation they still have lots of guides on how everything works. Here, just go to: https://devenv.sh/basics/.
Do we just live in completely separate universes?
I ask the same of you. Maybe you're seeing a different version of the homepage, or maybe in your universe a blog post is the same as a home page.
It helps with generating configuration without thinking about it or understanding it. The configuration becomes something obscure and assumed to work well that only gets updated by LLMs and no one else.
There is a perfect phrase for this: it's basically a "cargo cult".
Thanks to LLMs I now use a huge array of DSL and configuration based technologies that I used not to use, because I didn’t have the time and mental capacity to learn 100s of different custom syntaxes.
Just a few examples: jq, bash, AppleScript, GitHub Actions YAML, Dockerfile are all things that I used to mostly avoid (unless I really needed them) because I knew it would take me 30+ minutes to spin back up on the syntax… and now I use them all the time because I don’t have to do that any more.
I would not feel confident trusting some config that an LLM spits out. I would check whether it does what it's supposed to do, and lose more time than I gained.
If I cannot scale the number of different technologies, I use fewer, or simplify. Example: Bash is used extensively in CI. GitHub Actions just calls bash scripts.
It only takes me a few seconds to confirm that what an LLM has written for me works: I try it out, and if it does the thing then great! If it spits out an error I loop that through the LLM a couple of times, if that doesn’t get me to a working solution I ditch the LLM and figure it out by myself.
The productivity boost I get from working like this is enormous.
I’m wondering: Doesn’t that make your work kinda un-reproducible?
I spend a lot of time figuring out why something in a codebase is like it is or does what it does. (And the answers are often quite surprising.)
“Because an LLM said so, at this point in time” is almost never what I’m looking for. It’s just as bad as “The person who implemented (and never got around to documenting) this moved to France and became a Trappist monk”.
I’d have to completely reconstruct the code in both cases.
In my experience it’s way more frustrating and erratic. Good that it works for you.
Apart from that, I think there is value in facing repetitive and easy tasks. Eventually you get tired of it, build a better solution, and learn along the way.
For non repetitive and novel tasks, I just want to learn it myself. Productivity is a secondary concern.
The fact of the matter is that the argument for this feature works regardless of the underlying system.
Your unspoken premise appears to me to be that we should all become masters of all our tools. There was a time I agreed with that premise, but now I think we are so thoroughly surrounded by tools we have to be selective about which ones to master. For the most part with devenv, you set it up once and get on with your life, so there isn’t the same incentive to master the tool or the underlying technology as there is with your primary programming language. I’m using Nix flakes and direnv on several projects at my work; my coworkers who use Nix are mostly way less literate in it than I am and it isn’t a huge obstacle to their getting things done with the benefit of it. Very few people do a substantial amount of Nix programming.
Your unspoken premise appears to me to be that we should all become masters of all our tools.
No, it’s not.
You don’t need to master every tool you use, just a basic understanding, a sense of boundaries and what it can or can’t do.
It doesn’t matter if your objective is “mastering” or “basic understanding”, both things require some learning, and LLMs do not provide that. That’s the main premise in my argument.
I don’t use a tool if I don’t know anything about it.
You don’t need to master every tool you use, just a basic understanding, a sense of boundaries and what it can or can’t do.
I could not agree more with that. LLMs have accelerated me to that point for so many new technologies. They help me gain exactly that understanding, while saving me from having to memorize the syntax.
If you’re using LLMs to avoid learning what tools can and cannot do then you’re not taking full advantage of the benefits they can bring.
My experience with LLMs is having to re-check everything in case it's a hallucination (it often is) and ending up checking the docs anyway.
The syntax is easy to remember for me from tool to tool. Most projects tend to have examples on their website and that helps to remember the details. I stick to that.
The language is interesting, and I have no real opinion about the syntax. I played a little with it in user space, and I just absolutely hate the cargo concept. It is like pip, and I hate having to pull down other code that I do not trust. At least with shared libraries, I can trust a third party to have done the build and all that. Also, if you have a large library, it does bloat the applications that use it.
I don’t understand this attitude. The first thing I don’t understand is this about “having” to pull down code that he doesn’t trust. You don’t have to do anything, you choose what you want to depend on yourself. But then what I really don’t understand is why some random third party compiling the code and giving you a shared library suddenly makes it reasonably trustworthy.
I interpreted this as saying: In C the norm is to use fairly large shared library dependencies that come as a binary from someone you trust to have built and tested it correctly; most often it’s from a distribution with a whole CI process to manage this. The only source code you compile is your own. Whereas in Rust/Go/Python/etc. the norm is to download everybody else’s source code and compile it yourself on the fly, and the libraries are typically smaller and come from more places. Also, in the typical default setup (”^” versions) it’s easy to pull down some very recent source code without realizing it.
I can see how that would feel like you’ve thrown away a whole layer of stability and testing compared to the lib.so model.
When using the package manager of my OS to install dependencies for my program, I have some safeguards. First of all, I see that this dependency is important enough for someone to add the package. Also, some checks are done before the package is added, and changes are checked before being added as well. When security updates are available they are added too, and I can simply update them, so all running software (even software not managed by the package manager) gets the security fix after reloading the dependency. And a maintainer can't just go rogue and break my dependency. I get this also for dependencies of my dependencies.
Of course this doesn’t do a full audit of all my dependencies and fix all problems. It just adds help to manage dependencies and trust in them. While with cargo/pip/… you need to do all this on your own for every dependence and every update.
I know it’s a spectrum, but really OS upgrades aren’t a good benchmark. There have been some incredible bugs over the years, often due to OS packaging having or ignoring the defaults of the upstream. For example when Debian screwed up the private key generation…
It is very, very hard to write any serious Rust program without any dependencies from an untrusted code repository such as Cargo. In C and C++, the same is trivial, because almost all dependencies you could ever want are provided by your operating system vendor. The core difference is between trusting some random developer who uploaded some code to Cargo, and trusting your operating system.
I first wanted to write a long comment and decided to just blog it.
The TLDR is somewhere along the lines of: I doubt you can get away with only using OS dependencies, making that work is even more work, and you don't actually gain as much security as you would like.
As a user, I don’t have much more trust in libfoo packaged by my distro (if it’s packaged) than in libfoo fetched from upstream. And as a distributor, I have much less trust in libfoo packaged by my users’ 50 different OSes than in libfoo I fetch from upstream.
It is very, very hard to write any serious Rust program without any dependencies from an untrusted code repository such as Cargo.
When you use distro provided libraries, you have to trust thousands of authors. I trust my distro maintainers to not do anything malicious, but I don’t trust them to vet 10000 C packages. Which means I have to trust the authors of the code to not be malicious.
The distro package update process is to pull down new sources, compile, run the test suite (if one is provided), only look at the code if something isn’t working, and then package it. After that, someone might notice if there is something bad going on. But it’s a roll of the die. At no point in this process is anything actually vetted. At most you get some vetting of a few important packages, if your distro is big and serious enough.
In C and C++, the same is trivial, because almost all dependencies you could ever want are provided by your operating system vendor.
Considering how often you find random bespoke implementations of stuff like hash tables in C projects, this is clearly untrue.
In C and C++, the same is trivial, because almost all dependencies you could ever want are provided by your operating system vendor.
Provided by your operating system vendor? Which operating system? If you write cross-platform code, you only get to use the dependencies that all platforms provide (which is pretty much none at all). C and C++ don’t even ship with proper UTF-8 support. It’s completely impossible to write any serious software in these languages without pulling in some external code (or reinventing the wheel). As someone who has earned a living with C/C++ development for years, I have a hard time understanding how you arrived at this conclusion.
The scope of the C/C++ standard library and the Rust standard library is very similar. Rust’s standard library has no time features or random number generators (for that, the semi-official crates chrono, time and rand can be used), but it has great UTF-8 support. Overall, you’ve got about the same power without using any dependencies.
My operating system vendor is the Fedora project on my desktop, and Canonical or the Debian project on various servers. All of those provide plenty of C and C++ libraries in their repositories.
I don’t understand why you talk about C++’s UTF-8 support or the size of C++‘s stdlib, that’s completely irrelevant to what I said.
If you use Windows (MSVC), you basically have no third party libraries available whatsoever. You need to download binaries and headers from various websites.
I don’t understand why you talk about C++’s UTF-8 support or the size of C++‘s stdlib, that’s completely irrelevant to what I said.
It’s the best example of a third party library you have to pull in for basically every project. This is not provided out of the box or by the operating system vendor.
I don’t use Windows. The person in the article doesn’t either. TFA contains a criticism of Rust from the perspective of a Linux user, not from the perspective of a Windows user.
In C and C++, [to write any serious program without any dependencies from an untrusted code repository] is trivial, because almost all dependencies you could ever want are provided by your operating system vendor.
to:
I don’t use Windows.
Yes, dependency management is trivial if you only ever use a system where all your desired dependencies are provided through official channels. However, this is not the experience most C/C++ programmers will have. It’s not even the case if you just use Linux: Recently, I used SDL3 and I had to download the tarball and compile the source myself because it was too new to hit the apt repository of my distro.
The complaints from the article are from the perspective of a Linux user. My post was also from the perspective of a Linux user. Windows is not relevant in this conversation.
Unless you specified it upfront, Windows is going to be presumed to be relevant. Pretending to be surprised that someone thought it wasn't is silly.
The person you are talking to then goes on to describe a Linux counterexample that is quite common for C/C++ development. If your response to this is "well, I don't do any games/graphics/GUI development" then cool. You have now defined an extremely narrow category of C/C++ development where it's common to only use libraries that come already installed. It is nowhere close to the general case, though. Which is the point of the commenter you are responding to.
As for the SDL example: Sure, there are situations where you may need to add dependencies which aren’t in your distro in C or C++. However, this is fairly rare and you can perform a risk assessment every time (where you’d conclude that the risk associated with pulling down SDL3 outside of your distro’s repos is minimal).
Contrast this with Rust (or node.js for that matter), where you’re likely to have so many dependencies not provided by your distro that it’s completely unreasonable to vet every single one of them. For my own Rust projects, the size of my Cargo.lock file is something that worries me, just like the size of the package-lock.json file worries me for my node.js projects. I don’t know most of those dependencies.
The OSA continues to claim jurisdiction over the entire web.
I asked Ofcom to explain what legal basis there is for a country’s law being able to do so, perhaps an international treaty.
A PR flack responded to duck the question and say that the law has the power because the law says it has the power.
This was yet another waste of time by Ofcom.
Speaking of which, the new online tool linked in my previous comment is the best argument I can see for convincing someone that these regulations are hopelessly burdensome and vague.
I would invite anyone who is not yet convinced of that fact to try using it.
I’m still waiting to hear back from the US Embassy.
I think this has been delayed by the change in US presidential administrations appointing new personnel and setting new policy.
I’ll continue to follow up.
I’ve gotten pointed to the legislators who were involved in the OSA and I’ll be reaching out to their offices to ask their legislative aides to ask why they think the UK has the jurisdiction to regulate.
I have almost exhausted my options for improving on our least bad plan.
When these two inquiries are resolved, I don’t have ideas for other ways of mitigating the threat of the OSA.
I still think the most likely positive resolution is that UK persons contact their MPs to delay the approval of Ofcom's regulations until the OSA's risks to small forums can be reduced, but that's not something I can advance as a non-citizen.
If you are in the UK, please do contact your legislators.
Why would I attend a session about complying with a UK law, given that I’m not a UK citizen, the site is not hosted in the UK, and I’m not in the UK?
I traded email with Ofcom and asked them to explain their novel legal theory for why the UK can pass a law claiming jurisdiction over the world. They ignored my questions and replied that the law says they have jurisdiction over the world. There’s some more details in my latest update. They’re not operating in good faith.
I don’t need instruction on complying with another country’s censorship regime. I need Ofcom to acknowledge that they don’t have power over other countries, because I’m not interested in wasting years of my life proving in court that the UK doesn’t have the authority to decide what can be published in other countries.
If someone attends this session, please ask:
What international treaty gives this UK law jurisdiction over the world?
If there isn’t one, does Ofcom commit to helping enforce laws from other countries that claim to apply to UK services because occupants of those countries read them? We all know which countries want this power and what they’ll claim to be saving their children from.
* It’s not relevant, but as a programmer I have to gripe about variable names. This facially reasonable title is actually OSA legalese. “Small” mostly means “fewer than 7 million visitors physically in the UK”. Lobsters is probably “small” but there isn’t a defined time period and we don’t track demographic data to say authoritatively. “Low-risk” is not defined by the law; it is Ofcom’s invention and their public definition is impenetrable. Lobsters is probably “multi-risk” but hasn’t spent thousands of dollars on legal advice to try to know. But their deliberately poor communication is not the remaining problem here, their lack of authority is.
Ah, setenv and getenv. Steam also struggled with this recently. It is safe to say that nobody can use this API properly. In other words, it is broken (even if it behaves as documented, that is not good enough when ‘as documented’ means everyone has crashes because it’s impossible to use properly).
Unfortunately, this is yet another case where the issue originates from POSIX (macOS and the BSDs could be affected as well). The standard shows its age: many constructs have issues or even break down when multi-threading is involved. Remember, before C11/C++11, these languages did not even describe a proper multi-threaded memory model (see also the legendary essay Threads Cannot Be Implemented as a Library, published in 2004). Everything from start to finish was just a kludge. Of course, many amendments and mitigations were made to alleviate this problem, but POSIX is holding back innovation, both in operating systems and in user space. Back then, most programs consisted of one or two simple C files implementing a single-threaded program. POSIX would be radically different if it were designed today.
From what I remember from one RFC I saw, part of the issue was that the glibc devs were adopting a “We documented it in the manpage. You’re just holding it wrong.” stance for a while and that delayed both the Rust-side and glibc-side changes mentioned at the end of the post since the Rust devs try to avoid unilaterally hacking around other people’s designs until they’ve run out of other, more cooperative options.
Can't the Rust side just… avoid getenv(3) and setenv(3)? We should be able to implement thread-safe versions from system calls, can't we? No need to be unfriendly with the glibc devs if we can avoid depending on them in the first place.
getenv(3) and setenv(3) aren’t getting the environment from a kernel service, the environment is just a global variable extern char **environ that gets initialized by execve(2) when starting a process. There’s no “other place” to get the environment from where you will be safe from libc’s predations. Part of the sturm und drang in the historical issues there seems to have been that Rust’s set_env was “safe” and had a locking regime that made it thread-safe if you only modified environ from Rust code; linked-in dependencies not written in Rust would have no idea about what Rust was doing.
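For what it's worth, my understanding of the Rust-side change mentioned at the end of the post is that it stops pretending this is safe: in the 2024 edition, std::env::set_var and remove_var are unsafe functions. A minimal sketch of what calling code looks like under that assumption (the variable name is made up):

```rust
use std::env;

fn main() {
    // Reading the environment stays safe.
    let path = env::var("PATH").unwrap_or_default();
    println!("PATH is {} bytes long", path.len());

    // Writing it mutates the same global `environ` that glibc's getenv(3)
    // walks, so under the 2024 edition this is an unsafe operation; the
    // caller promises no other thread (or C library) is reading the
    // environment concurrently, e.g. by doing this before spawning threads.
    unsafe {
        env::set_var("MY_APP_MODE", "debug"); // hypothetical variable
    }
}
```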
Crap. I see. One thing bothers me, though: why does setenv(3) even exist? The program's environment is an input, and if it's global it should be a global constant. We've known global variables are bad for, like, 40 years now? I sense legacy from old-school procedural programming, the same kind that decided a program only needed one parser (old lex & yacc worked over global variables too).
Oh well, I guess I have yet another reason to reduce my dependencies to the absolute minimum.
POSIX/Unix and its descendants are a product of legacy. setenv(3) comes from 4.3BSD, which was neither SMP aware nor had the concept of multiple threads of execution sharing an address space. As mentioned above, the environment is just memory that's written into the process's address space during process creation.
Since processes didn’t even share memory at that point there was no harm in allowing writing to what was already mutable memory - and assumedly it made it easier for programs configured through env vars to manipulate those env vars.
Edit: getenv(3) comes from V7 Unix. setenv(3) comes from 4.3BSD.
Also worth noting that early Unix machines had memory best measured in kilobytes. Even if they had multiple threads of execution, there would have been design pressure to avoid taking multiple copies of the environment.
Early UNIX also had a lot of features in libc (and the kernel) that were really just for shells. The setenv function is in this category. In a shell, you have an exported environment and you want commands to modify it and for child processes to inherit it. Having a single data structure for this is useful, when you invoke execve in a child process, you just pass environ to it.
For any process that is not a shell, setenv should never be called. The two cases are entirely separable:
If you want to pass state around within a process (including to and from shared libraries), you use functions or globals.
If you want to pass state to a child process, create an environment array before exec.
setenv is useful between fork and exec to configure the environment of the child process. Yes, you could use the environment-setting variants of exec, but oftentimes it is easier to just set or remove a few variables.
setenv and getenv are not async-signal-safe, so you cannot use them safely in a forked child before exec. If a signal handler had modified the environment at the same time, you might be reading half-valid state.
The proper way to do it, valid under all the restrictions in the glibc docs, is to copy environ yourself and modify the new copy in thread-local state (the copy is still not async-signal-safe). Then you modify it as needed, call fork, and then immediately execvpe/execle.
There isn't a good reason to do processing after a fork; it only leads to hard-to-diagnose bugs when you inevitably end up messing with non-async-signal-safe state.
Looks like we’re missing a function that wraps execle(3) or execvpe(3), where instead of specifying the entire environment, we only specify the value of the environment variables that changed. That way we wouldn’t have to use setenv(3) the way you suggest.
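For what it's worth, Rust's std::process::Command already behaves like that hypothetical wrapper: the child inherits the parent's environment by default and you only name the variables you want to add, override, or drop. A rough sketch:

```rust
use std::process::Command;

fn main() -> std::io::Result<()> {
    // The child inherits the parent's environment; we override/remove only
    // the named variables for the child. The parent's environ is never
    // touched, so no setenv(3) and no thread-safety worries.
    let status = Command::new("printenv")
        .env("MY_FLAG", "1")   // add or override one variable
        .env_remove("TMPDIR")  // drop one variable
        .status()?;
    println!("child exited with {status}");
    Ok(())
}
```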
The whole UNIX process-spawning API is based around fork, call a bunch of things to change the spawned process, then exec. This is pretty clever because it means you don't need a super complex spawning API that allows configuring everything, and you can do it using real code rather than some plain-old-data which will inevitably grow some sort of rudimentary logic for common cases. So in this environment it makes sense that we don't really need a way to set the environment with exec at all; the "UNIX way" is fork, setenv, then exec - no collection of exec variants for managing the environment in different ways.
However, while the fork, …, exec solution is clever, a better solution would probably be to make all of these functions take a target-process argument rather than implicitly targeting the current process, so that the spawn sequence would look something like spawn, do whatever is needed to configure the child via the returned process ID, then start. In a scenario like this it makes more sense to pass the environment in to the spawn call and have it be immutable. But this also brings up other questions, like: are some functions only allowed between spawn and the first start? For opening and closing file descriptors, do we need a cross-process close?
From what I’ve seen from looking at their sources, higher-level languages tend to use posix_spawn(3)/posix_spawnp(3) for their subprocess-launching APIs and switch to fork/exec if you ask for something not expressible in it.
From glancing at the source: No. Because the thing you actually want to interact with is the pointer to the environment, which is managed by the unsafe glibc functions and used by the C programs you're (eventually, always) interacting with. Even if you got hold of that, all the other C libs around your program can (and will) still change it, or read it without caring about your additional lock. A famous reason in Rust to call into the environment is, for example, time management.
You could also have C libs initializing before your Rust code has a chance to do anything, and now they hold getenv pointers while you try to do anything.
Your code might also not even know that it is calling set/getenv indirectly.
You could argue that it was initiated in ‘09. Or completed in ‘10. ‘97 is so far from a typo relative to either of those that it could practically only be a hallucination. But I think I’ll just flag it “spam”.
Eh. Yours would be more accurate. But “spam” works for me. Their claim:
Oracle obtained the JavaScript brand after acquiring Sun Microsystems in 1997.
Is so completely outlandish that “spam” makes sense, IMO. There’s no universe I can imagine in which that’s a good-faith statement, anyway. And “spam” seems to cover that particular sort of bad faith pretty well.
Hey OP, I like this series and thus would like you to continue being on lobsters. Because of that please review the rules on self-promotion for lobsters.
Hey, Thanks for taking the time to point this out, I’ll take note of it. And of course, your kind comment is much appreciated!
Xe Iaso said this on Bluesky, and it’s been stuck in my head ever since: a hard fork is going to happen.
That seems extremely unlikely to me as it would be extremely divisive and expensive for LF, and LF decides what Linux is. Fuschia and other OS’s are much more likely to take market share (Android, ChromeOS, smart devices, etc) than for Linux itself to fundamentally change. Various LF funders are more likely to just sit and wait for those, or to mitigate many of these issues with other projects - ex: it’s way cheaper to build a secure VM than to build a new kernel, so the major LF companies all have their own VMs, mitigating a lot of kernel issues.
It’s not. This is just a huge tempest in a teapot.
I don’t see it happen either right now. But something will survive the current Linux in 50 years. And that is either a Linux which cleaned up all the issues - with or without Rust - or something that surpassed linux because no one bothered to clean up the legacy and people potentially even moved on from projects with millions of lines of C (which again doesn’t have to be Rust). Which in turn seems to be a question of what big corpo invests their money in (Android, VMs + Bare Metal, IoT, ChromeOS) - and what the legislation requires them to achieve in terms of security.
This is an extremely strong statement.
I think a few things are also interesting:
I think people are realizing how low quality the Linux kernel code is, how haphazard development is, how much burnout and misery is involved, etc.
I think people are realizing how insanely not in the open kernel dev is, how much is private conversations that a few are privy to, how much is politics, etc.
The Hellwig/Ojeda part of the thread is just frustrating to read because it almost feels like pleading. “We went over this in private” “we discussed this already, why are you bringing it up again?” “Linus said (in private so there’s no record)”, etc., etc.
Dragging discussions out in front of an audience is a pretty decent tactic for dealing with obstinate maintainers. They don’t like to explain their shoddy reasoning in front of people, and would prefer it remain hidden. It isn’t the first tool in the toolbelt but at a certain point there is no convincing people directly.
With quite a few things actually. A friend of mine is contributing to a non-profit, which until recently had this very toxic member (they’ve even attempted felony). They were driven out of the non-profit very soon after members talked in a thread that was accessible to all members. Obscurity is often one key component of abuse, be it mere stubbornness or criminal behaviour. Shine light, and it often goes away.
IIRC Hintjens noted this quite explicitly as a tactic of bad actors in his works.
It's amazing how quick people are to recognize folks trying to subvert an org piecemeal via one-off private conversations once everybody can compare notes. It's equally amazing to see how much the same people beforehand will swear up and down oh no that's a conspiracy theory such things can't happen here until they've been burned at least once.
This is an active, unpatched attack vector in most communities.
I've found that the most mundane example of this is meeting minutes at work. I've observed that people tend to act more collaboratively and seek the common good if there are public minutes, as opposed to trying to "privately" win people over to their desires.
There is something to be said for keeping things between people with skin in the game.
It's flipped over here, though, because more people want to contribute. The question is whether it'll be stable long-term.
Something I’ve noticed is true in virtually everything I’ve looked deeply at is the majority of work is poor to mediocre and most people are not especially great at their jobs. So it wouldn’t surprise me if Linux is the same. (…and also wouldn’t surprise me if the wonderful Rust rewrite also ends up poor to mediocre.)
yet at the same time, another thing that astonishes me is how much stuff actually does get done and how well things manage to work anyway. And Linux also does a lot and works pretty well. Mediocre over the years can end up pretty good.
After tangentially following the kernel news, I think a lot of churning and death spiraling is happening. I would much rather have a rust-first kernel that isn’t crippled by the old guard of C developers reluctant to adopt new tech.
Take all of this energy into RedoxOS and let Linux stay in antiquity.
I’ve seen some of the R4L people talk on Mastodon, and they all seem to hate this argument.
They want to contribute to Linux because they use it, want to use it, and want to improve the lives of everyone who uses it. The fact that it’s out there and deployed and not a toy is a huge part of the reason why they want to improve it.
Hopping off into their own little projects which may or may not be useful to someone in 5-10 years’ time is not interesting to them. If it was, they’d already be working on Redox.
The most effective thing that could happen is for the Linux foundation, and Linus himself, to formally endorse and run a Rust-based kernel. They can adopt an existing one or make a concerted effort to replace large chunks of Linux’s C with Rust.
IMO the Linux project needs to figure out something pretty quickly because it seems to be bleeding maintainers and Linus isn’t getting any younger.
They may be misunderstanding the idea that others are not necessarily incentivized to do things just because it’s interesting for them (the Mastodon posters).
Yep, I made a similar remark upthread. A Rust-first kernel would have a lot of benefits over Linux, assuming a competent group of maintainers.
along similar lines: https://drewdevault.com/2024/08/30/2024-08-30-Rust-in-Linux-revisited.html
Redox does carry the weight of trying to do new OS things. An ABI-compatible Rust rewrite of the Linux kernel might get further along than expected (even if it only runs in virtual contexts at first, without hardware support; that would come later).
Linux developers want to work on Linux, they don’t want to make a new OS. Linux is incredibly important, and companies already have Rust-only drivers for their hardware.
Basically, sure, a new OS project would be neat, but it’s really just completely off topic in the sense that it’s not a solution for Rust for Linux. Because the “Linux” part in that matters.
I read a 25+ year old article [1] from a former Netscape developer that I think applies in part
The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they've been fixed. There's nothing wrong with it. It doesn't acquire bugs just by sitting around on your hard drive. Au contraire, baby! Is software supposed to be like an old Dodge Dart, that rusts just sitting in the garage? Is software like a teddy bear that's kind of gross if it's not made out of all new material?

Adopting a "rust-first" kernel is throwing the baby out with the bathwater. Linux has been beaten into submission for over 30 years for a reason. It's the largest collaborative project in human history and over 30 million lines of code. Throwing it out and starting new would be an absolutely herculean effort that would likely take years, if it ever got off the ground.
[1] https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/
The idea that old code is better than new code is patently absurd. Old code has stagnated. It was built using substandard, out of date methodologies. No one remembers what’s a bug and what’s a feature, and everyone is too scared to fix anything because of it. It doesn’t acquire new bugs because no one is willing to work on that weird ass bespoke shit you did with your C preprocessor. Au contraire, baby! Is software supposed to never learn? Are we never to adopt new tools? Can we never look at something we’ve built in an old way and wonder if new methodologies would produce something better?
This is what it looks like to say nothing, to beg the question. Numerous empirical claims, where is the justification?
It’s also self defeating on its face. I take an old codebase, I fix a bug, the codebase is now new. Which one is better?
Like most things in life, the truth is somewhere in the middle. There is a reason the semiconductor industry has the concept of a "mature node". They accept that something new is needed for each node, but also that the new thing takes time to iron out the kinks and bugs. This is the primary reason you see Apple take on new nodes first, before Nvidia for example, as Nvidia requires much larger die sizes and therefore fewer defects per square mm.
You can see this sometimes in software, for example X11 vs Wayland, where adoption is slow but most definitely progressing, and nowadays most people can see that Wayland is now, or is going to become, the dominant tech in the space.
The truth lies where it lies. Maybe the middle, maybe elsewhere. I just don’t think we’ll get to the truth with rhetoric.
Aren’t the arguments above more dialectic than rhetoric?
I don’t think this would qualify as dialectic, it lacks any internal debate and it leans heavily on appeals by analogy and intuition/ emotion. The post itself makes a ton of empirical claims without justification even beyond the quoted bit.
fair enough, I can see how one would make that argument.
“Good” is subjective, but there is real evidence that older code does contain fewer vulnerabilities: https://www.usenix.org/conference/usenixsecurity22/presentation/alexopoulos
That means we can probably keep a lot of the old trusty Linux code around while making more of the new code safe by writing it in Rust in the first place.
I don’t think that’s a fair assessment of Spolsky’s argument or of CursedSilicon’s application of it to the Linux kernel.
Firstly, someone has already pointed out the research that suggests that existing code has fewer bugs in it than new code (and that the older the code is, the less likely it is to be buggy).
Secondly, this discussion is mainly around entire codebases, not just existing code. Codebases usually have an entire infrastructure around them for verifying that the behaviour of the codebase has not changed. This is often made up of tests, but it’s also made up of the users who try out a release of a codebase and determine whether it’s working for them. The difference between making a change to an existing codebase and releasing a new project largely comes down to whether this verification (both in terms of automated tests and in terms of users’ ability to use the new release) works for the new code.
Given this difference, if I want to (say) write a new OS completely in Rust, I need to choose: Do I want to make it completely compatible with Linux, and therefore take on the significant challenge of making sure everything behaves truly the same? Or do I make significant breaking changes, write my own OS, and therefore force potential adopters to rebuild their entire Linux workflows in my new OS?
The point is not that either of these options are bad, it is that they represent significant risks to a project. Added to the general risk that is writing new code, this produces a total level of risk that might be considered the baseline risk of doing a rewrite. Now risk is not bad per se! If the benefits of being able to write an OS in a language like Rust outweigh the potential risks, then it still makes sense to perform the rewrite. Or maybe the existing Linux kernel is so difficult to maintain that a new codebase really would be the better option. But the point that CursedSilicon was making by linking the Spolsky piece was, I believe, that the risks for a project like the Linux kernel are very high. There is a lot of existing, old code. And there is a very large ecosystem where either breaking or maintaining compatibility would each come with significant challenges.
Unfortunately, it’s very difficult to measure the risks and benefits here in a quantitative, comparable way, so I think where you fall on the “rewrite vs continuity” spectrum will depend mostly on what sort of examples you’ve seen, and how close you think this case is to those examples. I don’t think there’s any objective way to say whether it makes more sense to have something like R4L, or something like RedoxOS.
I haven't read it yet, but I haven't made an argument about that; I just created a parody of the argument as presented. I'll be candid: I doubt the research is going to compel me to believe that newer code is inherently buggier. It may compel me to confirm my existing belief that testing software in the field is one good method for finding some classes of bugs.
I guess so, it’s a bit dependent on where we say the discussion starts - three things are relevant; RFL, which is not a wholesale rewrite, a wholesale rewrite of the Linux kernel, and Netscape. RFL is not about replacing the entire Linux kernel, although perhaps “codebase” here refers to some sort of unit, like a driver. Netscape wanted a wholesale rewrite, based on the linked post, so perhaps that’s what’s really “the single worst strategic mistake that any software company can make”, but I wonder what the boundary here is? Also, the article immediately mentions that Microsoft tried to do this with Word but it failed, but that Word didn’t suffer from this because it was still actively developed - I wonder if it really “failed” just because pyramid didn’t become the new Word? Did Microsoft have some lessons learned, or incorporate some of that code? Dunno.
I think I’m really entirely justified when I say that the post is entirely emotional/ intuitive appeals, rhetoric, and that it makes empirical claims without justification.
This is rhetoric. These are unsubstantiated empirical claims. The article is all of this. It’s fine as an interesting, thought provoking read that gets to the root of our intuitions, but I think anyone can dismiss it pretty easily since it doesn’t really provide much in the form of an argument.
Again, totally unsubstantiated. I have MANY reasons to believe that, it is simply question begging to say otherwise.
That's all this post is. Over and over again making empirical claims with no evidence and question begging.
We can discuss the risks and benefits, I’d advocate for that. This article posted doesn’t advocate for that. It’s rhetoric.
This is a truism. It is survival bias. If the code was buggy, the bugs would eventually be found and fixed. So, all things being equal, newer code is riskier than old code. But it's also been empirically shown that using Rust for new code is not "all things being equal". Google showed that new code in Rust is as reliable as old code in C. Which is good news: you can use old C code from new Rust projects without the risk that comes from new C code.
Yeah, this is what I’ve been saying (not sure if you’d meant to respond to me or the parent, since we agree) - the issue isn’t “new” vs “old” it’s things like “reviewed vs unreviewed” or “released vs unreleased” or “tested well vs not tested well” or “class of bugs is trivial to express vs class of bugs is difficult to express” etc.
Was restating your thesis in the hopes of making it clearer.
I don’t disagree that the rewards can outweigh the risks, and in this case I think there’s a lot of evidence that suggests that memory safety as a default is really important for all sorts of reasons. Let alone the many other PL developments that make Rust a much more suitable language to develop in than C.
That doesn’t mean the risks don’t exist, though.
Nobody would call an old codebase with a handful of fixes a new codebase, at least not in the contexts in which those terms have been used here.
How many lines then?
It's a Ship of Theseus—at no point can you call it a "new" codebase, but after a period of time, it could be completely different code. I have a C program I've been using and modifying for 25 years. At any given point, it would have been hard to say "this is now a new codebase", yet not one line of code in the project is the same as when I started (even though it does the same thing it always has).
I don’t see the point in your question. It’s going to depend on the codebase, and on the nature of the changes; it’s going to be nuanced, and subjective at least to some degree. But the fact that it’s prone to subjectivity doesn’t mean that you get to call an old codebase with a single fixed bug a new codebase, without some heavy qualification which was lacking.
If it requires all of that nuance and context maybe the issue isn’t what’s “old” and what’s “new”.
I don’t follow, to me that seems like a non-sequitur.
What’s old and new is poorly defined and yet there’s an argument being made that “old” and “new” are good indicators of something. If they’re so poorly defined that we have to bring in all sorts of additional context like the nature of the changes, not just when they happened or the number of lines changed, etc, then it seems to me that we would be just as well served to throw away the “old” and “new” and focus on that context.
I feel like enough people would agree more-or-less on what was an “old” or “new” codebase (i.e. they would agree given particular context) that they remain useful terms in a discussion. The general context used here is apparent (at least to me) given by the discussion so far: an older codebase has been around for a while, has been maintained, has had kinks ironed out.
There’s a really important distinction here though. The point is to argue that new projects will be less stable than old ones, but you’re intuitively (and correctly) bringing in far more important context - maintenance, testing, battle testing, etc. If a new implementation has a higher degree of those properties then it being “new” stops being relevant.
Ok, but:
My point was that this statement requires a definition of “new codebase” that nobody would agree with, at least in the context of the discussion we’re in. Maybe you are attacking the base proposition without applying the surrounding context, which might be valid if this were a formal argument and not a free-for-all discussion.
I think that it would be considered no longer new if it had had significant battle-testing, for example.
FWIW the important thing in my view is that every new codebase is a potential old codebase (given time and care), and a rewrite necessarily involves a step backwards. The question should probably not be, which is immediately better?, but, which is better in the longer term (and by how much)? However your point that “new codebase” is not automatically worse is certainly valid. There are other factors than age and “time in the field” that determine quality.
Methodologies don't matter for quality of code. They can be useful for estimates, cost control, figuring out whom to fire, etc. But not for the quality of code.
You’re suggesting that the way you approach programming has no bearing on the quality of the produced program?
I've never observed a programmer become better or worse by switching methodology. Dijkstra would not have become better if you had made him do daily standups or go through code reviews.
There are ways to improve your programming by choosing different approach but these are very individual. Methodology is mostly a beancounting tool.
When I say "methodology" I'm speaking very broadly - simply "the approach one takes". This isn't necessarily saying that any methodology is better than any other. The way I approach a task today is better, I think, than the way I would have approached that task a decade ago - my methodology has changed, the way I think has changed. Perhaps that might mean I write more tests, or test earlier, but it may mean exactly the opposite, and my methods may only work best for me.
I'm not advocating for "process" or ubiquity, only that the approach one takes may improve over time, which I suspect we would agree on.
If you take this logic to its end, you should never create new things.
At one point in time, Linux was also the new kid on the block.
The best time to plant a tree is 30 years ago. The second best time is now.
I don’t think Joel Spolsky was ever a Netscape developer. He was a Microsoft developer who worked on Excel.
My mistake! The article contained a bit about Netscape and I misremembered it
How many of those lines are part of the core? My understanding was that the overwhelming majority was driver code. There may not be that much core subsystem code to rewrite.
For a previous project, we included a minimal Linux build. It was around 300 KLoC, which included networking and the storage stack, along with virtio drivers.
That’s around the size a single person could manage and quite easy with a motivated team.
If you started with DPDK and SPDK then you’d already have filesystems and a copy of the FreeBSD network stack to run in isolated environments.
Once many drivers share common rust wrappers over core subsystems, you could flip it and write the subsystem in Rust. Then expose C interface for the rest.
Oh sure, that would be my plan as well. And I bet some subsystem maintainers see this coming, and resist it for reasons that aren’t entirely selfless.
That’s pretty far into the future, both from a maintainer acceptance PoV and from a rustc_codegen_gcc and/or gccrs maturity PoV.
Sure. But I doubt I'll be running a different kernel 10 years from now.
And like us, those maintainers are not getting any younger, and if they need a hand, I am confident I'll get up to speed faster with a strict type checker.
I am also confident nobody in our office would be able to help out with C at all.
This cannot possibly be true.
It’s the largest collaborative open source os kernel project in human history
It’s been described as such based purely on the number of unique human contributions to it
I would expect Wikipedia to be bigger 🤔
I see that Drew proposes a new OS in that linked article, but I think a better proposal in the same vein is a fork. You get to keep Linux, but you can start porting logic to Rust unimpeded, and it’s a manageable amount of work to keep porting upstream changes.
Remember when libav forked from ffmpeg? Michael Niedermayer single-handedly ported every single libav commit back into ffmpeg, and eventually, ffmpeg won.
At first there will be extremely high C percentage, low Rust percentage, so porting is trivial, just git merge and there will be no conflicts. As the fork ports more and more C code to Rust, however, you start to have to do porting work by inspecting the C code and determining whether the fixes apply to the corresponding Rust code. However, at that point, it means you should start seeing productivity gains, community gains, and feature gains from using a better language than C. At this point the community growth should be able to keep up with the extra porting work required. And this is when distros will start sniffing around, at first offering variants of the distro that uses the forked kernel, and if they like what they taste, they might even drop the original.
I genuinely think it's a strong idea, given the momentum and the potential amount of labor the Rust community has at its disposal.
I think the competition would be great, especially in the domain of making it more contributor friendly to improve the kernel(s) that we use daily.
I certainly don’t think this is impossible, for sure. But the point ultimately still stands: Linux kernel devs don’t want a fork. They want Linux. These folks aren’t interested in competing, they’re interested in making the project they work on better. We’ll see if some others choose the fork route, but it’s still ultimately not the point of this project.
While I don’t personally want to make a new OS, I’m not sure I actually want to work on Linux. Most of the time I strive for portability, and so abstract myself from the OS whenever I can get away with it. And when I can’t, I have to say Linux’s API isn’t always that great, compared to what the BSDs have to offer (epoll vs kqueue comes to mind). Most annoying though is the lack of documentation for the less used APIs: I’ve recently worked with Netlink sockets, and for the proc stuff so far the best documentation I found was the freaking source code of a third party monitoring program.
I was shocked. Complete documentation of the public API is the minimum bar for a project as serious as the Linux kernel. I can live with an API I don't like, but lack of documentation is a deal breaker.
I think they mean that Linux kernel devs want to work on the Linux kernel. Most (all?) R4L devs are long time Linux kernel devs. Though, maybe some of the people resigning over LKML toxicity will go work on Redox or something…
That is what I was saying, yes.
I’m talking about the people who develop the Linux kernel, not people who write userland programs for Linux.
Re-Implementing the kernel ABI would be a ton of work for little gain if all they wanted was to upstream all the work on new hardware drivers that is already done - and then eventually start re-implementing bits that need to be revised anyway.
If the singular required Rust toolchain didn't feel like such a ridiculous-to-bootstrap, 500-ton LLVM clown car, I would agree with this statement without reservation.
Would zig be a better starting place?
Zig is easier to implement (and I personally like it as a language) but doesn’t have the same safety guarantees and strong type system that Rust does. It’s a give and take. I actually really like Rust and would like to see a proliferation of toolchain options, such as what’s in progress in GCC land. Overall, it would just be really nice to have an easily bootstrapped toolchain that a normal person can compile from scratch locally, although I don’t think it necessarily needs to be the default, or that using LLVM generally is an issue. However, it might be possible that no matter how you architect it, Rust might just be complicated enough that any sufficiently useful toolchain for the language could just end up being a 500 ton clown car of some kind anyways.
Depends on which parts of GP's statement you care about: LLVM or bootstrap. Zig still depends on LLVM (for now), but it is no longer bootstrappable in a limited number of steps (because they switched from a bootstrap C++ implementation of the compiler to keeping a compressed WASM build of the compiler as a blob).
Yep, although I would also add it’s unfair to judge Zig in any case on this matter now given it’s such a young project that clearly is going to evolve a lot before the dust begins to settle (Rust is also young, but not nearly as young as Zig). In ten to twenty years, so long as we’re all still typing away on our keyboards, we might have a dozen Zig 1.0 and a half dozen Zig 2.0 implementations!
Yeah, the absurdly low code quality and toxic environment make me think that Linux is ripe for disruption. Not like anyone can produce a production kernel overnight, but maybe a few years of sustained work might see a functional, production-ready Rust kernel for some niche applications and from there it could be expanded gradually. While it would have a lot of catching up to do with respect to Linux, I would expect it to mature much faster because of Rust, because of a lack of cruft/backwards-compatibility promises, and most importantly because it could avoid the pointless drama and toxicity that burn people out and prevent people from contributing in the first place.
What is this, some kind of new meme? Where did you hear it first?
From the thread in OP, if you expand the messages, there is wide agreement among the maintainers that all sorts of really badly designed and almost impossible to use (safely) APIs ended up in the kernel over the years because the developers were inexperienced and kind of learning kernel development as they went. In retrospect they would have designed many of the APIs very differently.
Someone should compile all of this to help future OS developers avoid those traps! There are a lot of existing non-POSIX experiments though.
It’s based on my forays into the Linux kernel source code. I don’t doubt there’s some quality code lurking around somewhere, but the stuff I’ve come across (largely filesystem and filesystem adjacent) is baffling.
Seeing how many people are confidently incorrect about Linux maintainers only caring about their job security and keeping code bad to make it a barrier to entry has, if nothing else, taught me that online discussions are a huge game of Chinese whispers where most participants don't have a clue what they are talking about.
I doubt that maintainers "only care about their job security and keep code bad", but with all due respect: you're also just pulling arguments out of thin air right now. What I do believe is what we have seen: pretty toxic responses from some people and a whole lot of issues trying to move forward.
Huh, I’m not seeing any claim to this end from the GP, or did I not look hard enough? At face value, saying that something has an “absurdly low code quality” does not imply anything about nefarious motives.
I can personally attest to having never made that specific claim.
Indeed that remark wasn’t directly referring to GP’s comment, but rather to the range of confidently incorrect comments that I read in the previous episodes, and to the “gatekeeping greybeards” theme that can be seen elsewhere on this page. First occurrence, found just by searching for “old”: Linux is apparently “crippled by the old guard of C developers reluctant to adopt new tech”, to which GP replied in agreement in fact. Another one, maintainers don’t want to “do the hard work”.
Still, in GP's case the Chinese whispers have reduced "the safety of this API is hard to formalize and you pretty much have to use it the way everybody does it" to "absurdly low quality". To which I ask: what is more likely? 1) That 30 million lines of code contain various levels of technical debt of which maintainers are aware, and that said maintainers are worried even about code where the technical debt is real but not causing substantial issues in practice? Or 2) that a piece of software gets to run on literally billions of devices of all sizes and prices just because it's free and in spite of its "absurdly low quality"?
Linux is not perfect, neither technically nor socially. But it sure takes a lot of entitlement and self-righteousness to declare it “of absurdly low quality” with a straight face.
GP here: I probably should have said “shockingly” rather than “absurdly”. I didn’t really expect to get lawyered over that one word, but yeah, the idea was that for a software that runs on billions of devices, the code quality is shockingly low.
Of course, this is plainly subjective. If your code quality standards are a lot lower than mine then you might disagree with my assessment.
That said, I suspect adoption is a poor proxy for code quality. Internet Explorer was widely adopted and yet it’s broadly understood to have been poorly written.
I’m sure self-righteousness could get you to the same place, but in my case I arrived by way of experience. You can relax, I wasn’t attacking Linux—I like Linux—it just has a lot of opportunity for improvement.
I guess I’ve seen the internals of too much proprietary software now to be shocked by anything about Linux per se. I might even argue that the quality of Linux is surprisingly good, considering its origins and development model.
I think I’d lawyer you a tiny bit differently: some of the bugs in the kernel shock me when I consider how many devices run that code and fulfill their purposes despite those bugs.
FWIW, I was not making a dig at open source software, and yes plenty of corporate software is worse. I guess my expectations for Linux are higher because of how often it is touted as exemplary in some form or another. I don’t even dislike Linux, I think it’s the best thing out there for a huge swath of use cases—I just see some pretty big opportunities for improvement.
Or actual benchmarks: the performance the Linux kernel leaves on the table in some cases is absurd. And sure it’s just one example, but I wouldn’t be surprised if it was representative of a good portion of the kernel.
Well not quite but still “considered broken beyond repair by many people related to life time management” - which is definitely worse than “hard to formalize” when “the way ever[y]body does it” seems to vary between each user.
I love Rust but still, we’re talking of a language which (for good reasons!) considers doubly linked lists unsafe. Take an API that gets a 4 on Rusty Russell’s API design scale (“Follow common convention and you’ll get it right”), but which was designed for a completely different programming language if not paradigm, and it’s not surprising that it can’t easily be transformed into a 9 (“The compiler/linker won’t let you get it wrong”). But at the same time there are a dozen ways in which, according to the same scale, things could actually be worse!
What I dislike is that people are seeing “awareness of complexity” and the message they spread is “absurdly low quality”.
Note that doubly linked lists are not a special case at all in Rust. All the other common data structures like `Vec`, `HashMap`, etc. also need unsafe code in their implementation. Implementing these data structures in Rust, and writing unsafe code in general, is indeed roughly a 4. But these are all already implemented in the standard library, with an API that actually is at a 9. And `std::collections::LinkedList` is constructive proof that you can have a safe Rust abstraction for doubly linked lists.
Yes, the implementation could have bugs, thus making the abstraction leaky. But that's the case for literally everything, down to the hardware that your code runs on.
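As a tiny illustration of that point (my example, not the commenter's): all the unsafe code sits inside the standard library, and calling code never needs `unsafe` to use it.

```rust
use std::collections::LinkedList;

fn main() {
    // LinkedList's implementation uses unsafe internally,
    // but this calling code is entirely safe Rust.
    let mut list: LinkedList<u32> = LinkedList::new();
    list.push_back(2);
    list.push_front(1);
    list.push_back(3);

    assert_eq!(list.front(), Some(&1));
    assert_eq!(list.pop_back(), Some(3));
    assert_eq!(list.into_iter().sum::<u32>(), 3); // 1 + 2
}
```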
You’re absolutely right that you can build abstractions with enough effort.
My point is that if a doubly linked list is (again, for good reasons) hard to make into a 9, a 20-year-old API may very well be even harder. In fact, `std::collections::LinkedList` is safe but still not great (for example, the cursor API is still unstable); and being in std, it was designed and reviewed by some of the most knowledgeable Rust developers, sort of by definition. That's the conundrum that maintainers face and, if they realize that, it's a good thing. I would be scared if maintainers handwaved that away.
Bugs happen, but if the abstraction is downright wrong then that's something I wouldn't underestimate. A lot of the appeal of Rust in Linux lies exactly in documenting/formalizing these unwritten rules, and wrong documentation can be worse than no documentation (cue the negative parts of the API design scale!); even more so if your documentation is a formal model like a set of Rust types and functions.
That said, the same thing can happen in a Rust-first kernel, which will also have a lot of unsafe code. And it would be much harder to fix it in a Rust-first kernel, than in Linux at a time when it’s just feeling the waters.
At the same time, it was included almost as like, half a joke, and nobody uses it, so there’s not a lot of pressure to actually finish off the cursor API.
It’s also not the kind of linked list the kernel would use, as they’d want an intrusive one.
And yet, safe to use doubly linked lists written in Rust exist. That the implementation needs unsafe is not a real problem. That’s how we should look at wrapping C code in safe Rust abstractions.
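For what that wrapping pattern looks like in miniature, here is a hedged sketch (the `raw_*` functions stand in for a C API and are simulated in Rust so the example runs on its own; none of this is real kernel code): the ownership rules of the raw API get encoded once, in the wrapper type, so safe callers can no longer get them wrong.

```rust
// The raw_* functions simulate an unsafe C-style API with manual ownership rules.
unsafe fn raw_buf_new(size: usize) -> *mut Vec<u8> {
    Box::into_raw(Box::new(vec![0u8; size]))
}
unsafe fn raw_buf_len(buf: *const Vec<u8>) -> usize {
    (*buf).len()
}
unsafe fn raw_buf_free(buf: *mut Vec<u8>) {
    drop(Box::from_raw(buf));
}

/// Safe wrapper: the lifetime and ownership rules of the raw API are encoded
/// once, here, instead of being re-derived at every call site.
pub struct Buf(*mut Vec<u8>);

impl Buf {
    pub fn new(size: usize) -> Buf {
        // SAFETY: raw_buf_new returns a valid pointer that we now own.
        Buf(unsafe { raw_buf_new(size) })
    }
    pub fn len(&self) -> usize {
        // SAFETY: self.0 stays valid for as long as this Buf exists.
        unsafe { raw_buf_len(self.0) }
    }
}

impl Drop for Buf {
    fn drop(&mut self) {
        // SAFETY: we own the buffer and free it exactly once.
        unsafe { raw_buf_free(self.0) }
    }
}

fn main() {
    let b = Buf::new(16);
    assert_eq!(b.len(), 16);
} // b is freed here automatically; safe code cannot use-after-free or double-free it.
```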
The whole comment you replied to, after the one sentence about linked lists, is about abstractions. And abstractions are rarely going to be easy, and sometimes could be hardly possible.
That's just a fact. Confusing this fact for something as hyperbolic as "absurdly low quality" is a stunning example of the Dunning-Kruger effect, and frankly insulting as well.
I personally would call Linux low quality because many parts of it are buggy as sin. My GPU stops working properly literally every other time I upgrade Linux.
No one is saying that Linux is low quality because it’s hard or impossible to abstract some subsystems in Rust, they’re saying it’s low quality because a lot of it barely works! I would say that your “Chinese whispers” misrepresents the situation and what people here are actually saying. “the safety of this API is hard to formalize and you pretty much have to use it the way everybody does it” doesn’t apply if no one can tell you how to use an API, and everyone does it differently.
I agree, Linux is the worst of all kernels.
Except for all the others.
Actually, the NT kernel of all things seems to have a pretty good reputation, and I wouldn’t dismiss the BSD kernels out of hand. I don’t know which kernel is better, but it seems you do. If you could explain how you came to this conclusion that would be most helpful.
NT gets a bad rap because of the OS on top of it, not because it’s actually bad. NT itself is a very well-designed kernel.
*nod* I haven’t been a Windows person since shortly after the release of Windows XP (i.e. the first online activation DRM’d Windows) but, whenever I see glimpses of what’s going on inside the NT kernel in places like Project Zero: The Definitive Guide on Win32 to NT Path Conversion, it really makes me want to know more.
More likely a fork that gets rusted from the inside out
Somewhere else it was mentioned that most developers in the kernel could just not be bothered with checking for basic things.
Nobody is forcing any of these people to do this.
I found the first reply on LKML to be very interesting.
To quote:
And
Which feels to me like there is a strong chicken and egg problem: To actually add any rust bindings for certain kernel parts you would need to first rewrite them, because there is apparently no actual defined way to call them safely.
Which means it's not about adding Rust; it's about Rust being the reason to poke where it hurts. Potentially requiring a rewrite of hundreds of thousands of LOC to even start seeing any benefits, in a state where I wouldn't blame any maintainer who told me they don't actually know how that part of the code truly works.
Yeah. Part of the drama has been the R4L folks trying to get subsystem maintainers in these areas to document the “right ways” to use the APIs so the Rust API can incorporate those rules, and some maintainers saying “just do it like that other filesystem and stop harassing us, you said you’d do all the work”. (At least that’s how they’re perceived.) But it’s not like they would let the R4L folks go in and rewrite that stuff, either.
I recall Asahi Lina’s comments on drm_sched. Choice quotes:
If I remember correctly, most C drivers that use `drm_sched` do not get this right, but it doesn't come up much because most people aren't trying to shut down their GPUs other than when they're shutting off their computers, unless they're using an eGPU (and eGPUs are notoriously semi-broken on Linux). Lina's M1 GPU driver uses a scheduler per GPU context (i.e. per application), hence schedulers are torn down whenever graphical applications are closed, so her driver couldn't just ignore the complexity like most other drivers appear to do.
Those statements just come across to me as "we built something unmaintainable and now I don't want to maintain it", i.e., a way to avoid doing the hard work.
Because the cancer metaphor worked so well for Hellwig the last time he used it…
I wouldn’t blame anyone for that. The road to hell is paved with good intentions. And most of the people maintaining it now probably didn’t start it.
If they’re a paid maintainer, then it’s their job to do just that. Hellwig is a guy who has explicitly said he doesn’t want any other maintainers.
I think you’re underestimating how many years it would take to replace some of this code, let alone verify it actually works on the real hardware without random crashes (as we’ve seen in other reports about new CPU architectures playing Heisenbug). Sure you would want to do that eventually - but I don’t want to be the one telling everyone I’m gonna freeze features until this is done, with potentially more bugs when it’s finished.
It's one thing to say "I don't have the time to fix this"; it's another to reject a proposed fix (see the drm_sched comment above) or to prevent other people from working on fixes elsewhere in the tree (Hellwig). You don't have to freeze your feature work when other people are working on fixes and refactorings.
How long are you willing to wait for an updated Linux kernel? It may not be “we are unwilling to do maintenance” and more “this is a lot of major work where intermediate steps might not be usable.”
There’s a reason it’s called technical debt. It only gets worse the longer you put it off.
So I ask again: how long are you willing to wait for an updated Linux kernel with less technical debt?
You’re treating it as a false dichotomy and trying to paint me uncharitably. Stop that.
For my personal projects, I can “pay myself” to address technical debt. And I have, because I’m the only user of my code and thus, I have final say in what and how it works. At my previous job, any attempt to address technical debt of the project (that I had been working on for over a decade, pretty much from the start of it) would have been shut down immediately as being too risky, despite the 17,000+ tests [1].
Where do the incentives come in to address technical debt in the Linux kernel? Is that a better way to ask the question?
[1] Thanks to new management. At one point, my new manager reverted the code I had rewritten to address some minor technical debt back to the original code plus the minimum to get it working, because the rewrite was deemed "too risky".
Seems to confirm the point I made here:
But if I can’t even get around the morons up top, I’m out pretty quick, one way or another.
The kernel has happily managed major API rewrites before, either merging the changes bit by bit or maintaining both versions in tree until the old one is ripe for deletion. And thru the magic of git and community effort, none of that has to delay the release of new kernels.
That is false.
Which part?
https://lore.kernel.org/lkml/20250128092334.GA28548@lst.de/
Ah, thanks. The meaning I take from that statement is not the same meaning I took from your comment.
I’m trying to see what you were getting at. What did you mean by “just that”?
Doing the design work to create safe-to-use APIs with lifetimes considered is part of the work of the maintainer in my view because they should have the best perspective to do so. They got it into that state, they can get it out of that state. Whining that it’s hard work shouldn’t be acceptable as a reason to not do the work.
I’m not aware of any precedent for something like this so maybe there’s a way in which you’re right. But there seems to be a contradiction on whether you think we should defer to their judgement.
I don’t agree with that. I accept the RfL side’s refusal to build and test their own OS or Linux fork, for example.
I think their judgment is separate from their capability. I don’t think any of these maintainers are fundamentally incompetent people. I’m not sure if they need mentorship on building APIs with regards to lifetimes because they should be aware memory has lifetimes everywhere, implicitly, in their code already.
I can’t see a way to separate those two things honestly.
because if you trust them to define “lifetimes,” doesn’t that mean you trust them to estimate the amount of time before such a point when the costs of changing the API outweigh the benefits? yet you don’t trust their estimation of the costs imposed by the practice and the amount of extra work it would take before it yields benefits?
Good point! This is something I noticed in a previous job as well, where we introduced computer assistance to existing manual workflows. Apparently the real reason for resistance from the workers was that in the course of this computerization, their “traditional” workflows would be documented and maybe even evaluated before they could be encoded in a computer program. But IIRC this reason was never said out loud by anyone – some developers realized this reason on their own and adjusted their approach, but some didn’t realize this and wondered about the constant pushback.
And maybe to the managers of those workers the computerization was not even the real goal, but the real improvement was supposed to come from the "inventorization" of existing workflows. In a similar way, while the Rust devs want Rust to enter the kernel, maybe some progressive Linux devs see Rust "just" as a vehicle to make Linux internals more strict and more understandable, and introducing Rust is maybe just a happy side effect of this.
This is the entire situation from the start. Like this is what Rust for Linux is.
Absolutely, and also people do not recognize how unforgiving Rust is even to normal APIs that you would use in C (insert joke about linked lists), and the constraints that the word "safely" implies when applied to Rust.
Not being able to devise a “safe” Rust abstraction doesn’t mean that the API must be a source of insecurity. Certainly it isn’t a great start, I will grant that, but generally in C you will find that most code ends up doing the same thing that works. The maintainers however recognize that this is not the way to introduce a semi formal definition of how the API operates, and are worried that it may not be possible at all. This is being aware of the environment and the complexity that comes from 20-30 years of development in C; it’s not wanting to “avoid doing the hard work”.
(For another example, https://lobste.rs/s/hdj2q4/greg_kroah_hartman_makes_compelling_case#c_f5pzow shows how an API could start causing problems when you use it differently, and how one might want to use it differently if he/she has more confidence thanks to a better programming language).
That comment shows that the API is poorly designed and fixing it was rejected by a maintainer, though?
The maintainer rejected the fix because (according to him) the API was not poorly designed, but simply not supposed to be used like that (literal quote: “this functionality here isn’t made for your use case”). Which makes sense and is consistent with what I wrote above: the maintainer is conscious of the limits of C and does not want the API to be used in ways that were not anticipated, whereas the Rust developer is more confident because of the more powerful compile-time checks.
Not knowing the code I cannot understand the tradeoffs involved in the fix. I can’t say whether the maintainer was too cautious, and obviously the failure mode (use after free) is anything but great. My point is that, as you look more in depth, you can see that people actually do put thought in their decisions, but clashes can and will happen if they evaluate the tradeoffs differently.
As an aside: drm_sched is utility code, not a core part of the graphics stack, so for now the solution to Lina’s issue is going to be a different scheduler that is written in (safe) Rust. Since it appears that there’s going to be multiple Rust graphics drivers soon, they might be able to use Lina’s scheduler and there will be more data points to compare “reuse C code as much as possible” vs “selectively duplicate infrastructure”; see also https://fosstodon.org/@airlied/113052975389174835. Remember that abstracting C to Rust is neither little code nor easy code, therefore it’s not unexpected that in some cases duplication will be easier.
This is not a good summary of the situation, though to be fair, some details are buried on Reddit. First of all, what she was doing was the correct approach to dealing with that hardware, according to multiple other DRM maintainers. Lina’s patch actually would have made existing drivers written in C less buggy, because that maintainer was in fact not conscious of the limits of C.
`drm_sched` has very annoying and complex lifetime requirements that are easy to mess up in C, and her patch would've simplified them.
Relevant excerpts from what Lina said on Reddit:
Here it should be noted that it was not really a case of using something in a new way. The only difference is how many users are affected. Most people don’t unplug their GPU, so the fact that GPU unplugging is broken with many drivers is easy to sweep under the rug. But since it affects every user trying to use Lina’s driver, the problem can’t be swept under the rug and just be ignored.
The generally accepted definition of “hit piece” includes an attempt to sway public opinion by publishing false information. Leaving aside the fact that the user who linked this story did not publish it, and deferring the discussion of who may or may not pay them to post, that is a significant claim that requires significant evidence.
So, please share your evidence… what’s the false information here, and how exactly is @freddyb attempting to sway public opinion? To what end? Be very specific, please.
I don’t think “hit piece” implies false information, just a lopsided sample of the information available.
That’s a fair point. I should have said “false or misleading.”
So I’ll amend my question, which I doubt will get answered at any rate:
@ecksdee: So, please share your evidence… what’s the false or misleading information here, and how exactly is @freddyb attempting to sway public opinion? To what end? Be very specific, please.
If you look at the history of soatok's blog on lobsters, it is pretty obvious that sooner or later someone from this community would have posted this entry.
Now you have to show me how Mozilla is related to Signal in any positive or negative way. You yourself seem to have strong feelings towards Mozilla, at least.
Random sidenote: I wish there were standard shortcuts or aliases for frequently typed commands. It's annoying to type `systemctl daemon-reload` after editing a unit, e.g. why not `systemctl dr`? Or debugging a failed unit: `journalctl -xue myunit` seems unnecessarily arcane, why not `--debug` or friendlier?
I'm using these:
this is shorter to type, completion still works, and I get my `less` options
Typing this for me looks like `sy<tab><tab> d<tab>` - doesn't your shell have systemd completions?
It does but what you describe doesn’t work for me.
What doesn't work? In any modern shell, when you are here and type tab twice, you will get to daemon-reload. Ex: https://streamable.com/jdedh6
Doesn't your shell show a tab-movable highlight when such a prompt appears? If not, try it out. It's a very nice feature.
`journalctl -u <service> --follow` is equally annoying.
`journalctl -fu`
My favorite command in all of Linux. Some daemon is not working. F U, Mr. Daemon!
so this does exist - I could swear I tried that before and it didn’t work
I wasn’t sure whether to read it as short args or a message directed at journalctl.
Thankfully it can be both! :)
You gotta use -fu not -uf; nothing makes you madder than having to follow some service logs :rage:
That’s standard getopt behaviour.
Well I guess fu rolls off the tongue better than uf. But I remember literally looking up whether there isn't anything like -f and having issues with that. Oh well.
Would it be "too clever" for `systemd` to watch for unit files to change and reload the affected unit automagically when it changes?
I'm not sure it would be "clever". At best it would make transactional changes (i.e. changes that span several files) hard, at worst impossible. It would also be a weird editing experience when just saving activates the changes.
I wonder why changes should need to be transactional? In Kubernetes we edit resource specs, which are very similar to systemd units, individually. Eventual consistency obviates transactions. I think the same could have held for systemd, right?
Because the services sd manages are more stateful. If sd restarted every service the moment its on-disk base unit file changed [1], desktop users, database admins, etc. would have a terrible experience.
[1] say during a routine distro upgrade.
Shorter commands would be easier to type accidentally. I approve of something as powerful as systemctl not being that way.
Does tab completion not work for you, though?
That `BEGIN IMMEDIATE` thing is SQLite's single biggest footgun. Great to see a really clear explanation of it - previously I've sent people to these:
Optimizing SQLite for servers
Ensure SQLite transaction default to IMMEDIATE mode PR to Rails:
TLDR: if you're seeing `SQLITE_BUSY` errors, try using `BEGIN IMMEDIATE` for transactions that you know are going to be a write.
It turns out I've written about this a bunch of times now, so I started a new tag on my blog to group them all together: https://simonwillison.net/tags/sqlite-busy/
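For what that looks like from code rather than raw SQL, here's a minimal sketch using the rusqlite crate (the database path and table are made up); the point is just that the transaction asks for the write lock up front instead of trying to upgrade mid-transaction and hitting `SQLITE_BUSY`.

```rust
use rusqlite::{Connection, TransactionBehavior};

fn main() -> rusqlite::Result<()> {
    let mut conn = Connection::open("app.db")?;

    // BEGIN IMMEDIATE: take the write lock at the start of the transaction,
    // rather than on the first write statement.
    let tx = conn.transaction_with_behavior(TransactionBehavior::Immediate)?;
    tx.execute(
        "CREATE TABLE IF NOT EXISTS counters (name TEXT PRIMARY KEY, value INTEGER)",
        [],
    )?;
    tx.execute(
        "INSERT INTO counters (name, value) VALUES ('hits', 1)
         ON CONFLICT(name) DO UPDATE SET value = value + 1",
        [],
    )?;
    tx.commit()
}
```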
This. Another thing is that SQLite locks are per database. So even if your two transactions operate on two totally different tables, it will still contend on the write lock.
Which is the reason not to use SQLite. You don't want your logins to fail just because a background worker performs a DB transaction on another table that takes a little longer. Potentially that's something you put in the background on purpose, which suddenly interacts with other parts of the application. I've changed transactions to individual updates because the total IO delay would otherwise have been multiple seconds.
This can for sure be avoided with clever thinking and tricks - but that isn’t always the tradeoff you want to ensure.
For me personally, the reason was SQLite completely disregarding your column types and doing some heinous dynamic typing instead. That caused some real problems. But fortunately, this has been fixed (by virtue of “STRICT” tables), and so I’d upgrade SQLite to “if you don’t need any amount of meaningful r/w concurrency, by all means use it, it’s great then”.
I think not using transactions to batch writes is another large performance mistake people make with SQLite. Many times when I hear about poor insert performance with SQLite, it's because the inserts are bare statements with implicit transactions. You get much better insert throughput with a wrapping transaction.
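A quick sketch of the difference (again using rusqlite, with an invented `events` table): wrapping the loop in one explicit transaction means a single commit for the whole batch instead of one per statement.

```rust
use rusqlite::Connection;

fn insert_batch(conn: &mut Connection, rows: &[(i64, String)]) -> rusqlite::Result<()> {
    // One explicit transaction: one commit/fsync for the batch instead of one per row.
    let tx = conn.transaction()?;
    {
        let mut stmt = tx.prepare("INSERT INTO events (id, payload) VALUES (?1, ?2)")?;
        for (id, payload) in rows {
            stmt.execute(rusqlite::params![id, payload])?;
        }
    } // statement dropped here so the transaction can be committed
    tx.commit()
}
```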
I guess this solves the problem of too much 3rd party code in your repo.
Just have to put the CPU into protectionist mode!
I am working on a project that is using SQLite on the server. A lot of these things are issues because of assumed scale, and that's reasonable, but in my case, I know my app is an internal tool that will have a maximum of 10 or 15 users total. It's all just running on one VM. In this case, I'm choosing SQLite for sort of the inverse of a lot of these reasons: it is meaningfully simplifying things as compared to Postgres.
Most compelling reason for me is how easy it is to test migrations when you don’t need a full DBMS to do any dev work.
Can you elaborate on the simplification? I just started a prototype on a VM using Postgres and all I had to do was apt install postgresql and set up the user/role. I’m not very familiar with SQLite, but you still have to install it and then explicitly opt into the constraint enforcement stuff (at a minimum) so it seems to be a pretty comparable amount of work? And then if/when you need a database migration, Postgres has much better support for things like ALTER TABLE than SQLite. What am I missing?
Running Postgres and backing up its data is considerably more difficult. With SQLite, backing up is copying the file somewhere; as for running it, there's nothing to run.
At the scale mentioned above, I think you also won't be dealing with that many database migrations once you're stable, so why bother?
Is backing up a directory rather than a file really "considerably more difficult"?
For me, because I know Postgres better and I dislike having to remember which options I need to turn on to make SQLite enforce constraints. It’s also painful when I do run into issues with migrations, or when I need some other feature it doesn’t have, or when that app that “will only ever run on one machine” needs to eventually run on multiple machines.
Even on a single machine, Postgres installs with one command and it supports constraints by default and backups are just copying the data directory. Maybe I’m missing something but it seems easier at any scale?
There’s no additional process running for the database. It’s just a file.
The ALTER TABLE thing is annoying but I haven't actually needed to do that yet. It also doesn't affect every migration, just ones where you want to change a column type. Not a super common thing for me.
That is very unfortunate for Asahi / Linux on M1 chips. But potentially also for Rust on Linux and Linux itself.
I'm surprised that Helix is the 4th most used editor with Rust (at least in the survey). But it makes sense; it feels quite like home.
Helix’s relative lack of obscurity has been a pleasant surprise. VSCode was the first IDE that got me off of pure vim (for the most part), and then trying Helix I just stayed with it since it turned out all I really missed from VSCode was the language servers, and I was happy with everything out of the box and generally prefer terminal-based workflows.
Then I got vendor-locked into Helix’s keybindings way faster than I expected. Knowing vim was nice since so many things speak it as a portable editing language, and I feared I’d be stuck in my cottage with my hipster terminal editor for the foreseeable future, but seeing the effort in Zed for instance increases the range of tooling I’ll feel fully fluent in at this point.
I was surprised by how widespread it is in my company. It may have to do with Rust users also looking for Rust-written tools and/or adopting newer tech?
I think I heard about Helix due to it being on some list of cool new things made in Rust, but I started using it because I ran out of the needed stamina to fix my nvim config, and I don’t like VS Code. A modal terminal editor with good LSP support just seemed like the natural choice.
I got stuck in VSCode, and thus there is a high barrier to entry to getting productive in Helix with things that would just work in VSCode - if not the usability, then the ecosystem of plugins.
already posted: https://lobste.rs/s/zck7bo/resigning_as_asahi_linux_project_lead (/cc @pushcx for merge)
Oops, sorry. Looked here an hour ago and it wasn’t posted yet…
I think it’s nice to have both sometimes. AFAIK there is no way to say “submit as merged link” or something.
My current dev time is going into improving story merging and several items would be relevant here:
Thank you!
Maybe comments should be a DAG instead of a tree :-)
I am begging people to actually pay attention to contrast ratios. Dark gray on black background is unreadable to my eyes.
I read this comment and thought “it can’t be that bad” then clicked on the link, and to my surprise, it is that bad.
I read your comment and had to see for myself, and… it’s OK for my monitor, eyes, and room lighting. A little somber, that’s all.
But if it weren’t, I could just use the reader view on my browser. Pretty sure every browser has something like that feature nowadays, or there are plugins.
I would like people who do a lot of website-ing to just avoid styling their content more. Let the user’s browser figure out how it wants to show it.
Sadly, the default dark mode colors on iOS Safari don’t have sufficient contrast (dark blue on black for links), so default styles won’t work for a nontrivial fraction of visitors.
That is unfortunate, but that very much feels like a problem that should be fixed, and is easier to fix, in Safari.
My previous comments on making dark mode less rancid – it's only a few lines of CSS, though I agree Safari ought to be fixed. (macOS Safari has the same bug, IIRC.)
Can recommend firefox reader mode in the URL bar.
It looks dim to me too, but it actually passes WCAG 2.0 AA (#808080 on #000 is 5.31:1, greater than 4.5:1). Interestingly, you get about the same ratio with #6b6b6b on #fff, but that seems way more readable to me.
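For reference, those numbers come from the WCAG 2.x contrast formula; here is a small self-contained sketch of the calculation (the hex values are the ones from this thread):

```rust
// WCAG 2.x contrast ratio between two sRGB colors.
fn channel(c: u8) -> f64 {
    let c = c as f64 / 255.0;
    if c <= 0.03928 { c / 12.92 } else { ((c + 0.055) / 1.055).powf(2.4) }
}

fn luminance(r: u8, g: u8, b: u8) -> f64 {
    0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b)
}

fn contrast(fg: (u8, u8, u8), bg: (u8, u8, u8)) -> f64 {
    let (l1, l2) = (luminance(fg.0, fg.1, fg.2), luminance(bg.0, bg.1, bg.2));
    let (hi, lo) = if l1 > l2 { (l1, l2) } else { (l2, l1) };
    (hi + 0.05) / (lo + 0.05)
}

fn main() {
    // #808080 on #000 ~ 5.3:1, #6b6b6b on #fff ~ 5.3:1
    println!("{:.2}", contrast((0x80, 0x80, 0x80), (0x00, 0x00, 0x00)));
    println!("{:.2}", contrast((0x6b, 0x6b, 0x6b), (0xff, 0xff, 0xff)));
}
```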
i am begging people to make their text a legible size on small screens. my eyes are good enough to read 5-point font for now but they won’t be forever lmao
and it’s like two lines of code
Yeah. My favourite way to get text to adapt to screen size is
because then I don’t have to monkey around with media size cut points.
yeah it Just Works more or less, and it’s great. i use it on my own website
breakpoints get really sketchy for a bunch of reasons ime
You're not alone, this is really tough for me too. And I've done my damnedest to get my monitor to have correct-ish contrast/saturation etc. values.
It sounds like many things are changing for the better. But at the same time it doesn't look like we're at a working state yet for people who don't want to tinker around. I'd rather not have random issues and quirks while actual recording support is so-so depending on the compositor. Especially when every problem is either "you're using the wrong distro" or "you're using the wrong compositor, works on mine".
I also would rather not have random issues and quirks, and honestly, that means I don’t want X11. X is the king of random issues and quirks in my experience.
I mean, if the problem is that they’ve finally got a good solid solution but the software you’re using hasn’t gotten around to implementing it yet, or the software you’re using has implemented it upstream but your distro hasn’t pulled that version in yet, what other response do you really expect? You can use a system where it already works, or you can wait for the necessary changes to make their way to your setup, or you can pitch in to make it happen faster.
On a technical level I agree with you. But on a consumer level it sounds like you just have a long period of frustration, during which you would probably rather off-board from Linux and move to a Mac (there, I said it; you either die a hero..). After which there is even more friction for migrating back to Linux in x years, when the whole ecosystem and LTS train arrives at a state of Wayland that is actually usable - without resorting to hacks that make you sound like someone trying to regedit Copilot out of Win11.
Or to rephrase: If Linux on the desktop is your goal, then this is not the state where you can tell people that wayland is good now.
I think "Linux on the desktop" is approaching the problem from the wrong angle. Instead it should be "GNOME desktop" or "KDE Plasma desktop" or whatever.
Users don't give a damn that the thing runs on Linux. They give a damn about clickable things. So you market the clickable things.
This just sounds like a really bad idea. If the language is unapproachable, change the language or help people learn it. Requiring an LLM for generating configuration will just make the problem worse over time.
It’s not required, it’s there to aid generating a scaffold.
Let me rephrase: If the path of least resistance is generating configuration with an LLM, most people will follow this path, and this path doesn’t aid in learning in any way.
Also, it will paper over the language's complexity problems, potentially making them worse over time.
The learning path is never followed, and the complexity isn’t tackled. Hence: The LLM becomes a de-facto requirement.
I find that a strawman, if it doesn’t help then you still need a better language. If it does then problem solved :)
It helps with generating configuration without thinking about it or understanding it. The configuration becomes something obscure and assumed to work well that only gets updated by LLMs and no one else. There’s no incentive for the average person to understand what they’re doing.
If that’s okay, then sure, go ahead.
But this is already a problem, right? I can ask any LLM right now to generate a `shell.nix` for X project, and it will generate it. While I don't like the idea of auto-generating code via LLM, I can understand how having something to scaffold code can be nice in some cases.
Heck, even before LLMs people had things like snippets to generate code, and we also had things like `rails generate` to generate boilerplate code for you. This only goes one step further.
Yes, and we don't want to make it worse or pretend that it's acceptable.
I have the opinion that boilerplate generators (LLM or not) are a symptom of a problem, not a proper solution. Ignoring that, at least a regular generator:
LLMs are not good learning tools because they cannot say "no". You need to be experienced enough to ask reasonable questions in order to get reasonable answers. Portraying an LLM as an alternative to learning for newcomers is counter-productive.
This is an optional feature though, you can use it or not. If your argument is that “this makes people lazy”, well, they can already be lazy by opening ChatGPT or any other LLM and do the same.
While the post seems to suggest this is for newcomers, that is not necessarily true. I could see myself using it, considering I have in the past copied and pasted my Nix configuration from some random project to start a new one.
People are free to do so, but embracing it in the project itself is different.
If the project encourages it, then it’s not a shortcut, it’s the default way to do things.
Obviously anyone can take advantage of this, but newcomers are by far the most impacted. It doesn’t flatten the learning curve, it side-steps it.
I think you are reading too much into this; I am not seeing the project encouraging it, just offering it as an alternative.
They added it to their CLI. They published a blog post about it. They set up a dedicated marketing website. They made sure it’s literally the first CTA you see on their home page.
Do we just live in completely separate universes?
I just went to their homepage and I see no mention about this feature. But even if it had, as long as it is beside the manual way I wouldn’t say it is encouraging, it is an alternative.
Encouragement would be if they removed all mentions of manual methods or buried them in the documentation. That is not what is happening here: if I go to their documentation, they still have lots of guides on how everything works. Here, just go to: https://devenv.sh/basics/.
I ask the same of you. Maybe you're seeing a different version of the homepage, or maybe in your universe a blog post is the same as a home page.
It’s indeed in a very prominent position on the homepage
Now I see it, but it is not as prominent as you both are making it out to be. And also, the manual methods are still described on the same homepage.
Go to https://devenv.sh/.
Here are the first three paragraphs:
There is a perfect phrase for this, this is basically a “cargo cult”.
Thanks to LLMs I now use a huge array of DSL- and configuration-based technologies that I previously avoided, because I didn't have the time and mental capacity to learn hundreds of different custom syntaxes.
Just a few examples: jq, bash, AppleScript, GitHub Actions YAML, Dockerfile are all things that I used to mostly avoid (unless I really needed them) because I knew it would take me 30+ minutes to spin back up on the syntax… and now I use them all the time because I don’t have to do that any more.
Add Nix to that list.
I would not feel confident trusting some config that an LLM spits out. I would check whether it does what it's supposed to do, and lose more time than I gain.
If I cannot scale the number of different technologies, I use fewer or simplify. Example: Bash is used extensively in CI. GitHub Actions just calls bash scripts.
It only takes me a few seconds to confirm that what an LLM has written for me works: I try it out, and if it does the thing then great! If it spits out an error I loop that through the LLM a couple of times, if that doesn’t get me to a working solution I ditch the LLM and figure it out by myself.
The productivity boost I get from working like this is enormous.
I’m wondering: Doesn’t that make your work kinda un-reproducible?
I spend a lot of time figuring out why something in a codebase is like it is or does what it does. (And the answers are often quite surprising.)
“Because an LLM said so, at this point in time” is almost never what I’m looking for. It’s just as bad as “The person who implemented (and never got around to documenting) this moved to France and became a Trappist monk”.
I’d have to completely reconstruct the code in both cases.
You have to be really disciplined with this stuff.
Throwaway prototype? Don’t worry about it. Do the Andrej Karpathy vibe coding thing.
Code that you’re going to be maintaining for a long time? Don’t commit anything unless you not only understand it but could explain how it works to someone else.
In my experience it’s way more frustrating and erratic. Good that it works for you.
Apart from that, I think there is value in facing repetitive and easy tasks. Eventually you get tired of it, build a better solution, and learn along the way.
For non repetitive and novel tasks, I just want to learn it myself. Productivity is a secondary concern.
Thank you for building it. I know Nix, yet I still just want to quickly build an env and move on to building. I think it's a cool exploration.
The fact of the matter is that the argument for this feature works regardless of the underlying system.
Your unspoken premise appears to me to be that we should all become masters of all our tools. There was a time I agreed with that premise, but now I think we are so thoroughly surrounded by tools we have to be selective about which ones to master. For the most part with devenv, you set it up once and get on with your life, so there isn’t the same incentive to master the tool or the underlying technology as there is with your primary programming language. I’m using Nix flakes and direnv on several projects at my work; my coworkers who use Nix are mostly way less literate in it than I am and it isn’t a huge obstacle to their getting things done with the benefit of it. Very few people do a substantial amount of Nix programming.
No, it’s not.
You don’t need to master every tool you use, just a basic understanding, a sense of boundaries and what it can or can’t do.
It doesn’t matter if your objective is “mastering” or “basic understanding”, both things require some learning, and LLMs do not provide that. That’s the main premise in my argument.
I don’t use a tool if I don’t know anything about it.
I could not agree more with that. LLMs have accelerated me to that point for so many new technologies. They help me gain exactly that understanding, while saving me from having to memorize the syntax.
If you’re using LLMs to avoid learning what tools can and cannot do then you’re not taking full advantage of the benefits they can bring.
My experience with LLMs is having to re-check everything in case it's a hallucination (it often is) and ending up checking the docs anyway.
The syntax is easy to remember for me from tool to tool. Most projects tend to have examples on their website and that helps to remember the details. I stick to that.
From one of the slides:
I don’t understand this attitude. The first thing I don’t understand is this about “having” to pull down code that he doesn’t trust. You don’t have to do anything, you choose what you want to depend on yourself. But then what I really don’t understand is why some random third party compiling the code and giving you a shared library suddenly makes it reasonably trustworthy.
I interpreted this as saying: In C the norm is to use fairly large shared library dependencies that come as a binary from someone you trust to have built and tested it correctly; most often it’s from a distribution with a whole CI process to manage this. The only source code you compile is your own. Whereas in Rust/Go/Python/etc. the norm is to download everybody else’s source code and compile it yourself on the fly, and the libraries are typically smaller and come from more places. Also, in the typical default setup (”^” versions) it’s easy to pull down some very recent source code without realizing it.
I can see how that would feel like you've thrown away a whole layer of stability and testing compared to the `lib.so` model.
It's just pedantic really. Use vetted libraries if you care. Git submodule them and reference them locally. Cargo won't stop you.
I don’t think you get the point.
When I use my OS's package manager to install dependencies for my program, I get some safeguards. First of all, I can see that this dependency is important enough for someone to have packaged it. There are also some checks done before the package is added, and changes are checked before they are accepted. When security updates are available they are packaged, and I can simply update, so all running software (even software not managed by the package manager) gets the security fix after reloading the dependency. A maintainer also can't go rogue and just break my dependency. And I get all of this for the dependencies of my dependencies too.
Of course this doesn't amount to a full audit of all my dependencies and doesn't fix all problems. It just adds some help with managing dependencies and trusting them, whereas with cargo/pip/… you need to do all of this on your own for every dependency and every update.
I know it’s a spectrum, but really OS upgrades aren’t a good benchmark. There have been some incredible bugs over the years, often due to OS packaging having or ignoring the defaults of the upstream. For example when Debian screwed up the private key generation…
It is very, very hard to write any serious Rust program without any dependencies from an untrusted code repository such as Cargo. In C and C++, the same is trivial, because almost all dependencies you could ever want are provided by your operating system vendor. The core difference is between trusting some random developer who uploaded some code to Cargo, and trusting your operating system.
I first wanted to write a long comment and decided to just blog it.
The TLDR is somewhere along the lines of: I doubt you can get away with only using OS dependencies, and making that work is even more work, while not actually gaining as much security as you would like.
As a user, I don’t have much more trust in libfoo packaged by my distro (if it’s packaged) than in libfoo fetched from upstream. And as a distributor, I have much less trust in libfoo packaged by my users’ 50 different OSes than in libfoo I fetch from upstream.
this sums up the dynamic perfectly, and the erosion of trustworthiness to the user in favor of trustworthiness to the developer is glaring.
When you use distro provided libraries, you have to trust thousands of authors. I trust my distro maintainers to not do anything malicious, but I don’t trust them to vet 10000 C packages. Which means I have to trust the authors of the code to not be malicious.
The distro package update process is to pull down new sources, compile, run the test suite (if one is provided), only look at the code if something isn’t working, and then package it. After that, someone might notice if there is something bad going on. But it’s a roll of the die. At no point in this process is anything actually vetted. At most you get some vetting of a few important packages, if your distro is big and serious enough.
Considering how often you find random bespoke implementations of stuff like hash tables in C projects, this is clearly untrue.
I already implicitly trust the software provided by my distro.
Provided by your operating system vendor? Which operating system? If you write cross-platform code, you only get to use the dependencies that all platforms provide (which is pretty much none at all). C and C++ don’t even ship with proper UTF-8 support. It’s completely impossible to write any serious software in these languages without pulling in some external code (or reinventing the wheel). As someone who has earned a living with C/C++ development for years, I have a hard time understanding how you arrived at this conclusion.
The scope of the C/C++ standard library and the Rust standard library is very similar. Rust’s standard library has no time features or random number generators (for that, the semi-official crates
`chrono`, `time` and `rand` can be used), but it has great UTF-8 support. Overall, you've got about the same power without using any dependencies.
My operating system vendor is the Fedora project on my desktop, and Canonical or the Debian project on various servers. All of those provide plenty of C and C++ libraries in their repositories.
I don’t understand why you talk about C++’s UTF-8 support or the size of C++‘s stdlib, that’s completely irrelevant to what I said.
If you use Windows (MSVC), you basically have no third party libraries available whatsoever. You need to download binaries and headers from various websites.
It’s the best example of a third party library you have to pull in for basically every project. This is not provided out of the box or by the operating system vendor.
I don’t use Windows. The person in the article doesn’t either. TFA contains a criticism of Rust from the perspective of a Linux user, not from the perspective of a Windows user.
Okay, we went from:
to:
Yes, dependency management is trivial if you only ever use a system where all your desired dependencies are provided through official channels. However, this is not the experience most C/C++ programmers will have. It’s not even the case if you just use Linux: Recently, I used SDL3 and I had to download the tarball and compile the source myself because it was too new to hit the apt repository of my distro.
The complaints from the article are from the perspective of a Linux user. My post was also from the perspective of a Linux user. Windows is not relevant in this conversation.
Unless you specified otherwise upfront, Windows is going to be presumed to be relevant. Pretending to be surprised that someone thought it was relevant is silly.
The person you are talking to then goes on to describe a Linux counterexample that is quite common for C/C++ development. If your response to this is "well, I don't do any games/graphics/GUI development", then cool. You have now defined an extremely narrow category of C/C++ development where it's common to only use libraries that come already installed. It is nowhere close to the general case though, which is the point of the commenter you are responding to.
As for the SDL example: Sure, there are situations where you may need to add dependencies which aren’t in your distro in C or C++. However, this is fairly rare and you can perform a risk assessment every time (where you’d conclude that the risk associated with pulling down SDL3 outside of your distro’s repos is minimal).
Contrast this with Rust (or node.js for that matter), where you’re likely to have so many dependencies not provided by your distro that it’s completely unreasonable to vet every single one of them. For my own Rust projects, the size of my Cargo.lock file is something that worries me, just like the size of the package-lock.json file worries me for my node.js projects. I don’t know most of those dependencies.
I have now specified that I (like the article) was talking in the context of Linux. Do I need to be more clear at this point?
Here’s an update for 2025-01-23.
The OSA continues to claim jurisdiction over the entire web. I asked Ofcom to explain what legal basis there is for a country’s law being able to do so, perhaps an international treaty. A PR flack responded to duck the question and say that the law has the power because the law says it has the power. This was yet another waste of time by Ofcom.
Speaking of which, the new online tool linked in my previous comment is the best argument I can see for convincing someone that these regulations are hopelessly burdensome and vague. I would invite anyone who is not yet convinced of that fact to try using it.
I’m still waiting to hear back from the US Embassy. I think this has been delayed by the change in US presidential administrations appointing new personnel and setting new policy. I’ll continue to follow up.
I’ve gotten pointed to the legislators who were involved in the OSA, and I’ll be reaching out to their offices to ask their legislative aides why they think the UK has the jurisdiction to regulate.
I have almost exhausted my options for improving on our least bad plan. When these two inquiries are resolved, I don’t have ideas for other ways of mitigating the threat of the OSA. I still think the most likely positive resolution is that UK persons contact their MPs to delay the approval of Ofcom’s regulations until the OSA’s risks to small forums can be reduced, but that’s not something I can advance as a non-citizen. If you are in the UK, please do contact your legislators.
A few well-meaning people have sent me Ofcom’s upcoming Session for Small, Low-Risk Community Sites*, so here’s something I can link in reply:
Why would I attend a session about complying with a UK law, given that I’m not a UK citizen, the site is not hosted in the UK, and I’m not in the UK?
I traded email with Ofcom and asked them to explain their novel legal theory for why the UK can pass a law claiming jurisdiction over the world. They ignored my questions and replied that the law says they have jurisdiction over the world. There’s some more details in my latest update. They’re not operating in good faith.
I don’t need instruction on complying with another country’s censorship regime. I need Ofcom to acknowledge that they don’t have power over other countries, because I’m not interested in wasting years of my life proving in court that the UK doesn’t have the authority to decide what can be published in other countries.
If someone attends this session, please ask:
* It’s not relevant, but as a programmer I have to gripe about variable names. This facially reasonable title is actually OSA legalese. “Small” mostly means “fewer than 7 million visitors physically in the UK”. Lobsters is probably “small” but there isn’t a defined time period and we don’t track demographic data to say authoritatively. “Low-risk” is not defined by the law; it is Ofcom’s invention and their public definition is impenetrable. Lobsters is probably “multi-risk” but hasn’t spent thousands of dollars on legal advice to try to know. But their deliberately poor communication is not the remaining problem here, their lack of authority is.
Expect some patches to that region block. I got blocked already from a German IP where your IP-locator’s website even said it’s from Germany.
Ah, setenv and getenv. Steam also struggled with this recently. It is safe to say that nobody can use this API properly. In other words, it is broken (even if it behaves as documented, that is not good enough when ‘as documented’ means everyone has crashes because it’s impossible to use properly).
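To make the failure mode concrete, here is a minimal sketch (illustrative only, assuming glibc behaviour): one thread walks the environment with getenv() while another keeps adding variables with setenv(). Growing the environment forces the C library to reallocate the global array, so the reader can end up scanning memory that was just freed.

```c
/* Illustrative only: this program deliberately contains the getenv/setenv
 * data race being discussed. Do not use this pattern in real code. */
#define _POSIX_C_SOURCE 200112L
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static void *reader(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        const char *v = getenv("PATH");   /* scans the global environ array */
        if (v != NULL)
            (void)v[0];                   /* may touch freed memory */
    }
    return NULL;
}

static void *writer(void *arg)
{
    (void)arg;
    char name[32];
    for (int i = 0; i < 10000; i++) {
        snprintf(name, sizeof name, "DEMO_%d", i);
        setenv(name, "x", 1);             /* may realloc/free the array */
    }
    return NULL;
}

int main(void)
{
    pthread_t r, w;
    pthread_create(&r, NULL, reader, NULL);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(r, NULL);
    pthread_join(w, NULL);
    return 0;
}
```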
Interesting, so it is a pure POSIX/Linux problem and totally fine on macOS and Windows.
macOS solves the problem by copying the environment on change and leaking the old copy.
This is not what I would call totally fine, but more like a barely acceptable kludge.
This also only solves the problem if you use setenv() or putenv(); if you touch environ directly you still bypass the protections.
But in either case it is still in a better state than Linux.
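For illustration, a rough sketch of that copy-and-leak strategy (not Apple’s actual code, just the shape of the idea): every write installs a freshly built array and deliberately never frees the old one, so pointers previously returned by getenv() stay readable. A real libc would also wrap the swap in a lock.

```c
/* Sketch of a "copy on change, leak the old copy" setenv replacement. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

static int leaky_setenv(const char *name, const char *value)
{
    size_t n = 0, namelen = strlen(name);
    while (environ[n] != NULL)
        n++;

    /* New "NAME=value" string; never freed, on purpose. */
    char *entry = malloc(namelen + strlen(value) + 2);
    if (entry == NULL)
        return -1;
    strcpy(entry, name);
    strcat(entry, "=");
    strcat(entry, value);

    /* Fresh array with room for one extra entry; also never freed. */
    char **copy = malloc((n + 2) * sizeof *copy);
    if (copy == NULL)
        return -1;

    size_t out = 0;
    int replaced = 0;
    for (size_t i = 0; i < n; i++) {
        if (!replaced && strncmp(environ[i], name, namelen) == 0 &&
            environ[i][namelen] == '=') {
            copy[out++] = entry;          /* replace the existing variable */
            replaced = 1;
        } else {
            copy[out++] = environ[i];     /* old strings stay alive forever */
        }
    }
    if (!replaced)
        copy[out++] = entry;
    copy[out] = NULL;

    environ = copy;   /* old array is leaked; stale pointers remain readable */
    return 0;
}

int main(void)
{
    char *before = getenv("PATH");        /* points into the old storage */
    leaky_setenv("PATH", "/tmp/overridden");
    printf("old pointer still readable: %.20s\n", before ? before : "(null)");
    printf("new value: %s\n", getenv("PATH"));
    return 0;
}
```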
Unfortunately, this is yet another case where the issue originates from POSIX (MacOS and the BSDs could be affected as well). The standard shows its age, many constructs have issues or even break down when multi-threading is involved. Remember, before C11/C++11, these languages did not even describe a proper multi-threaded memory model (see also the legendary essay Threads cannot be implemented as a Library, published in 2004). Everything from start to finish was just a kludge. Of course, many amendments and mitigations were made to alleviate this problem, but POSIX is holding back innovation, both in operating systems and in the user space. Back then, most programs consisted of one or two simple C files implementing a single-threaded program. POSIX would be radically different if it were designed today.
*nod* This problem has been dogging Rust for about as long as Rust has been stable:
From what I remember from one RFC I saw, part of the issue was that the glibc devs were adopting a “We documented it in the manpage. You’re just holding it wrong.” stance for a while and that delayed both the Rust-side and glibc-side changes mentioned at the end of the post since the Rust devs try to avoid unilaterally hacking around other people’s designs until they’ve run out of other, more cooperative options.
Can’t the Rust side just… avoid getenv(3) and setenv(3)? We should be able to implement thread-safe versions on top of system calls, shouldn’t we? No need to be unfriendly with the glibc devs if we can avoid depending on them in the first place.
getenv(3) and setenv(3) aren’t getting the environment from a kernel service; the environment is just a global variable, extern char **environ, that gets initialized by execve(2) when starting a process. There’s no “other place” to get the environment from where you will be safe from libc’s predations. Part of the sturm und drang in the historical issues there seems to have been that Rust’s set_env was “safe” and had a locking regime that made it thread-safe if you only modified environ from Rust code; linked-in dependencies not written in Rust would have no idea about what Rust was doing.
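To make that concrete, a minimal illustrative program: the environment really is just a NULL-terminated array of “NAME=value” strings reachable through environ, and getenv() does nothing more than search it.

```c
/* The whole environment is this one global array; getenv() only searches it. */
#include <stdio.h>
#include <stdlib.h>

extern char **environ;

int main(void)
{
    for (char **e = environ; *e != NULL; e++)
        puts(*e);

    const char *home = getenv("HOME");   /* a pointer into one of those strings */
    printf("HOME via getenv: %s\n", home ? home : "(unset)");
    return 0;
}
```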
Crap. I see. One thing bothers me though: why does setenv(3) even exist? The program’s environment is an input, and if it’s global it should be a global constant. We know global variables are bad since, like, 40 years now? I sense legacy from old-school procedural programming, the same kind that decided that a program only needed one parser (old lex & yacc worked over global variables too).
Oh well, I guess I have yet another reason to reduce my dependencies to the absolute minimum.
POSIX/Unix and its descendants are a product of legacy.
setenv(3) comes from 4.3BSD, which was neither SMP aware, nor did it have the concept of multiple threads of execution sharing an address space. As mentioned above, the environment is just memory that’s written into the process’s address space during process creation.
Since processes didn’t even share memory at that point, there was no harm in allowing writing to what was already mutable memory - and assumedly it made it easier for programs configured through env vars to manipulate those env vars.
Edit:
getenv(3) comes from V7 Unix. setenv(3) comes from 4.3BSD.
Also worth noting that early Unix machines had memory best measured in kilobytes. Even if they had multiple threads of execution, there would have been design pressure to avoid taking multiple copies of the environment.
Early UNIX also had a lot of features in libc (and the kernel) that were really just for shells. The
setenv function is in this category. In a shell, you have an exported environment and you want commands to modify it and for child processes to inherit it. Having a single data structure for this is useful: when you invoke execve in a child process, you just pass environ to it.
For any process that is not a shell, setenv should never be called. The two cases are entirely separable:
I’m going to quote that (heh) Verbatim (oh no) in any future discussions on this topic.
setenv is useful between fork and exec to configure the environment of the child process. Yes, you could use the environment-setting variants of exec, but oftentimes it is easier to just set or remove a few variables.
setenv and getenv are not async-signal-safe, so you cannot use them safely in the child of a fork. If a signal handler or another thread had modified the environ variable at the wrong moment, you might be reading half-valid state.
The proper way to do it, the one that is valid under all the restrictions in the glibc docs, is to copy environ yourself and modify the new copy in thread-local state (making the copy is still not async-signal-safe). Then you modify it as needed, call fork and then immediately execvpe/execle.
There isn’t a good reason to do processing after a fork; it only leads to hard-to-diagnose bugs when you inevitably end up messing with non-async-signal-safe state.
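A minimal sketch of that pattern, with a made-up variable and child command: copy the pointer array before forking, touch only the copy, and have the child do nothing but exec.

```c
/* Copy environ, adjust the copy, fork, exec immediately. */
#define _GNU_SOURCE               /* execvpe() is a glibc extension */
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

int main(void)
{
    /* Copy the current pointer array, leaving room for one extra entry.
     * (A fully paranoid version would strdup each string as well.) */
    size_t n = 0;
    while (environ[n] != NULL)
        n++;

    char **envp = malloc((n + 2) * sizeof *envp);
    if (envp == NULL)
        return 1;
    memcpy(envp, environ, n * sizeof *envp);
    envp[n] = "DEMO_MODE=1";      /* the only change meant for the child */
    envp[n + 1] = NULL;

    pid_t pid = fork();
    if (pid < 0)
        return 1;
    if (pid == 0) {
        char *argv[] = { "env", NULL };
        execvpe("env", argv, envp);   /* child: exec immediately */
        _exit(127);                   /* only reached if exec failed */
    }
    free(envp);                       /* the parent no longer needs the copy */
    waitpid(pid, NULL, 0);
    return 0;
}
```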
Looks like we’re missing a function that wraps execle(3) or execvpe(3), where instead of specifying the entire environment, we only specify the values of the environment variables that changed. That way we wouldn’t have to use setenv(3) the way you suggest.
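Something like that wished-for wrapper could be sketched as follows; the name execvpe_with and its signature are invented here, nothing like it exists in POSIX or glibc. It builds a one-off envp from the current environment plus a list of “NAME=value” overrides, without ever touching the process’s own environ.

```c
/* Hypothetical helper: exec with the inherited environment plus overrides. */
#define _GNU_SOURCE               /* execvpe() */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

extern char **environ;

static int execvpe_with(const char *file, char *const argv[],
                        char *const overrides[])  /* NULL-terminated "NAME=value" list */
{
    size_t nenv = 0, nover = 0;
    while (environ[nenv] != NULL) nenv++;
    while (overrides[nover] != NULL) nover++;

    char **envp = malloc((nenv + nover + 1) * sizeof *envp);
    if (envp == NULL)
        return -1;

    size_t out = 0;
    /* Keep every inherited entry whose name is not being overridden. */
    for (size_t i = 0; i < nenv; i++) {
        int shadowed = 0;
        for (size_t j = 0; j < nover; j++) {
            size_t namelen = strcspn(overrides[j], "=");
            if (strncmp(environ[i], overrides[j], namelen) == 0 &&
                environ[i][namelen] == '=') {
                shadowed = 1;
                break;
            }
        }
        if (!shadowed)
            envp[out++] = environ[i];
    }
    /* Then append the overrides themselves. */
    for (size_t j = 0; j < nover; j++)
        envp[out++] = overrides[j];
    envp[out] = NULL;

    execvpe(file, argv, envp);    /* only returns on failure */
    free(envp);
    return -1;
}

int main(void)
{
    char *argv[] = { "env", NULL };
    char *overrides[] = { "DEMO_MODE=1", "LANG=C", NULL };
    execvpe_with("env", argv, overrides);
    return 1;                     /* reached only if the exec failed */
}
```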
The whole UNIX process-spawning API is based around fork, call a bunch of things to change the spawned process, then exec. This is pretty clever because it means that you don’t need a super complex spawning API that allows configuring everything, and you can do it using real code rather than some plain-old-data which will inevitably grow some sort of rudimentary logic for common issues. So in this environment it makes sense that we don’t even really need a way to set the environment at all with exec; the “UNIX way” is fork, setenv then exec, no collection of exec variants for managing the environment in different ways.
However, while the fork, …, exec solution is clever, a better solution would probably be to make all of these functions take a target process argument rather than implicitly targeting the current process, so that the spawn process would look something like spawn, do whatever to configure the child with the returned process ID, then start. In a scenario like this it makes more sense to pass the environment in to the spawn call and have it immutable. But this also brings up other questions, like: are some functions only allowed between spawn and the first start? For opening and closing file descriptors, do we need a cross-process close?
From what I’ve seen from looking at their sources, higher-level languages tend to use posix_spawn(3)/posix_spawnp(3) for their subprocess-launching APIs and switch to fork/exec if you ask for something not expressible in it.
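For reference, this is roughly what the posix_spawn() route looks like (a sketch; the spawned program and variables are made up): the child’s argv and envp are assembled up front and no user code runs between the library’s internal fork and exec.

```c
/* Launch a child with an explicit environment via posix_spawnp(). */
#include <spawn.h>
#include <stdio.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid;
    char *argv[] = { "env", NULL };
    char *envp[] = { "DEMO_MODE=1", "PATH=/usr/bin:/bin", NULL };

    int err = posix_spawnp(&pid, "env",
                           NULL /* file actions */, NULL /* attributes */,
                           argv, envp);
    if (err != 0) {
        fprintf(stderr, "posix_spawnp failed: %d\n", err);
        return 1;
    }
    waitpid(pid, NULL, 0);
    return 0;
}
```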
From glancing at the source: No. Because the thing you actually want to interact with is the pointer containing the ENV, which is managed by the unsafe glibc functions and used by the C programs you’re (eventually always) interacting with. Even if you could get hold of that, all the other C libs around your program can (and will) still change it, or read it without caring about your additional lock. A famous reason in Rust to call into the ENV is, for example, time management.
You could also have C libs initializing before your Rust code has a chance of doing anything, and now they hold getenv pointers while you try to do anything.
Your code might also not even know that it is calling set/getenv indirectly.
Uh, what? I’m sensing LLM hallucination.
https://en.wikipedia.org/wiki/JavaScript#Trademark
Exactly - Oracle bought Sun in ’09, not ’97
You could argue that it was initiated in ‘09. Or completed in ‘10. ‘97 is so far from a typo relative to either of those that it could practically only be a hallucination. But I think I’ll just flag it “spam”.
Do we need a flag “Hallucination” ?
Eh. Yours would be more accurate. But “spam” works for me. Their claim:
Is so completely outlandish that “spam” makes sense, IMO. There’s no universe I can imagine in which that’s a good-faith statement, anyway. And “spam” seems to cover that particular sort of bad faith pretty well.