"Linus Torvalds Clearly Lays Out Linux Maintainer Roles - Or Not - Around Rust Code" has been merged into this story.
  1. 148

    1. 69

      The majority of bugs (quantity, not quality/severity) we have are due to the stupid little corner cases in C that are totally gone in Rust. Things like simple overwrites of memory (not that rust can catch all of these by far), error path cleanups, forgetting to check error values, and use-after-free mistakes. That’s why I’m wanting to see Rust get into the kernel, these types of issues just go away, allowing developers and maintainers more time to focus on the REAL bugs that happen (i.e. logic issues, race conditions, etc.)
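
      To make the claim concrete, here is a toy sketch in plain user-space Rust (not kernel code; all names invented) of the three bug classes named above, and how the language closes them off:

      ```rust
      // Error-path cleanup: Drop runs on every exit path, so the C-style
      // "forgot the goto err_free" bug class disappears.
      struct Buffer(Vec<u8>);

      impl Drop for Buffer {
          fn drop(&mut self) {
              println!("buffer released");
          }
      }

      // Forgetting to check an error value: Result is #[must_use], so
      // silently ignoring it draws a compiler warning instead of
      // becoming a latent bug.
      fn parse(data: &[u8]) -> Result<u32, String> {
          data.first()
              .copied()
              .map(u32::from)
              .ok_or_else(|| "empty input".to_string())
      }

      fn main() {
          let buf = Buffer(vec![1, 2, 3]);
          let value = parse(&buf.0);

          drop(buf); // ownership ends here...

          // ...so a use-after-free simply does not compile:
          // println!("{}", buf.0.len()); // error[E0382]: borrow of moved value

          println!("{:?}", value);
      }
      ```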

      This is an extremely strong statement.

      I think a few things are also interesting:

      1. I think people are realizing how low quality the Linux kernel code is, how haphazard development is, how much burnout and misery is involved, etc.

      2. I think people are realizing how insanely not in the open kernel dev is, how much is private conversations that a few are privy to, how much is politics, etc.

      1. 35

        I think people are realizing how insanely not in the open kernel dev is, how much is private conversations that a few are privy to, how much is politics, etc.

        The Hellwig/Ojeda part of the thread is just frustrating to read because it almost feels like pleading. “We went over this in private” “we discussed this already, why are you bringing it up again?” “Linus said (in private so there’s no record)”, etc., etc.

        1. 45

          Dragging discussions out in front of an audience is a pretty decent tactic for dealing with obstinate maintainers. They don’t like to explain their shoddy reasoning in front of people, and would prefer it remain hidden. It isn’t the first tool in the toolbelt but at a certain point there is no convincing people directly.

          1. 31

            Dragging discussions out in front of an audience is a pretty decent tactic for dealing with

            With quite a few things actually. A friend of mine is contributing to a non-profit, which until recently had a very toxic member (they even attempted a felony). The member was driven out of the non-profit very soon after members talked in a thread that was accessible to all members. Obscurity is often one key component of abuse, be it mere stubbornness or criminal behaviour. Shine light on it, and it often goes away.

            1. 13

              IIRC Hintjens noted this quite explicitly as a tactic of bad actors in his works.

              It’s amazing how quick people are to recognize folks trying to subvert an org piecemeal via one-off private conversations once everybody can compare notes. It’s equally amazing to see how much the same people beforehand will swear up and down that such things are conspiracy theories and can’t happen here, until they’ve been burned at least once.

              This is an active, unpatched attack vector in most communities.

              1. 12

                I’ve found the most mundane example of this is meeting minutes at work. I’ve observed that people tend to act more collaboratively and seek the common good if there are public minutes, as opposed to trying to “privately” win people over to their desires.

            2. 5

              There is something to be said for keeping things between people with skin in the game.

              It’s flipped over here, though, because more people want to contribute. The question is whether it’ll be stable long-term.

            3. 18

              I think people are realizing how low quality the Linux kernel code is, how haphazard development is, how much burnout and misery is involved, etc.

              Something I’ve noticed is true in virtually everything I’ve looked deeply at is that the majority of work is poor to mediocre and most people are not especially great at their jobs. So it wouldn’t surprise me if Linux is the same. (…and it also wouldn’t surprise me if the wonderful Rust rewrite ends up poor to mediocre.)

              Yet at the same time, another thing that astonishes me is how much stuff actually does get done and how well things manage to work anyway. Linux also does a lot and works pretty well. Mediocre, over the years, can end up pretty good.

              1. 14

                After tangentially following the kernel news, I think a lot of churning and death spiraling is happening. I would much rather have a rust-first kernel that isn’t crippled by the old guard of C developers reluctant to adopt new tech.

                Take all of this energy into RedoxOS and let Linux stay in antiquity.

                1. 36

                  I’ve seen some of the R4L people talk on Mastodon, and they all seem to hate this argument.

                  They want to contribute to Linux because they use it, want to use it, and want to improve the lives of everyone who uses it. The fact that it’s out there and deployed and not a toy is a huge part of the reason why they want to improve it.

                  Hopping off into their own little projects which may or may not be useful to someone in 5-10 years’ time is not interesting to them. If it was, they’d already be working on Redox.

                  1. 2

                    The most effective thing that could happen is for the Linux foundation, and Linus himself, to formally endorse and run a Rust-based kernel. They can adopt an existing one or make a concerted effort to replace large chunks of Linux’s C with Rust.

                    IMO the Linux project needs to figure out something pretty quickly because it seems to be bleeding maintainers and Linus isn’t getting any younger.

                    1. 0

                      The Mastodon posters may be misunderstanding the idea that others are not necessarily incentivized to do things just because those things are interesting to them.

                    2. 4

                      Yep, I made a similar remark upthread. A Rust-first kernel would have a lot of benefits over Linux, assuming a competent group of maintainers.

                      1. 4

                        along similar lines: https://drewdevault.com/2024/08/30/2024-08-30-Rust-in-Linux-revisited.html

                        Redox does carry the burden of trying to do new OS things. An ABI-compatible Rust rewrite of the Linux kernel might get further along than expected, even if it only ran in virtual contexts at first, with hardware support coming later.

                        1. 44

                          Linux developers want to work on Linux, they don’t want to make a new OS. Linux is incredibly important, and companies already have Rust-only drivers for their hardware.

                          Basically, sure, a new OS project would be neat, but it’s really just completely off topic in the sense that it’s not a solution for Rust for Linux. Because the “Linux” part in that matters.

                          1. 19

                            I read a 25+ year old article [1] from a former Netscape developer that I think applies in part

                            The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they’ve been fixed. There’s nothing wrong with it. It doesn’t acquire bugs just by sitting around on your hard drive. Au contraire, baby! Is software supposed to be like an old Dodge Dart, that rusts just sitting in the garage? Is software like a teddy bear that’s kind of gross if it’s not made out of all new material?

                            Adopting a “rust-first” kernel is throwing the baby out with the bathwater. Linux has been beaten into submission for over 30 years for a reason. It’s the largest collaborative project in human history, at over 30 million lines of code. Throwing it out and starting anew would be an absolutely herculean effort that would likely take years, if it ever got off the ground.

                            [1] https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/

                            1. 33

                              The idea that old code is better than new code is patently absurd. Old code has stagnated. It was built using substandard, out of date methodologies. No one remembers what’s a bug and what’s a feature, and everyone is too scared to fix anything because of it. It doesn’t acquire new bugs because no one is willing to work on that weird ass bespoke shit you did with your C preprocessor. Au contraire, baby! Is software supposed to never learn? Are we never to adopt new tools? Can we never look at something we’ve built in an old way and wonder if new methodologies would produce something better?

                              This is what it looks like to say nothing, to beg the question. Numerous empirical claims; where is the justification?

                              It’s also self defeating on its face. I take an old codebase, I fix a bug, the codebase is now new. Which one is better?

                              1. 16

                                Like most things in life, the truth is somewhere in the middle. There is a reason the semiconductor industry has the concept of a “mature node”: new is needed for each node, but the new thing takes time to iron out the kinks and bugs. This is the primary reason you see Apple take on new nodes before Nvidia, for example, as Nvidia requires much larger die sizes and so can tolerate fewer defects per square mm.

                                You can see this sometimes in software too, for example X11 vs Wayland, where adoption is slow but definitely progressing; nowadays most people can see that Wayland is, or is about to become, the dominant tech in the space.

                                1. 16

                                  The truth lies where it lies. Maybe the middle, maybe elsewhere. I just don’t think we’ll get to the truth with rhetoric.

                                  1. 5

                                    Aren’t the arguments above more dialectic than rhetoric?

                                    1. 7

                                      I don’t think this would qualify as dialectic; it lacks any internal debate and leans heavily on appeals to analogy and intuition/emotion. The post itself makes a ton of empirical claims without justification, even beyond the quoted bit.

                                      1. 1

                                        Fair enough, I can see how one would make that argument.

                                2. 15

                                  “Good” is subjective, but there is real evidence that older code does contain fewer vulnerabilities: https://www.usenix.org/conference/usenixsecurity22/presentation/alexopoulos

                                  That means we can probably keep a lot of the old trusty Linux code around while making more of the new code safe by writing it in Rust in the first place.

                                  1. 10

                                    I don’t think that’s a fair assessment of Spolsky’s argument or of CursedSilicon’s application of it to the Linux kernel.

                                    Firstly, someone has already pointed out research that suggests that existing code has fewer bugs in it than new code (and that the older code is, the less likely it is to be buggy).

                                    Secondly, this discussion is mainly around entire codebases, not just existing code. Codebases usually have an entire infrastructure around them for verifying that the behaviour of the codebase has not changed. This is often made up of tests, but it’s also made up of the users who try out a release of a codebase and determine whether it’s working for them. The difference between making a change to an existing codebase and releasing a new project largely comes down to whether this verification (both in terms of automated tests and in terms of users’ ability to use the new release) works for the new code.

                                    Given this difference, if I want to (say) write a new OS completely in Rust, I need to choose: Do I want to make it completely compatible with Linux, and therefore take on the significant challenge of making sure everything behaves truly the same? Or do I make significant breaking changes, write my own OS, and therefore force potential adopters to rebuild their entire Linux workflows in my new OS?

                                    The point is not that either of these options are bad, it is that they represent significant risks to a project. Added to the general risk that is writing new code, this produces a total level of risk that might be considered the baseline risk of doing a rewrite. Now risk is not bad per se! If the benefits of being able to write an OS in a language like Rust outweigh the potential risks, then it still makes sense to perform the rewrite. Or maybe the existing Linux kernel is so difficult to maintain that a new codebase really would be the better option. But the point that CursedSilicon was making by linking the Spolsky piece was, I believe, that the risks for a project like the Linux kernel are very high. There is a lot of existing, old code. And there is a very large ecosystem where either breaking or maintaining compatibility would each come with significant challenges.

                                    Unfortunately, it’s very difficult to measure the risks and benefits here in a quantitative, comparable way, so I think where you fall on the “rewrite vs continuity” spectrum will depend mostly on what sort of examples you’ve seen, and how close you think this case is to those examples. I don’t think there’s any objective way to say whether it makes more sense to have something like R4L, or something like RedoxOS.

                                    1. 7

                                      Firstly, someone has already pointed out research that suggests that existing code has fewer bugs in it than new code (and that the older code is, the less likely it is to be buggy).

                                      I haven’t read it yet, but I haven’t made an argument about that; I just created a parody of the argument as presented. I’ll be candid, I doubt that the research is going to compel me to believe that newer code is inherently buggier. It may compel me to confirm my existing belief that testing software in the field is one good method to find some classes of bugs.

                                      Secondly, this discussion is mainly around entire codebases, not just existing code.

                                      I guess so; it’s a bit dependent on where we say the discussion starts. Three things are relevant: RFL, which is not a wholesale rewrite; a wholesale rewrite of the Linux kernel; and Netscape. RFL is not about replacing the entire Linux kernel, although perhaps “codebase” here refers to some sort of unit, like a driver. Netscape wanted a wholesale rewrite, based on the linked post, so perhaps that’s what’s really “the single worst strategic mistake that any software company can make”, but I wonder what the boundary here is? Also, the article immediately mentions that Microsoft tried to do this with Word but it failed, and that Word didn’t suffer from this because it was still actively developed. I wonder if it really “failed” just because Pyramid didn’t become the new Word? Did Microsoft have some lessons learned, or incorporate some of that code? Dunno.

                                      I think I’m really entirely justified when I say that the post is entirely emotional/intuitive appeals, rhetoric, and that it makes empirical claims without justification.

                                      There’s a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming:

                                      This is rhetoric. These are unsubstantiated empirical claims. The article is all of this. It’s fine as an interesting, thought provoking read that gets to the root of our intuitions, but I think anyone can dismiss it pretty easily since it doesn’t really provide much in the form of an argument.

                                      It’s important to remember that when you start from scratch there is absolutely no reason to believe that you are going to do a better job than you did the first time.

                                      Again, totally unsubstantiated. I have MANY reasons to believe that; it is simply question begging to say otherwise.

                                      That’s all this post is: over and over again making empirical claims with no evidence and question begging.

                                      We can discuss the risks and benefits, I’d advocate for that. This article posted doesn’t advocate for that. It’s rhetoric.

                                      1. 11

                                        existing code has fewer bugs in it than new code (and that the older code is, the less likely it is to be buggy).

                                        This is a truism. It is survivorship bias: if the code was buggy, it would eventually be found and fixed. So, all things being equal, newer code is riskier than old code. But it’s also been empirically shown that using Rust for new code is not “all things being equal”. Google showed that new code in Rust is as reliable as old code in C. Which is good news: you can use old C code from new Rust projects without the risk that comes from new C code.

                                        1. 5

                                          But it’s also been empirically shown that using Rust for new code is not “all things being equal”.

                                          Yeah, this is what I’ve been saying (not sure if you’d meant to respond to me or the parent, since we agree) - the issue isn’t “new” vs “old” it’s things like “reviewed vs unreviewed” or “released vs unreleased” or “tested well vs not tested well” or “class of bugs is trivial to express vs class of bugs is difficult to express” etc.

                                          1. 6

                                            Was restating your thesis in the hopes of making it clearer.

                                          2. 2

                                            I don’t disagree that the rewards can outweigh the risks, and in this case I think there’s a lot of evidence that suggests that memory safety as a default is really important for all sorts of reasons. Let alone the many other PL developments that make Rust a much more suitable language to develop in than C.

                                            That doesn’t mean the risks don’t exist, though.

                                      2. 4

                                        It’s also self defeating on its face. I take an old codebase, I fix a bug, the codebase is now new. Which one is better?

                                        Nobody would call an old codebase with a handful of fixes a new codebase, at least not in the contexts in which those terms have been used here.

                                        1. 1

                                          How many lines then?

                                          1. 6

                                            It’s a Ship of Theseus: at no point can you call it a “new” codebase, but after a period of time, it could be completely different code. I have a C program I’ve been using and modifying for 25 years. At any given point, it would have been hard to say “this is now a new codebase,” yet not one line of code in the project is the same as when I started (even though it does the same thing as it always has).

                                            1. 4

                                              I don’t see the point in your question. It’s going to depend on the codebase, and on the nature of the changes; it’s going to be nuanced, and subjective at least to some degree. But the fact that it’s prone to subjectivity doesn’t mean that you get to call an old codebase with a single fixed bug a new codebase, without some heavy qualification which was lacking.

                                              1. 1

                                                If it requires all of that nuance and context maybe the issue isn’t what’s “old” and what’s “new”.

                                                1. 3

                                                  I don’t follow, to me that seems like a non-sequitur.

                                                  1. 4

                                                    What’s old and new is poorly defined and yet there’s an argument being made that “old” and “new” are good indicators of something. If they’re so poorly defined that we have to bring in all sorts of additional context like the nature of the changes, not just when they happened or the number of lines changed, etc, then it seems to me that we would be just as well served to throw away the “old” and “new” and focus on that context.

                                                    1. 2

                                                      I feel like enough people would agree more-or-less on what was an “old” or “new” codebase (i.e. they would agree given particular context) that they remain useful terms in a discussion. The general context used here is apparent (at least to me) given by the discussion so far: an older codebase has been around for a while, has been maintained, has had kinks ironed out.

                                                      1. 3

                                                        There’s a really important distinction here though. The point is to argue that new projects will be less stable than old ones, but you’re intuitively (and correctly) bringing in far more important context - maintenance, testing, battle testing, etc. If a new implementation has a higher degree of those properties then it being “new” stops being relevant.

                                                        1. 2

                                                          Ok, but:

                                                          It’s also self defeating on its face. I take an old codebase, I fix a bug, the codebase is now new. Which one is better?

                                                          My point was that this statement requires a definition of “new codebase” that nobody would agree with, at least in the context of the discussion we’re in. Maybe you are attacking the base proposition without applying the surrounding context, which might be valid if this were a formal argument and not a free-for-all discussion.

                                                          If a new implementation has a higher degree of those properties

                                                          I think that it would be considered no longer new if it had had significant battle-testing, for example.

                                                          FWIW the important thing in my view is that every new codebase is a potential old codebase (given time and care), and a rewrite necessarily involves a step backwards. The question should probably not be “which is immediately better?” but “which is better in the longer term (and by how much)?” However, your point that a “new codebase” is not automatically worse is certainly valid. There are other factors than age and “time in the field” that determine quality.

                                          2. 1

                                            Methodologies don’t matter for the quality of code. They can be useful for estimates, cost control, figuring out whom to fire, etc. But not for the quality of code.

                                            1. 4

                                              You’re suggesting that the way you approach programming has no bearing on the quality of the produced program?

                                              1. 3

                                                I’ve never observed a programmer become better or worse by switching methodology. Dijkstra would not have become better if you had made him do daily standups or go through code reviews.

                                                There are ways to improve your programming by choosing a different approach, but these are very individual. Methodology is mostly a beancounting tool.

                                                1. 3

                                                  When I say “methodology” I’m speaking very broadly - simply “the approach one takes”. This isn’t necessarily saying that any methodology is better than any other. The way I approach a task today is better, I think, than the way I would have approached that task a decade ago - my methodology has changed, the way I think has changed. Perhaps that might mean I write more tests, or test earlier, but it may mean exactly the opposite, and my methods may only work best for me.

                                                  I’m not advocating for “process” or ubiquity, only that the approach one takes may improve over time, which I suspect we would agree on.

                                          3. 28

                                            If you take this logic to its end, you should never create new things.

                                            At one point in time, Linux was also the new kid on the block.

                                            The best time to plant a tree is 30 years ago. The second best time is now.

                                            1. 7

                                              I read a 25+ year old article [1] from a former Netscape developer that I think applies in part

                                              I don’t think Joel Spolsky was ever a Netscape developer. He was a Microsoft developer who worked on Excel.

                                              1. 2

                                                My mistake! The article contained a bit about Netscape and I misremembered it

                                              2. 5

                                                It’s the largest collaborative project in human history, at over 30 million lines of code.

                                                How many of those lines are part of the core? My understanding was that the overwhelming majority was driver code. There may not be that much core subsystem code to rewrite.

                                                1. 5

                                                  For a previous project, we included a minimal Linux build. It was around 300 KLoC, which included networking and the storage stack, along with virtio drivers.

                                                  That’s around the size a single person could manage, and quite easy with a motivated team.

                                                  If you started with DPDK and SPDK then you’d already have filesystems and a copy of the FreeBSD network stack to run in isolated environments.

                                                  1. 2

                                                    Once many drivers share common Rust wrappers over core subsystems, you could flip it and write the subsystem itself in Rust, then expose a C interface for the rest.
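
                                                    A rough sketch of what that flip could look like (plain Rust with invented rust_subsys_* names, not the actual kernel bindings): the subsystem state and its teardown logic live on the Rust side, and C callers only ever see an opaque pointer:

                                                    ```rust
                                                    use std::os::raw::c_int;

                                                    /// Opaque to C; the real state lives behind a Box on the Rust side.
                                                    pub struct Subsys {
                                                        refcount: c_int,
                                                    }

                                                    /// C prototype: struct subsys *rust_subsys_new(void);
                                                    #[no_mangle]
                                                    pub extern "C" fn rust_subsys_new() -> *mut Subsys {
                                                        Box::into_raw(Box::new(Subsys { refcount: 1 }))
                                                    }

                                                    /// C prototype: void rust_subsys_put(struct subsys *s);
                                                    ///
                                                    /// # Safety
                                                    /// `s` must have come from `rust_subsys_new` and not be used afterwards.
                                                    #[no_mangle]
                                                    pub unsafe extern "C" fn rust_subsys_put(s: *mut Subsys) {
                                                        if s.is_null() {
                                                            return;
                                                        }
                                                        (*s).refcount -= 1;
                                                        if (*s).refcount == 0 {
                                                            // Reclaim the Box; the Drop logic runs in Rust, not hand-written C.
                                                            drop(Box::from_raw(s));
                                                        }
                                                    }
                                                    ```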

                                                    1. 3

                                                      Oh sure, that would be my plan as well. And I bet some subsystem maintainers see this coming, and resist it for reasons that aren’t entirely selfless.

                                                      1. 3

                                                        That’s pretty far into the future, both from a maintainer acceptance PoV and from a rustc_codegen_gcc and/or gccrs maturity PoV.

                                                        1. 4

                                                          Sure. But I doubt I’ll be running a different kernel 10 years from now.

                                                          And like us, those maintainers are not getting any younger; if they need a hand, I am confident I’d get up to speed faster with a strict type checker.

                                                          I am also confident nobody in our office would be able to help out with C at all.

                                                    2. 4

                                                      It’s the largest collaborative project in human history

                                                      This cannot possibly be true.

                                                      1. 5

                                                        It’s the largest collaborative project in human history

                                                        It’s the largest collaborative open-source OS kernel project in human history

                                                        1. 4

                                                          It’s been described as such based purely on the number of unique human contributions to it

                                                          1. 12

                                                            I would expect Wikipedia to be bigger 🤔

                                                      2. 7

                                                        I see that Drew proposes a new OS in that linked article, but I think a better proposal in the same vein is a fork. You get to keep Linux, but you can start porting logic to Rust unimpeded, and it’s a manageable amount of work to keep porting upstream changes.

                                                        Remember when libav forked from ffmpeg? Michael Niedermayer single-handedly ported every single libav commit back into ffmpeg, and eventually, ffmpeg won.

                                                        At first there will be extremely high C percentage, low Rust percentage, so porting is trivial, just git merge and there will be no conflicts. As the fork ports more and more C code to Rust, however, you start to have to do porting work by inspecting the C code and determining whether the fixes apply to the corresponding Rust code. However, at that point, it means you should start seeing productivity gains, community gains, and feature gains from using a better language than C. At this point the community growth should be able to keep up with the extra porting work required. And this is when distros will start sniffing around, at first offering variants of the distro that uses the forked kernel, and if they like what they taste, they might even drop the original.

                                                        I genuinely think it’s a strong idea, given the momentum and the potential amount of labor the Rust community has at its disposal.

                                                        I think the competition would be great, especially in the domain of making it more contributor friendly to improve the kernel(s) that we use daily.

                                                        1. 15

                                                          I certainly don’t think this is impossible, for sure. But the point ultimately still stands: Linux kernel devs don’t want a fork. They want Linux. These folks aren’t interested in competing, they’re interested in making the project they work on better. We’ll see if some others choose the fork route, but it’s still ultimately not the point of this project.

                                                        2. 5

                                                          Linux developers want to work on Linux, they don’t want to make a new OS.

                                                          While I don’t personally want to make a new OS, I’m not sure I actually want to work on Linux. Most of the time I strive for portability, and so abstract myself from the OS whenever I can get away with it. And when I can’t, I have to say Linux’s API isn’t always that great, compared to what the BSDs have to offer (epoll vs kqueue comes to mind). Most annoying though is the lack of documentation for the less used APIs: I’ve recently worked with Netlink sockets, and for the proc stuff so far the best documentation I found was the freaking source code of a third party monitoring program.

                                                          I was shocked. Complete documentation of the public API is the minimum bar for a project as serious as the Linux kernel. I can live with an API I don’t like, but lack of documentation is a deal breaker.

                                                          1. 10

                                                            While I don’t personally want to make a new OS, I’m not sure I actually want to work on Linux.

                                                            I think they mean that Linux kernel devs want to work on the Linux kernel. Most (all?) R4L devs are long time Linux kernel devs. Though, maybe some of the people resigning over LKML toxicity will go work on Redox or something…

                                                            1. 3

                                                              That is what I was saying, yes.

                                                            2. 5

                                                              I’m talking about the people who develop the Linux kernel, not people who write userland programs for Linux.

                                                          2. 2

                                                            Re-implementing the kernel ABI would be a ton of work for little gain if all they wanted was to upstream the work on new hardware drivers that is already done - and then eventually start re-implementing bits that need to be revised anyway.

                                                        3. 3

                                                          If the singular required Rust toolchain didn’t feel like such a ridiculous-to-bootstrap, 500-ton LLVM clown car, I would agree with this statement without reservation.

                                                          1. 1

                                                            Would Zig be a better starting place?

                                                            1. 4

                                                              Zig is easier to implement (and I personally like it as a language) but doesn’t have the same safety guarantees and strong type system that Rust does. It’s a give and take. I actually really like Rust and would like to see a proliferation of toolchain options, such as what’s in progress in GCC land. Overall, it would just be really nice to have an easily bootstrapped toolchain that a normal person can compile from scratch locally, although I don’t think it necessarily needs to be the default, or that using LLVM generally is an issue. However, it might be possible that no matter how you architect it, Rust might just be complicated enough that any sufficiently useful toolchain for the language could just end up being a 500 ton clown car of some kind anyways.

                                                              1. 2

                                                                Depends on which parts of GP’s statement you care about: LLVM or bootstrap. Zig still depends on LLVM (for now), but it is no longer bootstrappable in a limited number of steps (because they switched from a bootstrap C++ implementation of the compiler to keeping a compressed WASM build of the compiler as a blob).

                                                                1. 2

                                                                  Yep, although I would also add it’s unfair to judge Zig in any case on this matter now given it’s such a young project that clearly is going to evolve a lot before the dust begins to settle (Rust is also young, but not nearly as young as Zig). In ten to twenty years, so long as we’re all still typing away on our keyboards, we might have a dozen Zig 1.0 and a half dozen Zig 2.0 implementations!

                                                          2. 6

                                                            Yeah, the absurdly low code quality and toxic environment make me think that Linux is ripe for disruption. Not like anyone can produce a production kernel overnight, but maybe a few years of sustained work might see a functional, production-ready Rust kernel for some niche applications and from there it could be expanded gradually. While it would have a lot of catching up to do with respect to Linux, I would expect it to mature much faster because of Rust, because of a lack of cruft/backwards-compatibility promises, and most importantly because it could avoid the pointless drama and toxicity that burn people out and prevent people from contributing in the first place.

                                                            1. 14

                                                              the absurdly low code quality

                                                                What is this, some kind of new meme? Where did you first hear it?

                                                              1. 22

                                                                From the thread in OP, if you expand the messages, there is wide agreement among the maintainers that all sorts of really badly designed and almost impossible to use (safely) APIs ended up in the kernel over the years because the developers were inexperienced and kind of learning kernel development as they went. In retrospect they would have designed many of the APIs very differently.

                                                                1. 4

                                                                    Someone should compile all of this to help future OS developers avoid those traps! There are a lot of existing non-POSIX experiments, though.

                                                                2. 14

                                                                  It’s based on my forays into the Linux kernel source code. I don’t doubt there’s some quality code lurking around somewhere, but the stuff I’ve come across (largely filesystem and filesystem adjacent) is baffling.

                                                                  1. 7

                                                                      Seeing how many people are confidently incorrect about Linux maintainers only caring about their job security and keeping code bad to make it a barrier to entry has, if nothing else, taught me that online discussions are a huge game of Chinese whispers where most participants don’t have a clue what they are talking about.

                                                                    1. 15

                                                                        I doubt that maintainers “only care about their job security and keep code bad”, but with all due respect: you’re also just pulling arguments out of thin air right now. What I do believe is what we have seen: pretty toxic responses from some people and a whole lot of issues trying to move forward.

                                                                      1. 8

                                                                        Seeing how many people are confidently incorrect about Linux maintainers only caring about their job security and keeping code bad to make it a barrier to entry

                                                                        Huh, I’m not seeing any claim to this end from the GP, or did I not look hard enough? At face value, saying that something has an “absurdly low code quality” does not imply anything about nefarious motives.

                                                                        1. 10

                                                                          I can personally attest to having never made that specific claim.

                                                                          1. 7

                                                                              Indeed that remark wasn’t directly referring to GP’s comment, but rather to the range of confidently incorrect comments that I read in the previous episodes, and to the “gatekeeping greybeards” theme that can be seen elsewhere on this page. First occurrence, found just by searching for “old”: Linux is apparently “crippled by the old guard of C developers reluctant to adopt new tech”, to which, in fact, GP replied in agreement. Another one: maintainers don’t want to “do the hard work”.

                                                                              Still, in GP’s case the Chinese whispers have reduced “the safety of this API is hard to formalize and you pretty much have to use it the way everybody does it” to “absurdly low quality”. To which I ask: which is more likely? 1) That 30 million lines of code contain various levels of technical debt of which maintainers are aware, and that said maintainers worry even about code where the technical debt is real but not causing substantial issues in practice? Or 2) that a piece of software gets to run on literally billions of devices of all sizes and prices just because it’s free and in spite of its “absurdly low quality”?

                                                                            Linux is not perfect, neither technically nor socially. But it sure takes a lot of entitlement and self-righteousness to declare it “of absurdly low quality” with a straight face.

                                                                            1. 11

                                                                              GP here: I probably should have said “shockingly” rather than “absurdly”. I didn’t really expect to get lawyered over that one word, but yeah, the idea was that for a software that runs on billions of devices, the code quality is shockingly low.

                                                                              Of course, this is plainly subjective. If your code quality standards are a lot lower than mine then you might disagree with my assessment.

                                                                              That said, I suspect adoption is a poor proxy for code quality. Internet Explorer was widely adopted and yet it’s broadly understood to have been poorly written.

                                                                              But it sure takes a lot of entitlement and self-righteousness to declare it “of absurdly low quality” with a straight face

                                                                              I’m sure self-righteousness could get you to the same place, but in my case I arrived by way of experience. You can relax, I wasn’t attacking Linux—I like Linux—it just has a lot of opportunity for improvement.

                                                                              1. 5

                                                                                I guess I’ve seen the internals of too much proprietary software now to be shocked by anything about Linux per se. I might even argue that the quality of Linux is surprisingly good, considering its origins and development model.

                                                                                I think I’d lawyer you a tiny bit differently: some of the bugs in the kernel shock me when I consider how many devices run that code and fulfill their purposes despite those bugs.

                                                                                1. 7

                                                                                  FWIW, I was not making a dig at open source software, and yes plenty of corporate software is worse. I guess my expectations for Linux are higher because of how often it is touted as exemplary in some form or another. I don’t even dislike Linux, I think it’s the best thing out there for a huge swath of use cases—I just see some pretty big opportunities for improvement.

                                                                              2. 4

                                                                                But it sure takes a lot of entitlement and self-righteousness to declare it “of absurdly low quality” with a straight face.

                                                                                Or actual benchmarks: the performance the Linux kernel leaves on the table in some cases is absurd. And sure it’s just one example, but I wouldn’t be surprised if it was representative of a good portion of the kernel.

                                                                                1. 3

                                                                                  absurdly low quality

                                                                                    Well, not quite, but still “considered broken beyond repair by many people related to life time management” - which is definitely worse than “hard to formalize” when “the way ever[y]body does it” seems to vary from user to user.

                                                                                  1. 4

                                                                                      I love Rust, but still, we’re talking about a language which (for good reasons!) considers doubly linked lists unsafe. Take an API that gets a 4 on Rusty Russell’s API design scale (“Follow common convention and you’ll get it right”), but which was designed for a completely different programming language if not paradigm, and it’s not surprising that it can’t easily be transformed into a 9 (“The compiler/linker won’t let you get it wrong”). But at the same time there are a dozen ways in which, according to the same scale, things could actually be worse!

                                                                                    What I dislike is that people are seeing “awareness of complexity” and the message they spread is “absurdly low quality”.

                                                                                    1. 13

                                                                                      Note that doubly linked lists are not a special case at all in Rust. All the other common data structures like Vec, HashMap etc. also need unsafe code in their implementation.

                                                                                      Implementing these datastructures in Rust, and writing unsafe code in general, is indeed roughly a 4. But these are all already implemented in the standard library, with an API that actually is at a 9. And std::collections::LinkedList is constructive proof that you can have a safe Rust abstraction for doubly linked lists.

                                                                                      Yes, the implementation could have bugs, thus making the abstraction leaky. But that’s the case for literally everything, down to the hardware that your code runs on.

                                                                                      1. 4

                                                                                        You’re absolutely right that you can build abstractions with enough effort.

                                                                                        My point is that if a doubly linked list is (again, for good reasons) hard to make into a 9, a 20-year-old API may very well be even harder. In fact, std::collections::LinkedList is safe but still not great (for example the cursor API is still unstable); and being in std, it was designed/reviewed by some of the most knowledgeable Rust developers, sort of by definition. That’s the conundrum that maintainers face and, if they realize that, it’s a good thing. I would be scared if maintainers handwaved that away.

                                                                                        Yes, the implementation could have bugs, thus making the abstraction leaky.

                                                                                        Bugs happen, but if the abstraction is downright wrong then that’s something I wouldn’t underestimate. A lot of the appeal of Rust in Linux lies exactly in documenting/formalizing these unwritten rules, and wrong documentation can be worse than no documentation (cue the negative parts of the API design scale!); even more so if your documentation is a formal model like a set of Rust types and functions.

                                                                                        That said, the same thing can happen in a Rust-first kernel, which will also have a lot of unsafe code. And it would be much harder to fix it in a Rust-first kernel, than in Linux at a time when it’s just feeling the waters.

                                                                                        1. 7

                                                                                          In fact, std::collections::LinkedList is safe but still not great (for example the cursor API is still unstable); and being in std, it was designed/reviewed by some of the most knowledgeable Rust developers, sort of by definition.

                                                                                          At the same time, it was included almost as like, half a joke, and nobody uses it, so there’s not a lot of pressure to actually finish off the cursor API.

                                                                                          It’s also not the kind of linked list the kernel would use, as they’d want an intrusive one.
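
                                                                                            For readers unfamiliar with the distinction, a rough sketch (illustrative types only, not kernel code): std’s LinkedList owns heap nodes that wrap your data, while an intrusive list embeds the link fields in your own struct, so one allocation can sit on several lists at once:

                                                                                            ```rust
                                                                                            use std::ptr;

                                                                                            /// Counterpart of the kernel's `struct list_head`: the link lives
                                                                                            /// inside your own struct instead of in a container-owned node.
                                                                                            struct ListHead {
                                                                                                next: *mut ListHead,
                                                                                                prev: *mut ListHead,
                                                                                            }

                                                                                            impl ListHead {
                                                                                                const fn empty() -> Self {
                                                                                                    ListHead { next: ptr::null_mut(), prev: ptr::null_mut() }
                                                                                                }
                                                                                            }

                                                                                            /// One allocation that can sit on two lists at once, with no
                                                                                            /// per-list boxes. Recovering the `Job` from a `ListHead` pointer
                                                                                            /// is C's `container_of` trick; convincing the compiler that this
                                                                                            /// is sound is exactly the hard part discussed in this thread.
                                                                                            struct Job {
                                                                                                id: u32,
                                                                                                sched_link: ListHead, // scheduler's run queue
                                                                                                done_link: ListHead,  // completion list
                                                                                            }

                                                                                            fn main() {
                                                                                                let job = Job {
                                                                                                    id: 7,
                                                                                                    sched_link: ListHead::empty(),
                                                                                                    done_link: ListHead::empty(),
                                                                                                };
                                                                                                println!("job {}", job.id);
                                                                                            }
                                                                                            ```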

                                                                                      2. 12

                                                                                        And yet, safe to use doubly linked lists written in Rust exist. That the implementation needs unsafe is not a real problem. That’s how we should look at wrapping C code in safe Rust abstractions.
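
                                                                                          In that spirit, a minimal sketch of wrapping a hypothetical C API in a safe Rust type (the c_list_* externs are invented, so this won't link without a C implementation; it shows the shape of the pattern):

                                                                                          ```rust
                                                                                          use std::os::raw::c_void;

                                                                                          extern "C" {
                                                                                              fn c_list_create() -> *mut c_void;
                                                                                              fn c_list_destroy(list: *mut c_void);
                                                                                              fn c_list_push(list: *mut c_void, value: i32);
                                                                                          }

                                                                                          /// Safe handle: creation and destruction are paired by the type.
                                                                                          pub struct List(*mut c_void);

                                                                                          impl List {
                                                                                              pub fn new() -> Self {
                                                                                                  // SAFETY: c_list_create has no preconditions in this sketch.
                                                                                                  List(unsafe { c_list_create() })
                                                                                              }

                                                                                              pub fn push(&mut self, value: i32) {
                                                                                                  // SAFETY: self.0 stays valid for the lifetime of the List.
                                                                                                  unsafe { c_list_push(self.0, value) }
                                                                                              }
                                                                                          }

                                                                                          impl Drop for List {
                                                                                              fn drop(&mut self) {
                                                                                                  // Drop runs exactly once: no double-free, no leak, and
                                                                                                  // use-after-free is ruled out by the borrow checker.
                                                                                                  unsafe { c_list_destroy(self.0) }
                                                                                              }
                                                                                          }
                                                                                          ```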

                                                                                        1. 3

                                                                                          The whole comment you replied to, after the one sentence about linked lists, is about abstractions. And abstractions are rarely going to be easy, and sometimes could be hardly possible.

                                                                                            That’s just a fact. Confusing this fact for something as hyperbolic as “absurdly low quality” is a stunning example of the Dunning-Kruger effect, and frankly insulting as well.

                                                                                          1. 9

                                                                                            I personally would call Linux low quality because many parts of it are buggy as sin. My GPU stops working properly literally every other time I upgrade Linux.

                                                                                            No one is saying that Linux is low quality because it’s hard or impossible to abstract some subsystems in Rust, they’re saying it’s low quality because a lot of it barely works! I would say that your “Chinese whispers” misrepresents the situation and what people here are actually saying. “the safety of this API is hard to formalize and you pretty much have to use it the way everybody does it” doesn’t apply if no one can tell you how to use an API, and everyone does it differently.

                                                                                            1. 3

                                                                                              I agree, Linux is the worst of all kernels.

                                                                                              Except for all the others.

                                                                                              1. 9

                                                                                                Actually, the NT kernel of all things seems to have a pretty good reputation, and I wouldn’t dismiss the BSD kernels out of hand. I don’t know which kernel is better, but it seems you do. If you could explain how you came to this conclusion that would be most helpful.

                                                                                                1. 10

                                                                                                  NT gets a bad rap because of the OS on top of it, not because it’s actually bad. NT itself is a very well-designed kernel.

                                                                                                  1. 3

                                                                                                      *nod* I haven’t been a Windows person since shortly after the release of Windows XP (i.e. the first online-activation DRM’d Windows), but whenever I see glimpses of what’s going on inside the NT kernel, in places like Project Zero’s “The Definitive Guide on Win32 to NT Path Conversion”, it really makes me want to know more.

                                                                                2. [Comment removed by author]

                                                                              3. 4

                                                                                More likely a fork that gets rusted from the inside out

                                                                              4. -1

                                                                                how low quality the Linux kernel code is

                                                                                Somewhere else it was mentioned that most developers in the kernel could just not be bothered with checking for basic things.

                                                                                how much burnout and misery is involved

                                                                                Nobody is forcing any of these people to do this.

                                                                              5. 33

                                                                                I found the first reply on LKML to be very interesting.

                                                                                To quote:

                                                                                for lots of the in-kernel APIs, compile-time constraints enforcement to prevent misuse doesn’t matter, because those APIs don’t provide any way to be used safely. Looking at the two subsystems I know the best, V4L2 and DRM, handling the life time of objects safely in drivers isn’t just hard, it’s often plain impossible

                                                                                And

                                                                                in order to provide API that are possible to use correctly, we have many areas deep in kernel code that will require a complete redesign [..] I would be very surprised if I was working in the only area in the kernel that is considered broken beyond repair by many people related to life time management

                                                                                Which feels to me like a strong chicken-and-egg problem: to actually add any Rust bindings for certain kernel parts, you would first need to rewrite them, because there is apparently no defined way to call them safely.

                                                                                Which means it’s not about adding rust, it’s about rust being the reason to poke where it hurts. Potentially requiring a rewrite of hundreds of thousands LOC to even start seeing any benefits. In a state where I wouldn’t blame any maintainer that told me they don’t actually know how that part of the code truly works.
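                                                                                To make that concrete, here’s a toy C sketch (purely hypothetical, not actual V4L2/DRM code) of the kind of implied, undocumented lifetime rule being described:

                                                                                    /* Toy sketch of an implied, undocumented lifetime rule -- hypothetical,
                                                                                     * not real kernel code. */
                                                                                    #include <stdio.h>
                                                                                    #include <stdlib.h>

                                                                                    struct device_ctx { int id; };

                                                                                    static struct device_ctx *current_ctx;  /* hidden global state */

                                                                                    /* Implied rule, written down nowhere: the returned context is only
                                                                                     * valid until the next call to acquire_ctx(). */
                                                                                    struct device_ctx *acquire_ctx(int id)
                                                                                    {
                                                                                        free(current_ctx);
                                                                                        current_ctx = malloc(sizeof(*current_ctx));
                                                                                        current_ctx->id = id;
                                                                                        return current_ctx;
                                                                                    }

                                                                                    int main(void)
                                                                                    {
                                                                                        struct device_ctx *a = acquire_ctx(1);
                                                                                        struct device_ctx *b = acquire_ctx(2);  /* silently invalidates 'a' */
                                                                                        printf("%d\n", a->id);                  /* use-after-free, compiles fine */
                                                                                        printf("%d\n", b->id);
                                                                                        free(current_ctx);
                                                                                        return 0;
                                                                                    }

                                                                                A faithful Rust binding would have to express “invalidated by the next acquire_ctx() call” in the types (e.g. as a borrow tied to a context manager), and nobody can write that until the rule itself gets written down.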

                                                                                1. 29

                                                                                  Yeah. Part of the drama has been the R4L folks trying to get subsystem maintainers in these areas to document the “right ways” to use the APIs so the Rust API can incorporate those rules, and some maintainers saying “just do it like that other filesystem and stop harassing us, you said you’d do all the work”. (At least that’s how they’re perceived.) But it’s not like they would let the R4L folks go in and rewrite that stuff, either.

                                                                                  1. 43

                                                                                    I recall Asahi Lina’s comments on drm_sched. Choice quotes:

                                                                                    But the scheduler also keeps track of jobs, which reference their completion fences, so we have a lifetime loop. That loop is broken at certain points in the job lifecycle, but the fact it exists makes it very difficult to reason about the lifetimes of any of this stuff, and also makes it impossible to implement the requirements imposed by drm_sched via straight refcounting. If you try to refcount the scheduler and have the hw fence hold a reference to it, then the whole thing deadlocks, because the job completion fence might have its final reference dropped by the scheduler itself (when a job is cleaned up after completion), which would lead to trying to free the scheduler from the scheduler workqueue itself.

                                                                                    So now your driver needs to implement some kind of deferred cleanup workqueue to free schedulers possibly forever in the future. And also your driver module might be blocked from unloading from the kernel forever, because if any buffers hold on to job completion fences, that means your driver can’t unload due to the dependency.

                                                                                    I fixed it so that tearing down the scheduler gracefully aborts all jobs and detaches the hardware callbacks (it can’t abort the underlying hardware jobs, but it can decouple them from the scheduler side). In my driver’s case, that all works beautifully because my driver internals are basically reference counted everywhere, so while the scheduler and high-level queue can be destroyed, any currently running jobs continue to run to completion or failure and their underlying driver resources get cleaned up then, asynchronously.

                                                                                    The maintainer rejected the patch, and said it was the driver’s job to ensure that the scheduler outlives job execution.

                                                                                    But the scheduler owns the jobs lifetime-wise after you submit them, so how would that work? It doesn’t. If you try to introduce a job->scheduler reference, you’re creating a loop again, and the scheduler deadlocks when it frees a job and tries to tear itself down from within.

                                                                                    So now we’re back at having to introduce an asynchronous cleanup workqueue or similar, just to deal with the DRM scheduler’s incredibly poor lifetime design choices.

                                                                                    If I remember correctly, most C drivers that use drm_sched do not get this right, but it doesn’t come up much because most people aren’t trying to shut down their GPUs other than when they’re shutting off their computers, unless they’re using an eGPU (and eGPUs are notoriously semi-broken on Linux). Lina’s M1 GPU driver uses a scheduler per GPU context (/per application), hence schedulers are torn down whenever graphical applications are closed, so her driver couldn’t just ignore the complexity like most other drivers appear to do.
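                                                                                    For the curious, here is a heavily simplified C sketch (hypothetical, nothing like the real drm_sched code) of the refcount loop she describes, where the scheduler’s own cleanup path can drop the last reference to the scheduler itself:

                                                                                        /* Hypothetical miniature of the lifetime loop described above. */
                                                                                        #include <stdio.h>
                                                                                        #include <stdlib.h>

                                                                                        struct scheduler;

                                                                                        struct fence {
                                                                                            struct scheduler *sched;  /* completion fence keeps the scheduler alive */
                                                                                        };

                                                                                        struct scheduler {
                                                                                            int refcount;
                                                                                            struct fence *pending;    /* ...and the scheduler tracks the job's fence */
                                                                                        };

                                                                                        static void sched_put(struct scheduler *s)
                                                                                        {
                                                                                            if (--s->refcount == 0) {
                                                                                                printf("freeing scheduler\n");
                                                                                                free(s);
                                                                                            }
                                                                                        }

                                                                                        /* Imagine this running on the scheduler's own workqueue when a job
                                                                                         * completes. */
                                                                                        static void job_cleanup(struct scheduler *s)
                                                                                        {
                                                                                            struct fence *f = s->pending;
                                                                                            s->pending = NULL;
                                                                                            free(f);
                                                                                            sched_put(s);  /* if this was the last reference, the scheduler just
                                                                                                              freed itself from inside its own cleanup path */
                                                                                        }

                                                                                        int main(void)
                                                                                        {
                                                                                            struct scheduler *s = calloc(1, sizeof(*s));
                                                                                            struct fence *f = calloc(1, sizeof(*f));
                                                                                            s->refcount = 1;  /* only the job's fence holds a reference */
                                                                                            f->sched = s;
                                                                                            s->pending = f;
                                                                                            job_cleanup(s);   /* prints "freeing scheduler": teardown from within */
                                                                                            return 0;
                                                                                        }

                                                                                    Her rejected patch sidestepped exactly this by letting scheduler teardown detach still-running jobs, instead of every driver having to bolt on a deferred-cleanup workqueue.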

                                                                                  2. 20

                                                                                    Those statements just come across to me as “we built something unmaintainable and now I don’t want to maintain it”, i.e., a way to avoid doing the hard work.

                                                                                    https://lore.kernel.org/rust-for-linux/Z7SwcnUzjZYfuJ4-@infradead.org/

                                                                                    So we’ll have these bindings creep everywhere like a cancer and are very quickly moving from a software project that allows for and strives for global changes that improve the overall project to increasing compartmentalization [2].

                                                                                    Because the cancer metaphor worked so well for Hellwig the last time he used it…

                                                                                    1. 3

                                                                                      I wouldn’t blame anyone for that. The road to hell is paved with good intentions. And most of the people maintaining it now probably didn’t start it.

                                                                                      1. 16

                                                                                        If they’re a paid maintainer, then it’s their job to do just that. Hellwig is a guy who has explicitly said he doesn’t want any other maintainers.

                                                                                        1. 4

                                                                                          I think you’re underestimating how many years it would take to replace some of this code, let alone verify that it actually works on the real hardware without random crashes (as we’ve seen in other reports of new CPU architectures surfacing Heisenbugs). Sure, you would want to do that eventually, but I don’t want to be the one telling everyone I’m gonna freeze features until this is done, with potentially more bugs when it’s finished.

                                                                                          1. 12

                                                                                            It’s one thing to say “I don’t have the time to fix this”; it’s another to reject a proposed fix (see the drm_sched comment above) or to prevent other people from working on fixes elsewhere in the tree (Hellwig). You don’t have to freeze your feature work when other people are working on fixes and refactorings.

                                                                                          2. 3

                                                                                            How long are you willing to wait for an updated Linux kernel? It may not be “we are unwilling to do maintenance” so much as “this is a lot of major work where intermediate steps might not be usable.”

                                                                                            1. 9

                                                                                              There’s a reason it’s called technical debt. It only gets worse the longer you put it off.

                                                                                              1. 2

                                                                                                So I ask again: how long are you willing to wait for an updated Linux kernel with less technical debt?

                                                                                                1. 25

                                                                                                  You’re treating it as a false dichotomy and trying to paint me uncharitably. Stop that.

                                                                                                  1. 8

                                                                                                    If they’re a paid maintainer, then it’s their job to do just that.

                                                                                                    For my personal projects, I can “pay myself” to address technical debt. And I have, because I’m the only user of my code and thus, I have final say in what and how it works. At my previous job, any attempt to address technical debt of the project (that I had been working on for over a decade, pretty much from the start of it) would have been shut down immediately as being too risky, despite the 17,000+ tests [1].

                                                                                                    Where do the incentives come in to address technical debt in the Linux kernel? Is that a better way to ask the question?

                                                                                                    [1] Thanks to new management. At one point, my new manager reverted the code I had rewritten to address some minor technical debt back to the original code plus the minimum to get it working, because the rewrite was deemed “too risky”.

                                                                                                    1. 5

                                                                                                      At my previous job, any attempt to address technical debt of the project (that I had been working on for over a decade, pretty much from the start of it) would have been shut down immediately as being too risky, despite the 17,000+ tests

                                                                                                      Seems to confirm the point I made here:

                                                                                                      But go explain to your boss who just saw a working prototype, that you need a couple more days to design an alternate implementation, that may or may not be included in the final product. That you still need a couple more automated tests just to make sure. That you’ll take this slow approach now and forever, pinkie promise that’s how we’ll ship sooner.

                                                                                                      […] So unless I have cause to believe my boss understands those things, I just decide such menial considerations are below their pay grade, and tell them I am not done yet.

                                                                                                      But if I can’t even get around the morons up top, I’m out pretty quick, one way or another.

                                                                                              2. 4

                                                                                                The kernel has happily managed major API rewrites before, either merging the changes bit by bit or maintaining both versions in tree until the old one is ripe for deletion. And thru the magic of git and community effort, none of that has to delay the release of new kernels.

                                                                                              3. -1

                                                                                                That is false.

                                                                                                1. 12

                                                                                                  Which part?

                                                                                                  https://lore.kernel.org/lkml/20250128092334.GA28548@lst.de/

                                                                                                  And I also do not want another maintainer.

                                                                                                  1. 2

                                                                                                    Ah, thanks. The meaning I take from that statement is not the same meaning I took from your comment.

                                                                                                    I’m trying to see what you were getting at. What did you mean by “just that”?

                                                                                                    1. 12

                                                                                                      I’m trying to see what you were getting at. What did you mean by “just that”?

                                                                                                      Doing the design work to create safe-to-use APIs with lifetimes considered is part of the work of the maintainer in my view because they should have the best perspective to do so. They got it into that state, they can get it out of that state. Whining that it’s hard work shouldn’t be acceptable as a reason to not do the work.

                                                                                                      1. 2

                                                                                                        Doing the design work to create safe-to-use APIs with lifetimes considered is part of the work of the maintainer in my view because they should have the best perspective to do so. They got it into that state, they can get it out of that state.

                                                                                                        I’m not aware of any precedent for something like this, so maybe there’s a way in which you’re right. But there seems to be a contradiction in whether you think we should defer to their judgement.

                                                                                                        Whining that it’s hard work shouldn’t be acceptable as a reason to not do the work.

                                                                                                        I don’t agree with that. I accept the RfL side’s refusal to build and test their own OS or Linux fork, for example.

                                                                                                        1. 3

                                                                                                          I think their judgment is separate from their capability. I don’t think any of these maintainers are fundamentally incompetent people. I’m not sure they need mentorship on building APIs with regard to lifetimes, because they should already be aware that memory has lifetimes everywhere, implicitly, in their code.

                                                                                                          1. 1

                                                                                                            I can’t see a way to separate those two things honestly.

                                                                                                            1. 1

                                                                                                              Because if you trust them to define “lifetimes”, doesn’t that mean you trust them to estimate the point at which the costs of changing the API outweigh the benefits? Yet you don’t trust their estimate of the costs imposed by the practice, and of the amount of extra work it would take before it yields benefits?

                                                                                            2. 18

                                                                                              Which means it’s not about adding rust, it’s about rust being the reason to poke where it hurts.

                                                                                              Good point! This is something I noticed in a previous job as well, where we introduced computer assistance to existing manual workflows. Apparently the real reason for resistance from the workers was that in the course of this computerization, their “traditional” workflows would be documented and maybe even evaluated before they could be encoded in a computer program. But IIRC this reason was never said out loud by anyone – some developers realized this reason on their own and adjusted their approach, but some didn’t realize this and wondered about the constant pushback.

                                                                                              And maybe to the managers of those workers the computerization was not even the real goal; the real improvement was supposed to come from the “inventorization” of existing workflows. In a similar way, while the Rust devs want Rust to enter the kernel, maybe some progressive Linux devs see Rust “just” as a vehicle to make Linux internals more strict and more understandable, and introducing Rust is maybe just a happy side effect of this.

                                                                                              1. 20

                                                                                                maybe some progressive Linux devs see Rust “just” as a vehicle to make Linux internals more strict and more understandable, and introducing Rust is maybe just a happy side effect of this.

                                                                                                This is the entire situation from the start. Like this is what Rust for Linux is.

                                                                                                1. 5

                                                                                                  Absolutely, and also people do not recognize how unforgiving Rust is to even the normal APIs that you would use in C (insert joke about linked lists here), and the constraints that the word “safely” implies when applied to Rust.

                                                                                                  Not being able to devise a “safe” Rust abstraction doesn’t mean that the API must be a source of insecurity. Certainly it isn’t a great start, I will grant that, but generally in C you will find that most code ends up doing the same thing that works. The maintainers, however, recognize that this is not the way to introduce a semi-formal definition of how the API operates, and are worried that it may not be possible at all. This is being aware of the environment and of the complexity that comes from 20-30 years of development in C; it’s not a matter of wanting to “avoid doing the hard work”.

                                                                                                  (For another example, https://lobste.rs/s/hdj2q4/greg_kroah_hartman_makes_compelling_case#c_f5pzow shows how an API could start causing problems when you use it differently, and how one might want to use it differently if they have more confidence thanks to a better programming language.)

                                                                                                  1. 10

                                                                                                    That comment shows that the API is poorly designed and fixing it was rejected by a maintainer, though?

                                                                                                    1. 4

                                                                                                      The maintainer rejected the fix because (according to him) the API was not poorly designed, but simply not supposed to be used like that (literal quote: “this functionality here isn’t made for your use case”). Which makes sense and is consistent with what I wrote above: the maintainer is conscious of the limits of C and does not want the API to be used in ways that were not anticipated, whereas the Rust developer is more confident because of the more powerful compile-time checks.

                                                                                                      Not knowing the code I cannot understand the tradeoffs involved in the fix. I can’t say whether the maintainer was too cautious, and obviously the failure mode (use after free) is anything but great. My point is that, as you look more in depth, you can see that people actually do put thought in their decisions, but clashes can and will happen if they evaluate the tradeoffs differently.

                                                                                                      As an aside: drm_sched is utility code, not a core part of the graphics stack, so for now the solution to Lina’s issue is going to be a different scheduler that is written in (safe) Rust. Since it appears that there’s going to be multiple Rust graphics drivers soon, they might be able to use Lina’s scheduler and there will be more data points to compare “reuse C code as much as possible” vs “selectively duplicate infrastructure”; see also https://fosstodon.org/@airlied/113052975389174835. Remember that abstracting C to Rust is neither little code nor easy code, therefore it’s not unexpected that in some cases duplication will be easier.

                                                                                                      1. 23

                                                                                                        The maintainer rejected the fix because (according to him) the API was not poorly designed, but simply not supposed to be used like that (literal quote: “this functionality here isn’t made for your use case”). Which makes sense and is consistent with what I wrote above: the maintainer is conscious of the limits of C and does not want the API to be used in ways that were not anticipated, whereas the Rust developer is more confident because of the more powerful compile-time checks.

                                                                                                        This is not a good summary of the situation, though to be fair, some details are buried on Reddit. First of all, what she was doing was the correct approach to dealing with that hardware, according to multiple other DRM maintainers. Lina’s patch actually would have made existing drivers written in C less buggy, because that maintainer was in fact not conscious of the limits of C. drm_sched has very annoying and complex lifetime requirements that are easy to mess up in C, and her patch would’ve simplified them.

                                                                                                        Relevant excerpts from what Lina said on Reddit:

                                                                                                        The only thing I proposed was making it valid to destroy a scheduler with jobs having not completed. I just added cleanup code to handle an additional case. It was impossible for that change to affect any existing driver that followed the existing implied undocumented rule that you have to wait for all jobs to complete before destroying the scheduler.

                                                                                                        The scheduler is in charge of job lifetimes by design. So this change makes perfect sense. Enforcing that all jobs complete before scheduler destruction would require tracking job lifetimes in duplicate outside the scheduler, it makes no sense. And you can’t fix it by having a simple job->scheduler ref either, because then the scheduler deadlocks on the last job when it tries to free itself from within.

                                                                                                        The only reason this doesn’t crash all the time for other GPU drivers is because they use a global scheduler, while mine uses a per-queue scheduler (because Apple’s GPU uses firmware scheduling, and this is the correct approach for that, as discussed with multiple DRM folks). A global scheduler only gets torn down when you unplug the GPU (ask eGPU users how often their systems crash when they do that… it’s a mess). A per-queue scheduler gets torn down any time a process using the GPU shuts down, so all the time. So I can’t afford that codepath to be broken.

                                                                                                        Here it should be noted that it was not really a case of using something in a new way. The only difference is how many users are affected. Most people don’t unplug their GPU, so the fact that GPU unplugging is broken with many drivers is easy to sweep under the rug. But since it affects every user trying to use Lina’s driver, the problem can’t be swept under the rug and just be ignored.

                                                                                                        And again, my scheduler change absolutely did not change the behavior for existing drivers at all (unless they were already broken, and then it could strictly improve things). That was provable.

                                                                                                        I consulted with multiple DRM folks about how to design this, including actual video meetings, and was told this was the correct approach.

                                                                                                1. 20

                                                                                                  Perhaps a direct link to the LKML thread would be more interesting? There are comments from many other maintainers, including Christoph Hellwig, who was recently part of some drama.

                                                                                                  1. 18

                                                                                                    I have mixed feelings about Linus’s leadership in general, but this is really encouraging:

                                                                                                    And no, I don’t actually think it needs to be all that black-and-white. I’ve stated the above in very black-and-white terms (“becoming a maintainer of the Rust bindings too” vs “don’t want to deal with Rust at all”), but in many cases I suspect it will be a much less harsh of a line, where a subsystem maintainer may be aware of the Rust bindings, and willing to work with the Rust side, but perhaps not hugely actively involved.

                                                                                                    Really appreciate this note in particular – treating this kind of semi-adversarial relationship as the last resort case.

                                                                                                    1. 17

                                                                                                      I don’t think there’s much value in linking Phoronix (this time at least). Here’s the discussion, with some context for Greg’s message: https://lore.kernel.org/rust-for-linux/2025021954-flaccid-pucker-f7d9@gregkh/

                                                                                                      1. 13

                                                                                                        One way to look at the Rust-in-Linux question might be: what should the kernel look like in 20 years? In 2045, should the kernel still have resource allocation that is not checked by a tool but only by people? Should it still have the possibility of out-of-bounds memory access?

                                                                                                        If this should not be the case any more in 2045, something needs to happen in the next twenty years. Staying with a C-only code base for the next twenty years would not work.

                                                                                                        1. 11

                                                                                                          GKH is a champ, hope this strong statement by him oils up some of the squeaky wheels.

                                                                                                          1. 10

                                                                                                            I mean…it’s about time he put on his BDFL hat on this issue, which has been going on for way too long. If I was the Rust binding maintainer, though, I’d be worried about this:

                                                                                                            So when you change the C interfaces, the Rust people will have to deal with the fallout, and will have to fix the Rust bindings. That’s kind of the promise here: there’s that “wall of protection” around C developers that don’t want to deal with Rust issues in the promise that they don’t have to deal with Rust.

                                                                                                            If the DMA maintainer wants to continue his petty anti-Rust crusade, all he has to do is start subtly breaking the bindings in ways that primarily affect the downstream Rust code. Not that he’ll necessarily do that, but…he has both license and plausible deniability to do so.

                                                                                                            Everyone in this situation needs to be put in a room, told to play nice, and hash out their disagreements like adults.

                                                                                                            1. 19

                                                                                                              This also doesn’t address the Asahi problem, where the subsystem doesn’t want to accept improvements written in C to enable new use cases that happen to be implemented in Rust.

                                                                                                              1. 12

                                                                                                                However elsewhere in the thread Ted Tso provides a pretty good response for that, given his original behaviour (which led to the departure of Wedson) it’s a pretty heartening if bittersweet change of heart: https://lore.kernel.org/rust-for-linux/20250219170623.GB1789203@mit.edu/

                                                                                                                I do understand (now) what Wedson was trying to do, was to show off how expressive and powerful Rust can be, even in the face of a fairly complex interface. It turns out there were some good reasons for why the VFS handles inode creation, but in general, I’d encourage us to consider whether there are ways to change the abstractions on the C side so that:

                                                                                                                (a) it makes it easier to maintain the Rust bindings, perhaps even using automatically generation tools,
                                                                                                                (b) it allows Rust newbies having at least some hope of updating the manually maintained bindings,
                                                                                                                (c) without causing too much performance regressions, especially on hot paths, and
                                                                                                                (d) hopefully making things easier for new C programmers from understanding the interface in question.

                                                                                                                1. 4

                                                                                                                  That is good. We’ll see if that attitude propagates. By “the Asahi problem” I meant that in graphics driver land, tussles over a device abstraction and scheduler lifetimes resulted in the R4L folks abandoning the effort to improve(*) the C side of things (and in some cases abandoning the R4L side too).

                                                                                                                  (*) Of course, “improvement” is in the eye of the beholder.

                                                                                                                  1. 2

                                                                                                                    That’s what Ted’s comment is about, as far as I can tell: being willing to clarify the C side, and to modify it in order to improve the interface / abstraction. Especially the (b) and (d) bits, which are about making the interface easier for callers to grasp.

                                                                                                              2. 11

                                                                                                                I don’t think that’s the case. I believe this is just restating the existing contract that since Rust in the kernel is still experimental, breaking Rust code does not break the entire build.

                                                                                                                1. 6

                                                                                                                  I’d be pretty surprised if they could even do that to the Rust bindings without introducing all sorts of subtle bugs in downstream C code. The only real difference between Rust and downstream C here is that Rust writes down what behaviour it expects, not that the downstream C code doesn’t have expectations. Fixing those C bugs would be a lot harder than updating the Rust… because Rust wrote down what it expects.

                                                                                                                  Ultimately I’m not impressed that we aren’t seeing some CoC-style enforcement here, but other than that I don’t think there’s anything for the leadership to do but wait and react to whatever happens next.

                                                                                                                  1. 1

                                                                                                                    True, but the kernel has always explicitly reserved the right to change kernel-only APIs. Such changes are accompanied by tree-wide patches of all the users, which is why out-of-tree driver maintenance is so difficult.

                                                                                                                  2. 4

                                                                                                                    I think the spirit of

                                                                                                                    But then you take that stance to mean that the Rust code cannot even use or interface to code you maintain.

                                                                                                                    So let me be very clear: if you as a maintainer feel that you control who or what can use your code, YOU ARE WRONG.

                                                                                                                    implies that Linus would take a very dim view of intentionally/frivolously breaking consumers, and I imagine he’d be loud about it.

                                                                                                                    1. 2

                                                                                                                      Everyone in this situation needs to be put in a room, told to play nice, and hash out their disagreements like adults.

                                                                                                                      He encourages exactly that in the last two paragraphs.

                                                                                                                    2. 7

                                                                                                                      As I mentioned on the Orange site:

                                                                                                                      The impression I get from simply reading these various discussions, is that some folks are not convinced that the gain from accepting Rust is worth the pain.

                                                                                                                      Possibly also that a significant portion of the suggested gain may be achievable via other means.

                                                                                                                      i.e. bounds checking and some simple (RAII-like) allocation/freeing simplifications may be possible without rust, and that those are (from the various papers arguing for Rust / memory safety elsewhere) the larger proportion of the safety bugs which Rust catches.

                                                                                                                      Possibly just making clang the required compiler and adopting these extensions may give an easier bang for the buck: https://clang.llvm.org/docs/BoundsSafety.html
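                                                                                                                      For illustration, this is roughly what the extension looks like in use; a sketch assuming an experimental clang build with -fbounds-safety, with attribute spellings per the linked docs:

                                                                                                                          /* Requires an experimental clang built with -fbounds-safety;
                                                                                                                           * will not compile elsewhere. */
                                                                                                                          #include <stddef.h>

                                                                                                                          struct buf {
                                                                                                                              size_t count;
                                                                                                                              int *__counted_by(count) items;  /* bounds tied to 'count' */
                                                                                                                          };

                                                                                                                          int sum(const struct buf *b)
                                                                                                                          {
                                                                                                                              int total = 0;
                                                                                                                              for (size_t i = 0; i < b->count; i++)
                                                                                                                                  total += b->items[i];        /* checked against 'count' */
                                                                                                                              /* An accidental `i <= b->count` would trap at runtime
                                                                                                                               * instead of silently reading out of bounds. */
                                                                                                                              return total;
                                                                                                                          }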

                                                                                                                      Over and above that, there seem to be various complaints about the readability and aesthetics of Rust code, and a desire not to be subjected to such.

                                                                                                                      1. 31

                                                                                                                        i.e. bounds checking and some simple (RAII-like) allocation/freeing simplifications may be possible without rust,

                                                                                                                        This gets handwaved all over the internet these days, rarely with any evidence behind it. Let me offer a simple piece of evidence to the contrary: if it were easy to retrofit this onto C, we’d have done it by now.

                                                                                                                        Long before Clang’s -fbounds-safety, there was Annex K, which adds bounds-checked versions of a bunch of standard library interfaces. This relies on passing around the pointer and the length separately, which makes it error-prone. As a result, adopting it naively into an existing C codebase can actually introduce new bugs. Now, with -fbounds-safety, pointers are instead __bidi_indexable by default, which adds a lower and upper bound to the type and increases their size threefold. I think that’s quite a hefty price to pay, probably too high for the kernel.

                                                                                                                        As for RAII, ThePhD (editor of the C standard) has written an excellent article explaining why it’s much harder than expected to add this to C.
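                                                                                                                        The closest C gets today is the compiler-specific cleanup attribute (which, as I understand it, the kernel has recently started wrapping in its cleanup.h helpers); a minimal sketch of both what it buys and where it falls short:

                                                                                                                            /* GCC/Clang extension: run a function when the variable leaves
                                                                                                                             * scope. Scope-exit cleanup only; nothing stops a dangling copy
                                                                                                                             * of the pointer escaping, which is where this falls short of
                                                                                                                             * real RAII. */
                                                                                                                            #include <stdio.h>
                                                                                                                            #include <stdlib.h>

                                                                                                                            static void free_charp(char **p)
                                                                                                                            {
                                                                                                                                free(*p);
                                                                                                                                puts("freed on scope exit");
                                                                                                                            }

                                                                                                                            int main(void)
                                                                                                                            {
                                                                                                                                __attribute__((cleanup(free_charp))) char *buf = malloc(64);
                                                                                                                                if (!buf)
                                                                                                                                    return 1;
                                                                                                                                snprintf(buf, 64, "hello");
                                                                                                                                puts(buf);
                                                                                                                                return 0;  /* free_charp(&buf) runs automatically here */
                                                                                                                            }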

                                                                                                                        those are the larger proportion of the safety bugs which Rust catches.

                                                                                                                        I’m not convinced. C++ has RAII and that hasn’t really helped stop use-after-free vulnerabilites. Because you can (and often do) get pointers/references from your types, those still abound. Every C++ beginner encounters iterator invalidation UB within 5 minutes of discovering range-based for loops. To actually prevent use-after-free fully via RAII would require exclusive use of smart pointers, which is again way too costly for the kernel in terms of performance.

                                                                                                                        “Smashing the Stack for Fun and Profit” dropped in 1996. We’ve had nearly 30 years to incrementally improve the security of C code. A lot has happened in that time, in terms of development tools (fuzzers, static analyzers), runtime mitigations (W^X, ASLR, library hardening, CFI) and now even hardware-based mitigations like ARM authenticated pointers. Yet, memory-unsafety related vulnerabilities are still all over the place. If there was an easy way to fix this for C, we’d have found it by now.

                                                                                                                        1. 1

                                                                                                                          IMO, Annex K was always bollocks; the APIs were just poor.

                                                                                                                          Like most everyone else who has used C for any length of time, I ended up writing my own “counted string” routines, and using them. The same idea has been re-implemented countless times.

                                                                                                                          As to C++, I don’t use it. I long ago came to the conclusion it was a “poor” / “bad” / “dangerous” language, simply due to the piles of baroque features which kept on getting added. I was able to use C for almost all of my career.

                                                                                                                          Bounds checking for C could have been added at any prior point; the reason it wasn’t is (IMO) largely a lack of economic incentives, plus a lack of political interest. It is (again IMO) the latter which has now started to force that work to happen.

                                                                                                                          These additions (bounds checks, and some temporal safety mechanism) don’t have to achieve the improvements that Rust can, all they have to do is keep existing code running for long enough to allow for natural wastage / replacement.

                                                                                                                          1. 6

                                                                                                                            I agree with your last point 100%. But we have to ask ourselves: replacement with what? If we can’t replace core pieces of the kernel with Rust, what else is going to do the job?

                                                                                                                            1. 3

                                                                                                                              The problem there is that it isn’t replacing the product, it is trying to modify it. It does raise the question of what approach other kernels are taking.

                                                                                                                              We know that Apple is (at least in part) modifying XNU to use clang’s bounds-checked C; I don’t know whether they intend to put Swift in their kernel. What are the various BSDs doing in this area?

                                                                                                                              MS I imagine may actually be going for the “Rewrite it in Rust” approach.

                                                                                                                              For most commercial software, that sort of modification into a multi-language program simply would not be contemplated. There the issue is to keep the existing “investment” going until the whole thing is replaced / rewritten. Generally, if the product per se is a box with an OS, then programs may be replaced over time with rewritten programs in a different language. I suggest this is because most commercial software is not an OS kernel.

                                                                                                                              The Linux kernel is a different story, and it is that attempted direction towards a multi-language program which (IMO) is the cause of the grief.

                                                                                                                              Possibly an approach may be to move the distributions towards something like a Qubes-like scenario, and have the nested VM kernels be rewrites. Then actual applications would be constrained within that “safer” environment, and over time those rewritten kernels could get to the point where they can replace the base layer?

                                                                                                                              Or possibly before that happens, new CHERI based hardware will have been deployed, and the issue avoided.

                                                                                                                              So to get back to your question, it may be that the only thing to do is write a new Linux-compatible kernel in Rust; however, there is a reluctance to do so (for various reasons). The above Qubes-like approach may be a way to get there…

                                                                                                                              Alternatively, wait the 5-15 years for the “greybeards” to retire, and then whosoever takes over herding the kernel cats can go full throttle at rustifying it.

                                                                                                                              1. 7

                                                                                                                                Your last paragraph? This is what is happening. It doesn’t come from nowhere. We are at that stage.

                                                                                                                            2. 3

                                                                                                                              Yeah, Annex K is really bad.

                                                                                                                              It depends on global mutable state for its error handling; it has no scoping to threads or modules, so there is no way for a library to control the blast radius of an Annex K runtime-constraint handler.

                                                                                                                              Many of the Annex K functions add a second length parameter and most of the time I can’t work out how it is supposed to be different from the first one. It seems to me like they came up with a poorly conceived rule and applied it across the board without much thought about whether it makes sense in each case.

                                                                                                                              Not even Microsoft implements it properly.

                                                                                                                          2. 16

                                                                                                                            The impression I get from simply reading these various discussions, is that some folks are not convinced that the gain from accepting Rust is worth the pain.

                                                                                                                            Possibly also that a significant portion of the suggested gain may be achievable via other means.

                                                                                                                            i.e. bounds checking and some simple (RAII-like) allocation/freeing simplifications may be possible without rust, and that those are (from the various papers arguing for Rust / memory safety elsewhere) the larger proportion of the safety bugs which Rust catches.

                                                                                                                            People have been trying to make C safer for decades, with only very limited success. I don’t believe bounds checking and RAII are nearly enough. As can be seen with C++, RAII does not prevent misuse, nor does bounds checking. They’re very useful tools! But they don’t help you enforce code correctness the way the borrow checker and Rust’s type system do. As I recall, Google reported that not only did their new Rust code have essentially zero memory-safety issues, the remaining vulnerabilities were also less severe than in their C++ code.

                                                                                                                            1. 2

                                                                                                                              RAII would seem to me to be a bit player.

                                                                                                                              As I recall from reading the various papers and blog posts, lack of “spatial safety” (i.e. bounds checks) seemed to account for between 1/2 and 3/4 of the various “memory safety” vulnerabilities, depending upon which paper one read, and when.

                                                                                                                              That is an easy thing to fix, and without having to change language. Hence why I suspect that clang enhancement has legs, and not just for Linux (if they adopt it).

                                                                                                                              After that, we had “temporal safety” (double free, use after free, etc) making up between 1/4 and 1/3 of issues. So that would be the next one to tackle, and where things like RAII and ref-counting would come in; but again not sufficient to handle all such issues.

                                                                                                                              Beyond that, we have a large collection of C code in use across the industry, and it isn’t going to get rewritten in Rust in any practical timescale. Corporations simply won’t do that; they’ll keep selling their existing stuff and pay lip service to safety (by emphasizing distractions like the use of static analysis, etc.).

                                                                                                                              So stuff like bounds checking annotations for C compilers are where I see how we can make a practical improvement in the maintenance of existing code, as it isn’t going to be thrown away for at least a decade. After the bounds checks are enabled, doing something for the temporal issues.

                                                                                                                              Then keeping things going while replacements written in other languages and/or on safer h/w systems gradually take over.

                                                                                                                              1. 4

                                                                                                                                As I recall from reading the various papers and blog posts, lack of “spatial safety” (i.e. bounds checks) seemed to account for between 1/2 and 3/4 of the various “memory safety” vulnerabilities, depending upon which paper one read, and when.

                                                                                                                                That is an easy thing to fix, and without having to change language. Hence why I suspect that clang enhancement has legs, and not just for Linux (if they adopt it).

                                                                                                                                The bounds-safety feature has two modes: external bounds and internal bounds. External bounds means using a separate variable for the bounds (like an n argument), which of course means mistakes are still possible: you could accidentally pass the wrong bounds for a particular array. Internal bounds store the bounds of a pointer in the pointer itself, which either doubles or triples its size. The latter may be hard to stomach, and in some cases probably not possible to use at all.
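                                                                                                                                A sketch of the two modes side by side, again assuming the experimental clang with -fbounds-safety and the attribute spellings from its docs:

                                                                                                                                    /* Hypothetical sketch; needs the experimental
                                                                                                                                     * -fbounds-safety clang. */

                                                                                                                                    /* External bounds: the length travels as a separate
                                                                                                                                     * argument, so a caller can still pass the wrong one. */
                                                                                                                                    void fill(int *__counted_by(n) buf, int n)
                                                                                                                                    {
                                                                                                                                        for (int i = 0; i < n; i++)
                                                                                                                                            buf[i] = i;
                                                                                                                                    }

                                                                                                                                    int main(void)
                                                                                                                                    {
                                                                                                                                        int arr[8];
                                                                                                                                        fill(arr, 8);    /* checked against the separate 'n' */

                                                                                                                                        /* Internal bounds: a "wide" pointer that carries lower
                                                                                                                                         * and upper bounds alongside the address, roughly
                                                                                                                                         * tripling its size. */
                                                                                                                                        int *__bidi_indexable p = arr;
                                                                                                                                        p[7] = 42;       /* in bounds: fine */
                                                                                                                                        /* p[8] = 42; */ /* would trap rather than corrupt memory */
                                                                                                                                        return 0;
                                                                                                                                    }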

                                                                                                                                Would adopting external bounds in existing C code in the kernel be a good idea? Yeah. Maybe even internal bounds in some cases? Probably.

                                                                                                                                Would writing new code in that style be as useful as writing new code in Rust? Nowhere close IMO.

                                                                                                                                After that, we had “temporal safety” (double free, use after free, etc) making up between 1/4 and 1/3 of issues. So that would be the next one to tackle, and where things like RAII and ref-counting would come in; but again not sufficient to handle all such issues.

                                                                                                                                Linux already uses refcounting extensively, but mistakes are possible. RAII helps with that, but since C++ code still has issues with double free, use after free, etc, it clearly isn’t enough.

                                                                                                                                Beyond that, we have a large collection of C code in use across the industry, and it isn’t going to get rewritten in Rust in any practical timescale. Corporations simply won’t do that, they’ll sell their existing stuff, pay lip-service to safety (by emphasizing distractions like the use of static analysis, etc).

                                                                                                                                No one is really trying to get the world rewritten in Rust anyway. Writing only new code in Rust gets you 95% of the benefit of rewriting all code in Rust. So actual re-writes tend to be limited to code that has proven to be especially prone to problems, like Android’s Binder kernel code.

                                                                                                                                So stuff like bounds checking annotations for C compilers are where I see how we can make a practical improvement in the maintenance of existing code, as it isn’t going to be thrown away for at least a decade. After the bounds checks are enabled, doing something for the temporal issues.

                                                                                                                                Then it’s a matter of keeping things going while replacements written in other languages and/or running on safer hardware gradually take over.

                                                                                                                                That’s basically what’s happening. E.g. Google is trying to write more and more brand new stuff in Rust, while trying to make their existing C++ code safer via pervasive bounds checking and other stuff.

                                                                                                                          3. 6

                                                                                                                            Well, he’s sort of laboring the same point a few times, but I do like the clarity.

                                                                                                                            1. 7

                                                                                                                              Seems justified considering how fervent and voluminous the fight’s gotten. People started talking about abandoning the Linux kernel for a Rust-native fork, which such a decision nips in the bud.

                                                                                                                            2. 6

                                                                                                                              @pushcx the actual link to the lkml was deleted:

                                                                                                                              2025-02-21 07:28 pushcx Story: Linus replies to R4L controversy Action: deleted story Reason: Don’t link into projects’ issue trackers and discussion spaces to brigade Lobsters readers into their arguments.

                                                                                                                              Is this link allowed?

                                                                                                                              1. 5

                                                                                                                                Yes, extremely silly that the direct link was censored, but the exact same content via Phoronix is allowed. There was some good discussion in the deleted thread.

                                                                                                                                  1. 7

                                                                                                                                    Thanks for the link here. Yeah, it’s not about the content of Linus’s email here, it’s about linking our 100k+ readers into projects’ community spaces. Linux is sort of the worst possible first example here because it’s a huge stable project and the friction of signing up to a single-purpose high-volume mailing list means it’s especially unlikely that we’re going to meaningfully disrupt Linux. The other end of the spectrum is linking into a small project’s GitHub issue, where most of our readers are going to be logged in and looking at that inviting <textarea> on a contentious topic with little context or history.

                                                                                                                                    If the rule is “don’t submit links into projects’ spaces” it’s a clear rule, and I admit that it’s overkill for this specific situation. “Don’t submit links into projects’ spaces unless they’re big and probably fine like Linux or Mozilla (but a big project like Firefox not a small one like NSS)” is an unending series of judgment calls that are often going to be about really contentious issues that feel like they justify an exception to our rules or to common courtesy. It’s an imperfect rule, but there’s value in predictability and legibility.

                                                                                                                                    If this compromise isn’t clear from what I’ve written in that code and the guidelines, I’m very open to suggestions for improving it. Doubly so if it’s the wrong compromise and there’s a path to us having better conversations and being a better neighbor on the web. As a reminder, the next office hours stream is in ~2 hours, and this is the kind of thing I started office hours to talk about, in the hopes that folks find it more convenient or less formal than a meta thread or emailing me.

                                                                                                                                    1. 3

                                                                                                                                      I feel the tradeoff is that this instead basically links to blogspam that barely summarizes the email, then links it anyway. They get the ad revenue. Maybe we should wait for a better thing to post about it, i.e. an LKML article or something in that ballpark?

                                                                                                                                      1. 5

                                                                                                                                        The small benefit is that it’s one more small step that makes bad behavior less likely, but you’re right, it does incentivize lazy sites like this one.

                                                                                                                                        You probably meant to write LWN? I’ve been mentioning them a lot in this running discussion about what our rules should be, I agree they’re a consistently excellent source. I don’t want to take a hard dep on them so I try to write things like “neutral third-party” but yeah, they’re first in my thoughts as well.

                                                                                                                                        One aspect of getting good writeups of these things is false urgency, or maybe that urgency depends on proximity. To people who are involved or affected by the topic, Linus posting a single email is a significant development. They want to know immediately because it could significantly affect their work. So they want to see the primary source, or a repost of it. But anyone outside that narrow circle needs a writeup that explains the topic and puts it into the context of the last few months of news. That takes a lot more time to produce and sometimes it doesn’t happen. So even for obviously topical stories we have two very different kinds of readers. A significant part of the brigading problem is when the second, bigger group hits an update appropriate for the narrow group. They can’t contextualize it, but if it hits a hot button like the morality of licensing or Linus insulting people, it can generate a lot of outrage that makes them feel like they need to do something, and that unacceptable behavior like trolling is justified by the circumstances.

                                                                                                                                        For a long time Lobsters has avoided being a source of brigading by trying to have norms that are kinder than average. That lowers the temperature of every discussion, makes us less appealing to the serious trolls, and makes it less likely that any particular discussion is going to gather enough outrage to hit the critical mass where our readers brigade into a project. But much bigger than our active users, our readership has been growing steadily, so even as our norms reduce the percentage chance of bad behavior, I’m worried that it’s not reducing enough to offset growth. If the percent risk drops by half but the readership grows 10x, we have a higher absolute risk.

                                                                                                                                        To bring it back to a specific example, last summer Nix was having a running governance crisis around the project’s direction, corporate/government involvement, and codes of conduct. There was a series of stories about breaking news and new dimensions to the broader story about who should be running Nix and how, and it was tons of hot-button issues. A lot of their work happens on GitHub and a bunch of the issues tracking different proposals, petitions, and governance actions were submitted here, so all of the ingredients for brigading were present and temperatures were rising. Some of the links were submitted by the people directly involved. To put it charitably they were advocating and organizing for better governance; to put it uncharitably they were trying to brigade our readers into the project to overwhelm it. I did my best to separate the two and I think we discussed very important, hard topics while being a good neighbor on the web, but it’s why I added to the brigading guidelines about preferring not to link into project spaces.

                                                                                                                                        To sum up, the rule against linking into community spaces trades off between a lot of hard topics. I’m trying to reduce judgment calls and our risk of harming projects while maintaining high-quality discussions on important topics. Sacrificing urgency draws a predictable, clear line about what links are acceptable, though I know that’s especially frustrating to people who are most involved with breaking news. So that’s why my last message called the rule a compromise, and I again encourage folks to help the site figure out better ones.

                                                                                                                                        1. 1

                                                                                                                                          Probably a silly question, but would something like an https://archive.is snapshot of the target be enough of a barrier to brigading? Could even add that functionality internally.

                                                                                                                                1. 3

                                                                                                                                  Yeah, I wasn’t sure about this either. I understand the rationale for the no-brigading, but I don’t see much difference in posting this URL vs LKML directly.

                                                                                                                                  (Also hope there’s a way we can still discuss Linus’s statement regardless)

                                                                                                                                2. 5

                                                                                                                                  Xe Iaso said this on Bluesky, and it’s been stuck in my head ever since: a hard fork is going to happen.

                                                                                                                                  1. 14

                                                                                                                                    That seems extremely unlikely to me, as it would be extremely divisive and expensive for LF, and LF decides what Linux is. Fuchsia and other OSes are much more likely to take market share (Android, ChromeOS, smart devices, etc.) than Linux itself is to fundamentally change. Various LF funders are more likely to just sit and wait for those, or to mitigate many of these issues with other projects - e.g. it’s way cheaper to build a secure VM than to build a new kernel, so the major LF companies all have their own VMs, mitigating a lot of kernel issues.

                                                                                                                                    1. 9

                                                                                                                                      It’s not. This is just a huge tempest in a teapot.

                                                                                                                                      1. 7

                                                                                                                                        I don’t see it happening right now either. But something will survive the current Linux in 50 years, and that is either a Linux which has cleaned up all the issues - with or without Rust - or something that surpassed Linux because no one bothered to clean up the legacy, and people potentially even moved on from projects with millions of lines of C (which, again, doesn’t have to be Rust). Which in turn seems to be a question of what big corporations invest their money in (Android, VMs + bare metal, IoT, ChromeOS) - and what legislation requires them to achieve in terms of security.

                                                                                                                                    2. 2

                                                                                                                                      So when you change the C interfaces, the Rust people will have to deal with the fallout, and will have to fix the Rust bindings.

                                                                                                                                      Does Linus accept patches that break Rust drivers? Isn’t that the implication here? A subsystem maintainer can break C interfaces. How does it become clear that the Rust code has been broken? If not via code moving upstream, then isn’t the subsystem maintainer effectively blocked from pushing upstream until the Rust maintainers fix all of the code using those modified C interfaces? Is Linus being sly here, essentially expecting these people to work together collaboratively, or neither will be able to push upstream?

                                                                                                                                      1. 5

                                                                                                                                        And no, I don’t actually think it needs to be all that black-and-white. I’ve stated the above in very black-and-white terms (“becoming a maintainer of the Rust bindings too” vs “don’t want to deal with Rust at all”), but in many cases I suspect it will be a much less harsh of a line, where a subsystem maintainer may be aware of the Rust bindings, and willing to work with the Rust side, but perhaps not hugely actively involved.

                                                                                                                                        This part seems to clarify it’s mostly an extreme example for the sake of argument.

                                                                                                                                        1. 4

                                                                                                                                          If Rust code gets broken, it will get fixed during the stabilization part of the release cycle, I guess.

                                                                                                                                          1. 2

                                                                                                                                            I assume it’s likely that a C interface change that breaks Rust drivers will also break C drivers.

                                                                                                                                            1. 2

                                                                                                                                              Maintainers of C subsystems fix the C drivers themselves.

                                                                                                                                              That’s why there was controversy over whether the Rust side can really be broken, or whether maintainers will end up being forced to learn Rust to fix the Rust drivers too.

                                                                                                                                          2. 2

                                                                                                                                            While I think moving on from C to a memory safe systems programming language is a good idea, for various reasons, I don’t think Rust will end up being a good choice. Ultimately, I think the choice of Rust is going to hurt Linux.

                                                                                                                                            I hope to be proven wrong.

                                                                                                                                            1. 5

                                                                                                                                              While I’m not exactly a Rust superfan, a bird in the hand is worth two in the bush.

                                                                                                                                              1. 1

                                                                                                                                                Do you have anything in mind that would be a better choice than Rust?

                                                                                                                                                1. 1

                                                                                                                                                  I haven’t done low level programming for a long time, and I’ve never done kernel programming, so I’m probably the wrong person to ask.

                                                                                                                                                  That being said, I’ve heard good things about Zig.

                                                                                                                                                  1. 4

                                                                                                                                                    Zig is nicer than C, but doesn’t bring any fundamental improvements. A big part of the reasoning for introducing Rust to the kernel despite the difficulty and complexity is that memory safety is a big deal, and Zig has no memory safety story at all.

                                                                                                                                                    1. 1

                                                                                                                                                      I’d suggest that Zig has one memory safety story, namely that it has bounds checks (if not disabled) for arrays - so a spatial safety story.

                                                                                                                                                      That said, there are other reasons why I’ve gone off Zig.