1. 47
  1. 9

    The thread on LKML about this work really doesn’t portray the Linux community in a good light. With a dozen or so new kernels being written in Rust, I wouldn’t be surprised if this team gives up dealing with Linus and goes to work on adding good Linux ABI compatibility to something else.

    1. 26

      I dunno, Linus’ arguments make a lot of sense to me. It sounds like he’s trying to hammer some realism into the idealists. The easter bunny and santa claus comment was a bit much, but otherwise he sounds quite reasonable.

      1. 19

        Disagreement is over whether “panic and stop” is appropriate for kernel, and here I think Linus is just wrong. Debugging can be done by panic handlers, there is just no need to continue.

        Pierre Krieger said it much better, so I will quote:

        Part of the reasons why I wrote a kernel is to confirm by experience (as I couldn’t be sure before) that “panic and stop” is a completely valid way to handle Rust panics even in the kernel, and “warn and continue” is extremely harmful. I’m just so so tired of the defensive programming ideology: “we can’t prevent logic errors therefore the program must be able to continue even a logic error happens”. That’s why my Linux logs are full of stupid warnings that everyone ignores and that everything is buggy.

        One argument I can accept is that this should be a separate discussion, and Rust patch should follow Linux rule as it stands, however stupid it may be.

        1. 7

          I think the disagreement is more about “should we have APIs that hide the kernel context from the programmer” (e.g. “am I in a critical region”).

          This message made some sense to me: https://lkml.org/lkml/2022/9/19/840

          Linus’ writing style has always been kind of hyperbolic/polemic and I don’t anticipate that changing :( But then again I’m amazed that Rust-in-Linux happened at all, so maybe I should allow for the possibility that Linus will surprise me.

          1. 1

            This is exactly what I still don’t understand in this discussion. Is there something about stack unwinding and catching the panic that is fundamentally problematic in, eg a driver?

            It actually seems like it would be so much better. It recovers some of the resiliency of a microkernel without giving up the performance benefits of a monolithic kernel.

            What if, on an irrecoverable error, the graphics driver just panicked, caught the panic at some near-top-level entry point, reset to some known good state and continued? Seems like such an improvement.

            1. 5

              I don’t believe the Linux kernel has a stack unwinder. I had an intern add one to the FreeBSD kernel a few years ago, but never upstreamed it (*NIX kernel programmers generally don’t want it). Kernel stack traces are generated by following frame-pointer chains and are best-effort debugging things, not required for correctness. The Windows kernel has full SEH support and uses it for all sorts of things (for example, if you try to access userspace memory and it faults, you get an exception, whereas in Linux or FreeBSD you use a copy-in or copy-out function to do the access and check the result).

              The risk with stack unwinding in a context like this is that the stack unwinder trusts the contents of the stack. If you’re hitting a bug because of stack corruption then the stack unwinder can propagate that corruption elsewhere.

              1. 1

                With the objtool/ORC stuff that went into Linux as part of the live-patching work a while back it does actually have a (reliable) stack unwinder: https://lwn.net/Articles/728339/

                1. 2

                  That’s fascinating. I’m not sure how it actually works for unwinding (rather than walking) the stack: It seems to discard the information about the location of registers other than the stack pointer, so I don’t see how it can restore callee-save registers that are spilled to the stack. This is necessary if you want to resume execution (unless you have a setjmp-like mechanism at the catch site, which adds a lot of overhead).

                  1. 2

                    Ah, a terminological misunderstanding then I think – I hadn’t realized you meant “unwinding” specifically as something sophisticated enough to allow resuming execution after popping some number of frames off the stack; I had assumed you just meant traversal of the active frames on the stack, and I think that’s how the linked article used the term as well (though re-reading your comment now I realize it makes more sense in the way you meant it).

                    Since AFAIK it’s just to guarantee accurate stack backtraces for determining livepatch safety I don’t think the objtool/ORC functionality in the Linux kernel supports unwinding in your sense – I don’t know of anything in Linux that would make use of it, aside from maybe userspace memory accesses (though those use a separate ‘extable’ mechanism for explicitly-marked points in the code that might generate exceptions, e.g. this).

                    1. 2

                      If I understand the userspace access things correctly, they look like the same mechanism as FreeBSD (no stack unwinding, just quick resumption to an error handler if you fault on the access).

                      I was quite surprised that the ORC[1] is bigger than DWARF. Usually DWARF debug info can get away with being large because it’s stored in separate pages in the binary from the file and so doesn’t consume any physical memory unless used. I guess speed does matter for things like DTrace / SystemTap probes, where you want to do a full stack trace quickly, but in the kernel you can’t easily lazily load the code.

                      The NT kernel has some really nice properties here. Almost all of the kernel’s memory (including the kernel’s code) is pageable. This means that the kernel’s unwind metadata can be swapped out if not in use, except for the small bits needed for the page-fault logic. In Windows, the metadata for paged-out pages is stored in PTEs and so you can even page out page-table pages, but you can then potentially need to page in every page in a page-table walk to handle a userspace fault. That extreme case probably mattered a lot more when 16 MiB of RAM was a lot for a workstation than it does now, but being able to page out rarely-used bits of kernel is quite useful.

                      In addition, the NT kernel has a complete SEH unwinder and so can easily throw exceptions. The SEH exception model is a lot nicer than the Itanium model for in-kernel use. The Itanium C++ ABI allocates exceptions and unwind state on the heap and then does a stack walk, popping frames off to get to handlers. The SEH model allocates them on the stack and then runs each cleanup frame, in turn, on the top of the stack then, at catch, runs some code on top of the stack before popping off all of the remaining frames[2]. This lets you use exceptions to handle out-of-memory conditions (though not out-of-stack-space conditions) reliably.

                      [1] Such a confusing acronym in this context, given that the modern LLVM JIT is also called ORC.

                      [2] There are some comments in the SEH code that suggest that it’s flexible enough to support the complete set of Common Lisp exception models, though I don’t know if anyone has ever taken advantage of this. The Itanium ABI can’t support resumable exceptions and needs some hoop jumping for restartable ones.

              2. 4

                What you are missing is that stack unwinding requires destructors, for example to unlock locks you locked. It does work fine for Rust kernels, but not for Linux.

            2. 7

              Does the kernel have unprotected memory and just rolls with things like null pointer dereferences reading garbage data?

              For errors that are expected Rust uses Result, and in that case it’s easy to sprinkle the code with result.or(whoopsie_fallback) that does not panic.

              1. 4

                As far as I understand, yeah, sometimes the kernel would prefer to roll with corrupted memory as far as possible:

                So BUG_ON() is basically ALWAYS 100% the wrong thing to do. The argument that “there could be memory corruption” is [not applicable in this context]. See above why.

                (from docs and linked mail).

                null derefernces in particular though usually do what BUG_ON essentially does.

                And things like out-of-bounds accesses seem to end with null-dereference:

                https://github.com/torvalds/linux/blob/45b588d6e5cc172704bac0c998ce54873b149b22/lib/flex_array.c#L268-L269

                Though, notably, out-of-bounds access doesn’t immediately crash the thing.

                1. 8

                  As far as I understand, yeah, sometimes the kernel would prefer to roll with corrupted memory as far as possible:

                  That’s what I got from the thread and I don’t understand the attitude at all. Once you’ve detected memory corruption then there is nothing that a kernel can do safely and anything that it does risks propagating the corruption to persistent storage and destroying the user’s data.

                  Linus is also wrong that there’s nothing outside of a kernel that can handle this kind of failure. Modern hardware lets you make it very difficult to accidentally modify the kernel page tables. As I recall, XNU removes all of the pages containing kernel code from the direct map and protects the kernel’s page tables from modification, so that unrecoverable errors can take an interrupt vector to some immutable code that can then write crash dumps or telemetry and reboot. Windows does this from the Secure Kernel, which is effectively a separate VM that has access to all of the main VM’s memory but which is protected from it. On Android, Halfnium provides this kind of abstraction.

                  I read that entire thread as Linus asserting that the way that Linux does things is the only way that kernel programming can possibly work, ignoring the fact that other kernels use different idioms that are significantly better.

                  1. 5

                    Reading this thread is a little difficult because the discussion is evenly spread between the patch set being proposed, some hypothetical plans for further patch sets, and some existing bad blood between the Linux and Rust community.

                    The “roll with corrupted memory as far as possible” part is probably a case of the “bad blood” part. Linux is way more permissive with this than it ought to be but this is probably about something else.

                    The initial Rust support patch set failed very eagerly and panicked, including on cases where it really is legit not to panic, like when failing to allocate some memory in a driver initialization code. Obviously, the Linux idiom there isn’t “go on with whatever junk pointer kmalloc gives you there” – you (hopefully – and this is why we should really root for memory safety, because “hopefully” shouldn’t be a part of this!) bail out, that driver’s initialization fails but kernel execution obviously continues, as it probably does on just about every general-purpose kernel out there.

                    The patchset’s authors actually clarified immediately that the eager panics are actually just an artefact of the early development status – an alloc implementation (and some bits of std) that follows safe kernel idioms was needed, but it was a ton of work so it was scheduled for later, as it really wasn’t relevant for a first proof of concept – which was actually a very sane approach.

                    However, that didn’t stop seemingly half the Rustaceans on Twitter to take out their pitchforks, insists that you should absolutely fail hard if memory allocation fails because what else are you going to do, and rant about how Linux is unsafe and it’s riddled with security bugs because it’s written by obsolete monkeys from the nineties whose approach to memory allocation failures is “well, what could go wrong?” . Which is really not the case, and it really does ignore how much work went into bolting the limited memory safety guarantees that Linux offers on as many systems as it does, while continuing to run critical applications.

                    So when someone mentions Rust’s safety guarantees, even in hypothetical cases, there’s a knee-jerk reaction for some folks on the LKML to feel like this is gonna be one of those cases of someone shitting on their work.

                    I don’t want to defend it, it’s absolutely the wrong thing to do and I think experienced developers like Linus should realize there’s a difference between programmers actually trying to use Rust for real-world problems (like Linux), and Rust advocates for whom everything falls under either “Rust excels at this” or “this is an irrelevant niche case”. This is not a low-effort patch, lots of thinking went into it, and there’s bound to be some impedance mismatch between a safe language that tries to offer compile-time guarantees and a kernel historically built on overcoming compiler permisiveness through idioms and well-chosen runtime tradeoffs. I don’t think the Linux kernel folks are dealing with this the way they ought to be dealing with it, I just want to offer an interpretation key :-D.

                2. 1

                  No expert here, but I imagine linux kernel has methods of handling expected errors & null checks.

                3. 6

                  In an ideal world we could have panic and stop in the kernel. But what the kernel does now is what people expect. It’s very hard to make such a sweeping change.

                  1. 6

                    Sorry, this is a tangent, but your phrasing took me back to one of my favorite webcomics, A Miracle of Science, where mad scientists suffer from a “memetic disease” that causes them to e.g. monologue and explain their plans (and other cliches), but also allows them to make impossible scientific breakthroughs.

                    One sign that someone may be suffering from Science Related Memetic Disorder is the phrase “in a perfect world”. It’s never clearly stated exactly why mad scientists tend to say this, but I’d speculate it’s because in their pursuit of their utopian visions, they make compromises (ethical, ugly hacks to technology, etc.), that they wouldn’t have to make in “a perfect world”, and this annoys them. Perhaps it drives them to take over the world and make things “perfect”.

                    So I have to ask… are you a mad scientist?

                    1. 2

                      I aspire to be? bwahahaa

                      1. 2

                        Hah, thanks for introducing me to that comic! I ended up archive-bingeing it.

                      2. 2

                        What modern kernels use “panic and stop”? Is it a feature of the BSDs?

                        1. 8

                          Every kernel except Linux.

                          1. 2

                            I didn’t exactly mean bsd. And I can’t name one. But verified ones? redox?

                            1. 1

                              I’m sorry if my question came off as curt or snide, I was asking out of genuine ignorance. I don’t know much about kernels at this level.

                              I was wondering how much an outlier the Linux kernel is - @4ad ’s comment suggests it is.

                              1. 2

                                No harm done

                        2. 4

                          I agree. I would be very worried if people writing the Linux kernel adopted the “if it compiles it works” mindset.

                          1. 2

                            Maybe I’m missing some context, but it looks like Linus is replying to “we don’t want to invoke undefined behavior” with “panicking is bad”, which makes it seem like irrelevant grandstanding.

                            1. 2

                              The part about debugging specifically makes sense in the “cultural” context of Linux, but it’s not a matter of realism. There were several attempts to get “real” in-kernel debugging support in Linux. None of them really gained much traction, because none of them really worked (as in, reliably, for enough people, and without involving ritual sacrifices), so people sort of begrudgingly settled for debugging by printf and logging unless you really can’t do it otherwise. Realistically, there are kernels that do “panic and stop” well and are very debuggable.

                              Also realistically, though: Linux is not one of those kernels, and it doesn’t quite have the right architecture for it, either, so backporting one of these approaches onto it is unlikely to be practical. Linus’ arguments are correct in this context but only insofar as they apply to Linux, this isn’t a case of hammering realism into idealists. The idealists didn’t divine this thing in some programming class that only used pen, paper and algebra, they saw other operating systems doing it.

                              That being said, I do think people in the Rust advocacy circles really underestimate how difficult it is to get this working well for a production kernel. Implementing panic handling and a barebones in-kernel debugger that can nonetheless usefully handle 99% of the crashes in a tiny microkernel is something you can walk third-year students through. Implementing a useful in-kernel debugger that can reliably debug failures in any context, on NUMA hardware of various architectures, even on a tiny, elegant microkernel, is a whole other story. Pointing out that there are Rust kernels that do it well (Redshirt comes to mind) isn’t very productive. I suspect most people already know it’s possible, since e.g. Solaris did it well, years ago. But the kind of work that went into that, on every level of the kernel, not just the debugging end, is mind-blowing.

                              (Edit: I also suspect this is the usual Rust cultural barrier at work here. The Linux kernel community is absolutely bad at welcoming new contributors. New Rust contributors are also really bad at making themselves welcome. Entertaining the remote theoretical possibility that, unlikely though it might be, it is nonetheless in the realm of physical possibility that you may have to bend your technology around some problems, rather than bending the problems around your technology, or even, God forbid, that you might be wrong about something, can take you a very long way outside a fan bulletin board.)

                              1. 1

                                easter bunny and santa claus comment

                                Wow, Linus really has mellowed over the years ;)

                            2. -2

                              They should have rather merged in Ada support, in my opinion, which is an actual mature language with mature programmers, great and proven tooling and a wider range of security guarantees while providing much better readability.

                              I understand though why they caved in to the pressure, given how obnoxious Rust-evangelists can be.

                              Still, let’s see how this pans out and look back at it in a few years.

                              1. 18

                                Rust makes a lot more sense than Ada.

                                The Ada open source community is tiny, and doesn’t even show up in the top 50 languages in stars, pushes, pull requests or issues on GitHub. While the docs, tutorials and tooling has vastly improved in the last couple of years with rough cargo/rustup equivalents and a package manager, the language’s reach is much, much smaller than Rust’s. For comparison, the biggest Rust project on GitHub has over 80,000 stars… where the most starred Ada repository (which is mine), has a little bit over 300. If you’re interested, it’s an exciting place to be, but Ada doesn’t make sense for this context.

                                an actual mature language with mature programmers

                                The language is stable, but hasn’t been sold well or at all to the current generation of programmers. It hasn’t been sold, so there isn’t as large a base to draw programmers from as with Rust. The response to me telling people that I’ve written open source projects in the language is usually “That language is still around?”

                                wider range of security guarantees

                                Overall, Rust probably has more guarantees (e.g. use-after-free). Ada is also very good, and definitely underestimated in what it has – typed pointers which can be restricted to where they’re used to prevent some problems, zeroing of pointers after free, bounds-checked arrays, pointers (access types) must be explicitly converted to addresses to do math, and it also has the SPARK subset for verification. It’s pretty difficult to shoot yourself in the foot in Ada, but you can, and I’ve definitely done it a few times.

                                while providing much better readability.

                                Arguable both ways. Rust can be exceptionally terse and to the point, Ada is long winded but has fewer symbols and I think you can encode a bit more domain information.

                                Rust has momentum and has done a fantastic job both in building the ecosystem around the language (clippy, rust-analyzer, cargo, rustdoc, IDE plugins, etc.) and then selling the language. It is pretty close to the gold standard of how to build an open source programming language.

                                1. 8

                                  I think it boils down to this: if you write Ada binding to Linux kernel core APIs and write drivers against it, Ada support will be considered. If you don’t, it won’t.

                                  In other words, it’s about how industrious Rust practitioners are, not about how obnoxious Rust evangelists are.

                                  1. 5

                                    I think Ada is a really neat language! However, as one example, I don’t think it solves use-after-free bugs. To quote Ada 95: The Craft of Object-Oriented Programming

                                    What is needed is a way of telling the system to deallocate the memory so that it can be reused by anything that needs it. The way to do this is to use the standard procedure Ada.Unchecked_Deallocation. As the name implies, there is no check made that the memory is actually free and that you don’t still have an access variable pointing to it; it’s entirely your responsibility to ensure that once you’ve used Unchecked_Deallocation to get rid of something, you never try to refer to it again. If you do the result will be unpredictable and might well crash your program.

                                    (While the above quote refers to Ada 95, my understanding is that this is still accurate for Ada 2012 and newer.)

                                    There are high-assurance subsets of Ada but they, like most high-assurance applications, do not support dynamic memory allocation. This does look to be changing though! It sounds like there’s work to bring “rust-style memory management” to Ada, so this comparison will hopefully be incorrect soon!

                                    1. 2

                                      rust has decent C FFI, how about ada?

                                      1. 3

                                        It’s part of the language standard. I’ve done it a bunch of times for various things and it’s super easy. If you don’t want to write the binding from scratch, GCC can generate you a basic thin binding from headers that you can improve on.