These tutorials are great for demystifying kernel programming to an extent - people seem to think kernel code is beyond the comprehension of mere mortals - and I understand the desire to keep the example code short and simple. However, in this day and age, you’d think there would at least be a mention of the glaring security vulnerabilities the example code rips into your system if you load it, even if the author chooses to skip the fix or leave it as an exercise for the reader.
(One of the “hard” parts of kernel programming is that everything must be assumed multithreaded unless proven otherwise. If two processes or threads race the write operation to the device node in the sample, you end up with a heap buffer overflow and/or a use-after-free, for example.)
people seem to think kernel code is beyond the comprehension of mere mortals
I certainly had that mental block for a long time. Realising that a kernel was just a bit of (mostly) C code that ran with the same C abstract machine as userspace was a big mental jump to make. These days, it’s often C++ or Rust code and so at a higher level of abstraction than the userspace software that I grew up writing for systems like DOS. In NetBSD, it can even be Lua.
There are still a few extra challenges compared to modern userspace systems programming, besides the “high stakes” aspect:
You don’t generally have virtual memory to fall back on, and you need to be aware of the consequences of blocking memory allocations. (Allocating kernel memory may cause swapping out user memory, which will trigger disk I/O, which makes storage driver code run, which hopefully can complete the I/O without allocating kernel memory, or we’re deadlocked.)
Just about everything has to be thread safe.
Typically you only have access to a much-reduced standard library for your programming language.
You don’t generally have virtual memory to fall back on, and you need to be aware of the consequences of blocking memory allocations. (Allocating kernel memory may cause swapping out user memory, which will trigger disk I/O, which makes storage driver code run, which hopefully can complete the I/O without allocating kernel memory, or we’re deadlocked.)
I’m not quite sure what it means to not have virtual memory to fall back on. Most kernels run with paging enabled and so you can do VM tricks like page aliasing if you really want to. You can also do fun things such as per-CPU mappings (see our upcoming Oakland paper for some nice security wins from that in Xen and Hyper-V).
You’re right that you have to be a lot more careful about memory allocation. I’d say that that’s a property of systems programming in general, rather than something intrinsic to kernel mode. It’s pretty common for low-level C++ code to use custom allocators (either via the std::allocator abstraction or by explicitly overriding operator new / operator delete per class). This is one of the reasons that I consider C++ a better language than C for kernel development.
The main difference here is in your next point:
Just about everything has to be thread safe.
That’s true for most modern C/C++ code as well, but the main difference is that everything has to be interrupt safe. Specifically, it has to either run with preemption disabled (which means that it can’t hold any sleepable locks or call anything that acquires sleepable locks), or run with preemption enabled and ensure that it doesn’t ever block the progress of other threads.
This comes up in memory allocation a lot. Kernels typically have blocking and non-blocking memory allocators (and usually many more variants, though Linux is particularly limited in this regard). A blocking memory allocator will sleep the calling thread until there is memory available. Doing this with any locks held will potentially deadlock the entire system. The alternative is an allocator that can fail, which won’t deadlock but does require that you handle the failure case. In userspace, it’s far more common to assume allocation will succeed and just fail if it doesn’t. In particular, the things that cause blocking in the allocator very rarely depend on any other code in the userspace program. In kernel code, it’s quite possible that an allocation won’t be able to make forward progress until it does something with a resource that’s protected by a lock that you’re holding. That won’t happen in userspace.
The other difference is that kernel stacks tend to be very small (4-16 KiB) and so creating a kernel thread is a lot cheaper than creating a userspace thread (where stacks are usually 2-8 MiB and also have a load of kernel state associated with them). This means that you have a lot more threads. Last time I paid attention while doing some kernel hacking, my FreeBSD kernel had 900 threads immediately after boot. This kind of number is common in userspace languages with N:M threading (Java, Go, and so on), but much less common for C/C++.
Typically you only have access to a much-reduced standard library for your programming language.
That’s far less true than it used to be. You probably don’t have file I/O or locale support, but you probably don’t want them either. If you work on the Solaris kernel, you have basically the same C library as userspace, which I find somewhat terrifying. In FreeBSD, libsa (kernel and bootloader) contains the subset of the C standard library that I’d want to use in these situations.
In some places, you have a far richer set of library functions. For example, the set of locks available in the FreeBSD kernel is far nicer than the set in userspace, and (when you compile in a debug mode) they will dynamically detect lock-order reversals. Kernel module loading and unloading hooks are more pleasant to work with than dlopen in userspace.