Funny to see this come up – I actually implemented something very similar to this with Linux a few years ago.
It used Andi Kleen’s kernel-LTO patch set as a starting point, then used perf profiling to find dynamically hot codepaths. Those were recompiled with very aggressive optimizations plus some auxiliary hint information to enable speculative devirtualization of indirect calls (even cross-module ones). The result was a new, semantically equivalent codepath with pretty much everything inlined together (from syscall entry points all the way down to device drivers), specialized for that workload on that system. That code was then built into a module and spliced into the running system via the livepatch mechanism.
At the time the results weren’t quite dramatic enough to justify pursuing it further (or maybe I wasn’t testing it on the right workloads), but now that Spectre and the like have made indirect calls noticeably more expensive, I wonder if it might look better…
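For anyone unfamiliar with the term, speculative devirtualization roughly means guarding a direct call behind a check of the function pointer. Here is a minimal sketch in plain userspace C, not the actual kernel work described above. Every name in it (`file_ops`, `ramdisk_read`, `nulldev_read`, `read_specialized`) is hypothetical, and it simply assumes profiling identified `ramdisk_read` as the hot target.

```c
#include <assert.h>

/* Hypothetical ops table, loosely modeled on kernel-style
 * function-pointer structs. Not a real kernel type. */
struct file_ops {
    long (*read)(long pos, long len);
};

struct file {
    const struct file_ops *ops;
};

/* Two hypothetical "drivers" behind the same interface. */
static long ramdisk_read(long pos, long len) { return pos + len; }
static long nulldev_read(long pos, long len) { (void)pos; (void)len; return 0; }

static const struct file_ops ramdisk_ops = { .read = ramdisk_read };
static const struct file_ops nulldev_ops = { .read = nulldev_read };

/* Generic path: always an indirect call through the ops table. */
long read_generic(const struct file *f, long pos, long len)
{
    return f->ops->read(pos, len);
}

/* Specialized path. ASSUMPTION: profiling showed ramdisk_read is the
 * hot target. The guard turns the common case into a direct call the
 * compiler is free to inline; anything else takes the original
 * indirect call, so behavior is unchanged. */
long read_specialized(const struct file *f, long pos, long len)
{
    if (f->ops->read == ramdisk_read)
        return ramdisk_read(pos, len);  /* direct, inlinable */
    return f->ops->read(pos, len);      /* fallback: indirect call */
}

/* Check that both paths agree for the hot and the cold target. */
void check_equivalence(void)
{
    struct file hot = { .ops = &ramdisk_ops };
    struct file cold = { .ops = &nulldev_ops };

    assert(read_specialized(&hot, 3, 4) == read_generic(&hot, 3, 4));
    assert(read_specialized(&cold, 3, 4) == read_generic(&cold, 3, 4));
}
```

The guard costs a compare and a well-predicted branch, which is why this tends to pay off only when one target clearly dominates and indirect calls are expensive.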
Trading gold for lead. Two scoops of attack surface and complexity for speed when what we have is already quite fast.
It will happen though, since the gain is easily quantifiable, will make great charts, and everyone in the space will have to do it to keep up with the Joneses.
Whereas the value of not having another JIT compiler inside your kernel will be much harder to quantify until the SHTF.
Existing JITs are problematic from a security perspective mostly because they are used to execute untrusted code in a trusted environment, which is not the case in the described scenario: kernel code is executed in the kernel like it always was, only with some specialization applied at runtime.
“because they are used to execute untrusted code in a trusted environment”
“kernel code is executed in the kernel like it always was”
Not the whole picture. The kernel might execute trusted code on malicious inputs passed in by compromised apps or by something attempting a compromise. By itself, that can lead to a kernel attack. The JIT adds the possibility that the specialization process introduces vulnerabilities that weren’t there before. Both the AOT compiler and the app might have gotten testing that the user-specific JIT-ed code didn’t. So, you get one extra layer of attack surface with the JIT. That JITs have to do less work than AOT compilers puts an upper bound on their security in practice, too.
Adding a Turing-complete JIT to a kernel for a little more performance is ill-advised. He mentions BPF, though that’s apples and oranges: BPF isn’t Turing-complete. And in this one phrase I count three tasks of the never-ending variety:
“for deployment we’d probably end up either reusing a lighter-weight code generator or else creating a new one that is smaller, faster, and more suitable for inclusion in the OS. Performance of runtime code generation isn’t just a throughput issue, there’ll also be latency problems if we’re not careful. We need to think about the impact on security, too.”
Reminds me of Snabb Switch, a Lua userspace networking framework that relies heavily on LuaJIT for its high performance. Interestingly, Regehr also mentions that he sees the most potential for this technique in the network stack.
That part was clear enough from the article.