1. 7

  2. 2

    The Tiny Code Generator in QEMU comes from the Tiny C Compiler, which began life as Fabrice Bellard’s entry in the International Obfuscated C Code Contest. The entry was a self-hosting C compiler in 2 KiB of source code. A few years back, TCC was able to compile Linux (with some modifications), and there was a fun demo where it was embedded in GRUB and would compile the kernel into memory on boot (it compiled a 2.6 series kernel in about 20 seconds on a 200MHz machine). I wish TCG would be separated out of QEMU: a portable, fast, permissively licensed (QEMU is GPL’d, but the TCG parts are MIT) JIT library would be a great companion to LLVM. TCG is fast enough that you can use it instead of an interpreter, so it would be great for first-tier JITs that then have something more powerful (and higher overhead) for the hot code paths.

    1. 1

      It’s funny to me that TCG has this pedigree - I’ve had nothing but horrible experiences with it. It’s apocalyptically slow - x86 emulation with TCG tends to be in the ballpark of cycle-accurate emulators like PCem, if not worse! Perhaps that’s more the fault of QEMU than TCG though…

      1. 2

        There probably hasn’t been much work put into making it fast for x86 guests, because QEMU mostly runs with x86 as the host. There are two ways that you can use TCG in QEMU:

        • Use it to accelerate the dispatch, but keep the instruction implementation in C.
        • Use it to fully implement the instruction.

        The former is faster than a pure interpreter (you decode once and then chain together a bunch of call instructions for a basic block in the emulated ISA), but your performance is still limited by the performance of the C code. It is pretty easy to do, though, and so gets a modest speedup for very little effort. The second is more effort because you effectively have to hand-compile the code for each instruction.
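
        A minimal sketch of the first approach, in Python rather than anything resembling real TCG: the toy ISA, the `h_*` helpers, `translate_block` and `run` are all made up for illustration. Each basic block is decoded once into a list of helper calls and cached by its start address, so the loop body is never decoded again:

```python
# Toy guest ISA (hypothetical): each instruction is (opcode, a, b).
PROGRAM = [
    ("movi", 0, 5),   # r0 = 5
    ("movi", 1, 0),   # r1 = 0
    ("add", 1, 0),    # r1 += r0        <- pc 2, loop head
    ("subi", 0, 1),   # r0 -= 1
    ("jnz", 0, 2),    # if r0 != 0: goto 2
    ("halt", 0, 0),
]

# Instruction semantics stay in ordinary functions (the "C helpers").
def h_movi(regs, r, imm):
    regs[r] = imm

def h_add(regs, dst, src):
    regs[dst] += regs[src]

def h_subi(regs, r, imm):
    regs[r] -= imm

HELPERS = {"movi": h_movi, "add": h_add, "subi": h_subi}

def translate_block(program, pc):
    """Decode one basic block starting at pc into (helper calls, terminator)."""
    ops = []
    while True:
        op, a, b = program[pc]
        if op == "jnz":
            # (kind, register to test, taken target, fall-through pc)
            return ops, ("jnz", a, b, pc + 1)
        if op == "halt":
            return ops, ("halt",)
        ops.append((HELPERS[op], a, b))
        pc += 1

def run(program, nregs=2):
    regs = [0] * nregs
    cache = {}  # block start pc -> translated block
    pc = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:
            block = cache[pc] = translate_block(program, pc)
        ops, term = block
        for helper, a, b in ops:  # the chained calls; no decoding here
            helper(regs, a, b)
        if term[0] == "halt":
            pc = None
        else:
            _, reg, target, fallthrough = term
            pc = target if regs[reg] != 0 else fallthrough
    return regs, cache

regs, cache = run(PROGRAM)
```

        The loop runs five times but only three blocks are ever translated (starting at pc 0, 2 and 5). Every operation still goes through a helper call, which is the performance ceiling described above.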

        Periodically people look at using LLVM for QEMU. This would give better code fairly easily by compiling the C functions that implement instructions to IR and then inlining them into the dispatcher. The rest of the QEMU architecture isn’t really set up for this though, because it looks at things one instruction or one basic block at a time, and for the fixed performance costs of LLVM to be worthwhile you need to give it a lot more code to work with. The highest-performance LLVM-based emulator that I’m aware of compiles a page of code at a time and recompiles it every time it finds a new entry point to that page (from someone jumping into the page from elsewhere). It’s hard to see how you’d retrofit that to QEMU though.
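
        The page-at-a-time scheme can be sketched as a cache keyed by page that tracks known entry points. This is only a guess at the bookkeeping, not that emulator’s actual design; `PageCache`, `compile_page` and the 4 KiB page size are assumptions for illustration, and the "compiler" is a stand-in callback:

```python
PAGE_SIZE = 4096  # assumed guest page size

class PageCache:
    """Compile a whole page at once; recompile when a new entry point appears."""

    def __init__(self, compile_page):
        # compile_page(page_number, entry_offsets) -> compiled artifact
        self.compile_page = compile_page
        self.pages = {}    # page number -> (frozenset of entry offsets, artifact)
        self.compiles = 0  # how many times the expensive compiler ran

    def lookup(self, addr):
        page, offset = divmod(addr, PAGE_SIZE)
        entries, code = self.pages.get(page, (frozenset(), None))
        if offset not in entries:
            # New way into this page: recompile with the enlarged entry set.
            entries = entries | {offset}
            code = self.compile_page(page, entries)
            self.compiles += 1
            self.pages[page] = (entries, code)
        return code

# Stand-in compiler that just records what it was asked to build.
cache = PageCache(lambda page, entries: (page, tuple(sorted(entries))))
first = cache.lookup(0x1000)   # first entry into page 1: compile
again = cache.lookup(0x1000)   # known entry point: reuse
second = cache.lookup(0x1010)  # new entry point into page 1: recompile
```

        The state being tracked is per page plus a set of entry points, which is a different shape from a cache of individually translated basic blocks.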

        1. 1

          The other emulation targets seem slightly better CPU-wise, but not by much - I can get roughly 600 MHz G3 performance out of qemu-system-ppc emulating a Power Mac, with a Ryzen host. Most QEMU targets don’t work that well though; QEMU has a very bad case of NetBSD syndrome, where it claims to support a lot of things, but outside a core few, most are supported only to a bare-minimum level (i.e. they can only boot Linux) and no one cares about them.