This is a spectacular piece of work. Some choice quotes:
At this point, I took a two-week-long break from this project to write CortexEmu.
Add to that the fact that Cortex-M3 compatible linux is slow AND huge, it was simply not an option. So, what is?
I ended up writing my own kernel.
I wrote an ARM emulator that would read each instruction and execute it, until the code exited ARM mode, at which point I’d exit the emulation and resume native execution.
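Cortex-M cores can only execute Thumb code, so ARM-mode code has to be interpreted until it branches back into Thumb. The loop below is a minimal sketch of that idea and is not the author's emulator. The names `cpu_t`, `emulate_arm`, `EMU_EXITED` and `EMU_UNSUPPORTED` are hypothetical. It decodes only three unconditional A32 forms (`MOV Rd, #imm8`, `ADD Rd, Rn, Rm`, `BX Rm`). It also assumes code lives in a word array mapped at address 0 and does not model flags, conditions, or PC-as-operand.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emulated-CPU state for this sketch (no flags, no banked registers). */
typedef struct {
    uint32_t r[16];   /* r15 holds the PC */
    int thumb;        /* set to 1 once the emulated code leaves ARM mode */
} cpu_t;

enum { EMU_EXITED = 0, EMU_UNSUPPORTED = -1 };

/* Interpret ARM (A32) instructions from 'mem', a word array mapped at address 0,
 * until a BX switches to Thumb. Only always-executed (cond = AL) MOV #imm8,
 * ADD (register, no shift) and BX are decoded; anything else stops the loop. */
static int emulate_arm(cpu_t *cpu, const uint32_t *mem, size_t nwords)
{
    uint32_t pc = cpu->r[15];
    for (;;) {
        if ((pc & 3u) != 0 || pc / 4 >= nwords)
            break;                                   /* misaligned or outside 'mem' */
        uint32_t insn = mem[pc / 4];
        uint32_t rd = (insn >> 12) & 0xFu, rn = (insn >> 16) & 0xFu, rm = insn & 0xFu;
        if ((insn >> 28) != 0xEu)
            break;                                   /* conditional execution not modeled */
        if ((insn & 0x0FFFFFF0u) == 0x012FFF10u && rm != 15) {        /* BX Rm */
            uint32_t target = cpu->r[rm];
            if (target & 1u) {                       /* bit 0 set: switch to Thumb */
                cpu->thumb = 1;
                cpu->r[15] = target & ~1u;
                return EMU_EXITED;                   /* caller resumes native execution */
            }
            pc = target & ~3u;                       /* still ARM: keep interpreting */
        } else if ((insn & 0x0FFF0F00u) == 0x03A00000u && rd != 15) { /* MOV Rd, #imm8 */
            cpu->r[rd] = insn & 0xFFu;
            pc += 4;
        } else if ((insn & 0x0FF00FF0u) == 0x00800000u &&
                   rd != 15 && rn != 15 && rm != 15) {                /* ADD Rd, Rn, Rm */
            cpu->r[rd] = cpu->r[rn] + cpu->r[rm];
            pc += 4;
        } else {
            break;                                   /* not handled by this sketch */
        }
    }
    cpu->r[15] = pc;                                 /* address of the instruction we stopped at */
    return EMU_UNSUPPORTED;
}
```

For example, running `{0xE3A00005, 0xE3A01007, 0xE0802001, 0xE3A03041, 0xE12FFF13}` (`MOV r0,#5; MOV r1,#7; ADD r2,r0,r1; MOV r3,#0x41; BX r3`) leaves `r2 == 12` and returns `EMU_EXITED` with the PC at the Thumb address `0x40`. A real emulator would need the full A32 decoder, which is exactly the part that ends up performance-critical.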
C compilers kind of suck at optimizing emulator code. So what next? Well, I went ahead and rewrote the emulator core in assembly. Actually I did it twice.
if PC points to external RAM, and WFI instruction is executed (to wait for interrupts in a low power mode), and then an interrupt happens after more than 60ms, the CPU will take a random interrupt vector instead of the correct one after waking up!
The hardware has only a single one-byte buffer. This means that to not lose any received data at high data rates, one needs to use hardware flow control or make the serial port interrupt the highest priority and hope for the best. This was unacceptable for me. I decided to use DMA.
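The quote does not say how the DMA was set up. A common pattern, shown here only as a hedged sketch, is to let the DMA engine write UART bytes into a ring buffer in circular mode. Software then works out how far the DMA has written from a down-counting "transfers remaining" register, which many microcontroller DMA controllers provide. `uart_rx_t`, `RX_BUF_SIZE`, `uart_rx_read` and `sim_dma_receive` are hypothetical names. `sim_dma_receive` is a software stand-in for the hardware so the logic can run on a PC.

```c
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE 64u                 /* hypothetical ring size */

typedef struct {
    uint8_t buf[RX_BUF_SIZE];           /* filled by DMA in circular mode */
    volatile uint32_t dma_remaining;    /* models the DMA "transfers left" counter */
    size_t tail;                        /* next index software will read */
} uart_rx_t;

/* Index the DMA will write next, derived from the down-counter. */
static size_t dma_head(const uart_rx_t *rx)
{
    return (RX_BUF_SIZE - rx->dma_remaining) % RX_BUF_SIZE;
}

/* Copy out whatever the DMA has deposited since the previous call.
 * Overruns (DMA lapping the reader) are not detected in this sketch. */
static size_t uart_rx_read(uart_rx_t *rx, uint8_t *out, size_t max)
{
    size_t head = dma_head(rx);
    size_t n = 0;
    while (rx->tail != head && n < max) {
        out[n++] = rx->buf[rx->tail];
        rx->tail = (rx->tail + 1) % RX_BUF_SIZE;
    }
    return n;
}

/* Test-only stand-in for the DMA engine: store one byte, decrement the
 * counter, and reload it at zero the way circular mode does. */
static void sim_dma_receive(uart_rx_t *rx, uint8_t byte)
{
    rx->buf[RX_BUF_SIZE - rx->dma_remaining] = byte;
    if (--rx->dma_remaining == 0)
        rx->dma_remaining = RX_BUF_SIZE;
}
```

Because the DMA moves each byte out of the one-byte hardware register as soon as it arrives, the CPU no longer has to service an interrupt per byte. It only needs to drain the ring before the DMA wraps around onto unread data, for example from an idle-line or half-transfer interrupt.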