Re: figuring out code size: in my experience, a really helpful tool here is the linker map (-Wl,-Map=outfile.map). It’s very verbose, but also very detailed.
If you don’t need the features, -fno-stack-protector -fomit-frame-pointer -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -fno-exceptions -fno-threadsafe-statics can also yield a few gains. If you’re doing floating-point work and have an FPU available, make sure it’s actually being used: -marm -mfloat-abi=hard -march=armv7-a+neon+vfpv3. Otherwise the compiler will generate soft-float code.
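One way to confirm which floating-point mode the flags actually selected is to ask the compiler through its predefined macros. This is a minimal sketch, assuming GCC or Clang targeting 32-bit ARM; float_abi_name is a hypothetical helper name, and the macro meanings are as I understand them from the GCC and ACLE documentation (__SOFTFP__ for pure software floating point, __ARM_PCS_VFP for the hard-float calling convention, __ARM_FP for hardware FP instructions), so double-check against your toolchain.

```c
/* Hypothetical helper: reports the floating-point mode the compiler
 * was configured for, based on predefined macros (assumed GCC/Clang
 * on 32-bit ARM; other targets fall through to the last branch). */
const char *float_abi_name(void)
{
#if defined(__SOFTFP__)
    /* No FP instructions are emitted; library routines do the math. */
    return "soft";
#elif defined(__ARM_PCS_VFP)
    /* FP instructions are used and FP arguments travel in FP registers. */
    return "hard";
#elif defined(__ARM_FP)
    /* FP instructions are used, but the soft-float calling convention
     * is kept (what -mfloat-abi=softfp is expected to produce). */
    return "softfp";
#else
    return "unknown (not an ARM target, or macros not provided)";
#endif
}
```

Printing the result once at startup, or running the compiler with -dM -E and grepping for the same macros, should show whether the build is really using the FPU.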
Also, if you can afford it, rewrite the crt* code and reimplement a few stdlib functions that can be made much smaller (e.g. the newlib strcmp seems to be built for performance and thus needs some unaligned-access magic, but it can also be done with a simple ldrb/cmp loop).
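To illustrate the strcmp point, here is a minimal byte-at-a-time sketch in C. The name small_strcmp is hypothetical (chosen so it does not clash with the libc symbol), and this is not newlib’s code; it assumes you are willing to trade speed for size, and that the compiler turns the loop into roughly one byte load and one compare per iteration.

```c
/* Hypothetical size-oriented replacement for strcmp: compares one
 * byte at a time, with no word-sized or unaligned accesses. */
int small_strcmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    /* Advance while the bytes match and the end of a is not reached. */
    while (*pa != '\0' && *pa == *pb) {
        pa++;
        pb++;
    }

    /* Difference of the first mismatching bytes (compared as unsigned
     * char, as the C standard requires), or 0 if the strings are equal. */
    return (int)*pa - (int)*pb;
}
```

Whether this actually ends up smaller than the library version depends on the target and optimization level, so it is worth comparing the two in the linker map before committing to it.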
> Re: figuring out code size: in my experience, a really helpful tool here is the linker map (-Wl,-Map=outfile.map). It’s very verbose, but also very detailed.
> If you don’t need the features, -fno-stack-protector -fomit-frame-pointer -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -fno-exceptions -fno-threadsafe-statics
In our case most of those are disabled by default since we’re writing C code on an ARM MCU (-fno-stack-protector, -fomit-frame-pointer, -fno-rtti, -fno-exceptions, -fno-threadsafe-statics). I didn’t think to check whether the unwind tables ended up in my bin file; I’ll look into it! In any case, your suggestions make a lot of sense in many contexts.
> If you’re doing floating-point work and have an FPU available, make sure it’s actually being used: -marm -mfloat-abi=hard -march=armv7-a+neon+vfpv3. Otherwise the compiler will generate soft-float code.
I initially had a section about FPU code, but my test code targets the Cortex-M0, which does not have an FPU. The next blog post in the series will talk about floating-point code.
> Also, if you can afford it, rewrite the crt* code and reimplement a few stdlib functions that can be made much smaller (e.g. the newlib strcmp seems to be built for performance and thus needs some unaligned-access magic, but it can also be done with a simple ldrb/cmp loop).
I file this under “desperate measures”. There are smaller implementations of libc than newlib-nano (note we’re not using standard newlib), but they’re much less robust.
> In our case most of those are disabled by default since we’re writing C code on an ARM MCU
I’ve also done stuff for an FPU-less ARM chip (good ol’ ARM7), but the toolchain still enabled these by default, so ¯\_(ツ)_/¯.
> I file this under “desperate measures”. There are smaller implementations of libc than newlib-nano (note we’re not using standard newlib), but they’re much less robust.
True, but I’ve been exposed to places where things like these (and even worse) were needed.
Cyril Fougeray wrote a post about the linker map file on Interrupt a few weeks ago FWIW: https://interrupt.memfault.com/blog/get-the-most-out-of-the-linker-map-file