That’s not in armv6l, apparently. So yeah, this compiler is effectively cross-compiling for armv7 (or something) by default. That’s not very useful.
How and why did it get to that point? I can only imagine it’s some default that got bumped from version 11 to version 12, and somehow nobody noticed? I guess nobody still runs these old things anywhere?
Clang is always a cross compiler. It is just sometimes cross compiling for the same platform as the current host. It does not know anything about the host; it knows only about the default target. This is set when you configure LLVM and, by default, will be whatever the host is.
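Because the default target is fixed when LLVM is configured, the quickest way to see what a packaged clang will emit is to ask it. Below is a minimal Python sketch. It assumes a clang binary on PATH. The names default_target_triple and arm_arch_version are hypothetical helpers, and the regex is a rough heuristic that looks only at the architecture field of the triple. A bare arm triple returns None, because in that case clang chooses the CPU from its own per-OS defaults, and this sketch does not model them.

```python
import re
import subprocess
from typing import Optional


def default_target_triple(compiler: str = "clang") -> str:
    """Ask the compiler which target triple it uses when none is given."""
    # -dumpmachine prints the default target triple and exits.
    result = subprocess.run([compiler, "-dumpmachine"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def arm_arch_version(triple: str) -> Optional[int]:
    """Return N for triples like armvN... or thumbvN..., else None.

    Hypothetical heuristic: a bare 'arm' triple returns None because the
    CPU is then picked by clang's built-in per-OS defaults.
    """
    arch = triple.split("-", 1)[0]
    m = re.match(r"(?:arm|thumb)(?:eb)?v(\d+)", arch)
    return int(m.group(1)) if m else None
```

On a Pi you would check whether arm_arch_version(default_target_triple()) comes out greater than 6. An explicit -march on the command line still overrides whatever the default triple implies.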
I don’t think the original RPi has enough RAM to build LLVM, and it is almost certainly too slow to build it in a reasonable time. I used to build LLVM on a Cortex-A8 (a few generations newer than the ARM11 core in the original RPi) and it took many hours. That was almost 10 years ago and now LLVM is a lot bigger. I’d expect it to be compiled on a bigger Arm machine (possibly in the 32-bit compat mode on a big 64-bit machine), which means it probably picks up the target from there. Or it’s cross compiled and no one set the correct target triple.
The RPi is awful to support. It’s about the only platform to still use ARM11, which means you’re stuck with ARMv6. Everything else that shipped at about the same time was a Cortex-A8 or similar, which supported ARMv7. The big difference was Thumb-2 support. ARMv6 cores like the ARM11 variant in the RPi have only Thumb-1, and it’s pretty horrible. You have a choice of Thumb mode or Arm mode, and both are fixed-width ISAs. Arm uses 32-bit instructions and so often generates big code; Thumb is 16-bit and so is often smaller, but doesn’t have access to all registers or the full ISA and so is sometimes painful to target (and this can lead to bigger code). Thumb-2 fixed this by providing a variable-width ISA where common instructions are 16-bit and all of the others are available as 32-bit variants. It managed this by rearranging the encoding from the original AArch32 design, where most instructions were predicated: Thumb-2 removes the condition field from most individual instruction encodings but introduces an if-then (IT) instruction that controls the predication for a bundle of up to four instructions after it.
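The IT mechanism is compact. The instruction carries the first condition plus a 4-bit mask. Each later slot in the block reuses the top three bits of that condition and takes its low bit from the mask, and that low bit is what flips a condition into its opposite (EQ to NE, GE to LT, and so on). The following is a rough Python model of how a block expands, based on the ITSTATE description in the Arm Architecture Reference Manual. The names are hypothetical, and most UNPREDICTABLE cases are not checked.

```python
from typing import List

# Condition-code names indexed by their 4-bit encoding (0b1111 is not valid for IT).
COND_NAMES = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL"]


def expand_it(firstcond: int, mask: int) -> List[str]:
    """Return the condition each instruction in an IT block executes under.

    firstcond: 4-bit condition of the first instruction (0 = EQ, 1 = NE, ...).
    mask: 4-bit IT mask; its lowest set bit marks the end of the block.
    """
    if not 0 <= firstcond <= 0b1110:
        raise ValueError("firstcond must be a condition code in 0..14")
    if not 0 < mask <= 0b1111:
        raise ValueError("mask must be 1..15 (mask == 0 encodes a hint, not IT)")
    # Lowest set bit at position 3 -> 1 instruction, ..., position 0 -> 4.
    lowest = (mask & -mask).bit_length() - 1
    count = 4 - lowest
    conds = [firstcond]
    for k in range(1, count):
        # Later slots keep firstcond[3:1]; their low bit comes from mask[4 - k].
        low_bit = (mask >> (4 - k)) & 1
        cond = (firstcond & 0b1110) | low_bit
        if cond == 0b1111:
            raise ValueError("an AL block cannot contain an 'else' slot")
        conds.append(cond)
    return [COND_NAMES[c] for c in conds]
```

For example, ITE EQ is encoded with firstcond 0 and mask 0b1100. The first instruction in the block runs only if EQ holds, and the second runs only if NE holds.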
The introduction of Thumb-2 meant that most software just targeted Thumb-2 mode. It was always at least as dense as the fixed-width 32-bit encoding (except in incredibly contrived cases).
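The variable-width rule is simple enough that a decoder can find instruction boundaries from the first halfword alone. If its top five bits are 0b11101, 0b11110 or 0b11111, the halfword starts a 32-bit instruction; otherwise it is a complete 16-bit instruction. The Python sketch below uses hypothetical helper names. It measures a halfword stream and compares the result with the same number of A32 instructions. That comparison assumes a one-to-one mapping between T32 and A32 instructions, which real compilers do not guarantee.

```python
from typing import List, Sequence, Tuple

# First-halfword prefixes (bits 15:11) that introduce a 32-bit Thumb-2 instruction.
WIDE_PREFIXES = (0b11101, 0b11110, 0b11111)


def thumb2_lengths(halfwords: Sequence[int]) -> List[int]:
    """Split a Thumb-2 halfword stream into instruction lengths in bytes."""
    lengths = []
    i = 0
    while i < len(halfwords):
        if (halfwords[i] >> 11) in WIDE_PREFIXES:
            if i + 1 >= len(halfwords):
                raise ValueError("truncated 32-bit instruction at end of stream")
            lengths.append(4)
            i += 2
        else:
            lengths.append(2)
            i += 1
    return lengths


def size_vs_a32(halfwords: Sequence[int]) -> Tuple[int, int]:
    """(Thumb-2 bytes, bytes for the same instruction count in A32)."""
    lengths = thumb2_lengths(halfwords)
    return sum(lengths), 4 * len(lengths)
```

Consider the stream movs r0, #1 (0x2001), a BL (0xF000 0xF800) and bx lr (0x4770). It decodes as 2 + 4 + 2 = 8 bytes, against 12 bytes for three A32 instructions.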
FreeBSD 14 dropped support for ARMv6, and 15 is likely to drop support for all 32-bit kernels (though not the 32-bit compat mode for userland). The cost of maintaining it is very high.
I don’t think the original RPi has enough RAM to build LLVM
But of course 512 MB is more than enough to run LLVM from a package and build medium-complexity programs. 64 MB of RAM is still enough for typical student programs (well, it is with gcc; I haven’t tried clang on such a machine for a while).
Thumb is 16-bit and so is often smaller, but doesn’t have access to all registers
Point of order: Thumb of course implicitly uses PC, LR, SP in the normal way. Those and the other five hi registers also work with mov, add, cmp, and bx, which can need a little bit of planning, but you can certainly get a lot of use from r8-r12.
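The 16-bit format that gives those four instructions access to r8-r15 spends two extra bits (H1 and H2) to extend the 3-bit register fields. The Python sketch below is based on the ARMv4T-era Thumb "hi register operations / branch exchange" format. The names are hypothetical, and it does not reject register combinations that the architecture marks as UNPREDICTABLE on some cores.

```python
# Thumb-1 hi register operations / branch exchange format:
# 010001 | op(2) | H1 | H2 | Rs/Hs(3) | Rd/Hd(3)
HI_OPS = {"add": 0b00, "cmp": 0b01, "mov": 0b10, "bx": 0b11}


def encode_hi_op(op: str, rd: int, rs: int) -> int:
    """Encode ADD/CMP/MOV Rd, Rs or BX Rs, where either register may be r8-r15."""
    if op not in HI_OPS:
        raise ValueError(f"unsupported op {op!r}")
    if not (0 <= rd <= 15 and 0 <= rs <= 15):
        raise ValueError("registers must be r0-r15")
    if op == "bx" and rd != 0:
        raise ValueError("BX takes only a source register; pass rd=0")
    h1 = rd >> 3  # 1 if the destination is r8-r15
    h2 = rs >> 3  # 1 if the source is r8-r15
    return ((0b010001 << 10) | (HI_OPS[op] << 8) | (h1 << 7) | (h2 << 6)
            | ((rs & 7) << 3) | (rd & 7))
```

This yields mov r8, r0 as 0x4680, mov r0, r8 as 0x4640 and bx lr as 0x4770.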
and this can lead to bigger code
I’ve never seen Thumb (T16) result in bigger code than A32 over a complete function [1] of typical size, if the functionality is available in Thumb at all. I’m sure you can make up some 2-instruction example [2], or even ARM’s Euclid GCD function (heavily using predication) they touted so much in the 80s, but not on typical code. Normally the worst that happens is that some A32 instructions become two T16 instructions, which is slower but the same size, while the rest of the function gets smaller.
I used to do a lot of work on ARM7TDMI and the correct strategy was absolutely to compile almost all your code for Thumb, and hand-pick a few functions to compile for ARM, for speed, not size. And at that point the ARM functions were probably asm not C anyway.
[1] which is, after all, the only way you can mix T16 and A32
[2] even if the first instruction of your two-instruction A32 function expands to three T16 instructions, you’re still breaking even in size with Thumb, since the ret saves 2 bytes.
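The size arithmetic behind these claims is easy to write down. A32 costs 4 bytes per instruction and T16 costs 2. An A32 instruction that turns into two T16 instructions is therefore size-neutral, and anything that maps one-to-one is a saving. Below is a throwaway Python sketch of the footnote [2] case. The helper names are hypothetical, and the sketch ignores literal pools and alignment padding.

```python
from typing import Sequence

A32_BYTES = 4  # every A32 instruction is 32 bits wide
T16_BYTES = 2  # every Thumb-1 (T16) instruction is 16 bits wide


def a32_size(num_instructions: int) -> int:
    """Size in bytes of a function made of num_instructions A32 instructions."""
    return A32_BYTES * num_instructions


def t16_size(expansions: Sequence[int]) -> int:
    """Size in bytes after translating to T16.

    expansions[i] is how many T16 instructions replace the i-th A32 instruction.
    """
    return T16_BYTES * sum(expansions)


# Footnote [2]: a two-instruction A32 function (one op plus the return) where
# the op becomes three T16 instructions and the return stays a single bx lr.
print(a32_size(2), t16_size([3, 1]))  # 8 8
```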
Nice! I have an ARMv6 Pi Zero W that I use as a benchmarking machine.
I fixed an ARMv6 bug in LLVM a while ago. If they don’t have an ARMv6 machine as part of their CI, stuff like this will just keep happening.
When I built LLVM on my Pi Zero (locked at 700 MHz), it took a few days to complete, so a CI job would have to be configured to run quite rarely, but still… This hardware is not hard to find, and it’s dirt cheap, so it’s surely worth doing.
Looking this up, there are a bunch of screwy dead-end forum posts where people go back and forth asserting this package is installed and that’s making the compiler go stupid, or it’s because they did the “lite” install vs. the “recommended” install, or who knows what.
Another instance of the RPi community content ecosystem being really rough
What’s dumb here is not so much emitting an A32 instruction that is not in ARMv6 [1] but in 2023 still defaulting to A32 rather than T32 in the first place.
But the whole thing is kind of silly as the headline should be “clang in Debian now DEFAULTS to making …”.
Which you can fix, as the article itself says, just by telling it what ISA you need.
[1] though that is dumb, as the entire reason to use A32 is to be compatible with the original Pi and Pi Zero.
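You can pin the ISA explicitly instead of relying on the package's default target with something like the following minimal Python sketch. The helper names are hypothetical. The -march=armv6 flag is the fix the article describes. The -marm and -mthumb flags select A32 or, on ARMv6, Thumb-1 code.

```python
import subprocess
from typing import List


def clang_args_for_pi(source: str, output: str, march: str = "armv6",
                      thumb: bool = False) -> List[str]:
    """Build a clang command line that pins the target ISA explicitly."""
    return [
        "clang",
        f"-march={march}",                # don't trust the package's default target
        "-mthumb" if thumb else "-marm",  # Thumb-1 (T16) on ARMv6, or A32
        "-o", output,
        source,
    ]


def compile_for_pi(source: str, output: str, **kwargs) -> None:
    """Run clang with the pinned flags; raises CalledProcessError on failure."""
    subprocess.run(clang_args_for_pi(source, output, **kwargs), check=True)
```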
Much appreciated! I ran into this recently… not on an original Pi Zero but rather on a first-gen Nvidia Jetson Nano. It's a very similar issue, and I suspect this will be the solution to the problem. Unfortunately the issue was in a pre-compiled Python wheel, so I guess I’ll be recompiling that from source.
It looks like the Jetson Nano uses Cortex-A57 cores. These are 64-bit and over a decade newer than the original RPi / Pi Zero. The original RPi was introduced 11 years ago, and the chip was basically free because it was so old that Broadcom couldn’t sell it to anyone. The Jetson Nano was released four years ago. The A57 was a first-generation AArch64 core (so it is now almost a decade old) and had a few painful errata, but it’s still pretty close to the ISA that modern systems support. The most likely problem is that it’s too old to support the newer atomic read-modify-write instructions. Targeting ARMv8.0 should just work, though.
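You can check a board before trusting a prebuilt binary: the arm64 Linux kernel lists the ARMv8.1 Large System Extension atomics as the atomics flag on the Features line of /proc/cpuinfo. Below is a small Python sketch. The helper names are hypothetical. A missing flag tells you about this one extension only, not about every instruction a binary might use.

```python
from typing import Set


def cpu_features(cpuinfo_text: str) -> Set[str]:
    """Collect the flags from every 'Features' line of /proc/cpuinfo-style text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            flags.update(value.split())
    return flags


def has_lse_atomics(cpuinfo_text: str) -> bool:
    """True if the kernel reports the ARMv8.1 LSE atomics ('atomics' hwcap)."""
    return "atomics" in cpu_features(cpuinfo_text)
```

On the board itself you would pass in the contents of /proc/cpuinfo. If the result is False, stick to binaries built for plain ARMv8.0, for instance with -march=armv8-a.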
Yeah, it wasn’t the exact same issue, but it was in a similar vein: Illegal Instruction errors because whatever compiler was used to build the binary wheels was emitting instructions that the CPU didn’t support. I didn’t dive into it as deeply as the OP did, but I did confirm with gdb that the CPU definitely didn’t support whatever the instruction was.
Edit: in the larger picture, I gave up on the project entirely because the Nano is also stuck on JetPack 4, which doesn’t support newer versions of CUDA, and it seemed like I was going to be trying to find an exact old version of PyTorch that would support both the model I was trying to run and the version of CUDA I had. Didn’t seem worth it for a goofy weekend project.
So it sounds like this is a configuration error in the Clang package for Raspbian or whatever the OS is called these days?
Wouldn’t this be fixed by passing -march=armv6 to clang?
It is, and the post says so.
Pretty sure it didn’t when I read the post 😅
https://lobste.rs/s/erdn2p
We have half a dozen of these cluttering up our office and I wasn’t sure what to do with them
We run the website of our hackers’ youth club on one of those. From an SD card. The one from 2012 :-)
I’ll take some off your hands! I’ll pay shipping!