“lots of wimpy CPUs is just wimpy” is definitely context-dependent. I did some work in HPC (a lot of C, C++, and Fortran programs that some would consider “legacy”), and much of it involved the Xeon Phi CPUs. Each core is a “wimpy” 1.4 GHz part, but with 64-72 of them plus loads of memory on each package, they aren’t particularly wimpy at all if your application needs lots of communication between parallel threads.
BlueGene and Cell were both built from “wimpy” cores, and both were very successful in certain HPC applications.
I think what architects learned from that, though, is that faster single-threaded performance is easier, and therefore more productive, for the vast majority of programmers.
Yes, most desktop/mobile/web software is “wait for someone to press something, then do the work for them”, until they get lots of customers, at which point it becomes “do this independent work for as many people as possible”. Both kinds of workload benefit from non-wimpy cores and don’t need great synchronisation.
Tilera has long been in that space, too. Graphics cards used for general-purpose computing could probably count. Adapteva shipped some, too. Moore et al. have their Forth chip for doing it at low energy.
It does depend on context, though.