I’m not quite so sure about competition generating “good” code – certainly good at maximizing metrics, but I’d be rather afraid of any of my competitive Shenzhen I/O solutions being actually implemented!
For example, one of them literally relies on subtle behavior of the logic gates to produce an ad hoc flip-flop that stores data between time units. It beats Laser Tag in 8 yuan / 178 power / 6 lines of code – I have yet to see anyone I know even come close – but it’s complete hyperoptimized flaming garbage. And I doubt it’s actually correct in all cases, too.
This is at least slightly less of a problem than in TIS-100, where you could write outright wrong solutions that were merely probabilistically correct and just keep rerunning the tests until they passed.
A good caution about using metrics to chase the “best” sorting algorithm (the example given in the article) is TimSort, which was used for years in Java, Python, and countless other implementations… only for a formal verification effort to prove it incorrect a few years ago: http://www.envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/. Remember, test cases cannot prove a program correct: they can only show that it works on a finite set of inputs.
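To make that last point concrete, here’s a toy sketch (nothing to do with TimSort’s actual bug – `broken_sort` is a made-up example): a “sort” that passes a plausible finite test suite yet is wrong on other inputs.

```python
def broken_sort(xs):
    """A subtly broken 'sort': a single bubble pass.

    One pass fully sorts any list of length <= 2 and fixes many
    nearly-sorted inputs, so a small test suite can easily pass.
    """
    xs = list(xs)
    for i in range(len(xs) - 1):  # bug: one pass is not enough in general
        if xs[i] > xs[i + 1]:
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs

# These finite test cases all pass...
assert broken_sort([]) == []
assert broken_sort([1]) == [1]
assert broken_sort([2, 1]) == [1, 2]
assert broken_sort([1, 3, 2]) == [1, 2, 3]

# ...yet the function is wrong: [3, 2, 1] comes back as [2, 1, 3].
assert broken_sort([3, 2, 1]) != [1, 2, 3]
```

Every green test run here tells you nothing beyond “it worked on these four inputs” – exactly the trap the TimSort case illustrates at much larger scale.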