A statistical study of Bitcoin blockchain suggests a likely statistical bias in SHA256
A bias in SHA256 doesn’t look like the best explanation for what they saw.
It sounds like they’re talking about a bias observed in live Bitcoin blocks, not anything they’ve tested and seen in raw SHA256 themselves: 2^80 hashes is closer to what the entire Bitcoin network has done than to anything they could compute on their own.
I’m sure someone has said this better elsewhere, but it sounds like that just means miners aren’t using entirely random nonces in their search: the same thing as in the difficulty-less-than-1,000 blocks, but in a subtler way. Maybe some miners’ code or chips just don’t scan all nonces. Maybe in some miner’s cluster, they use a node ID in some bits of the nonce to partition nonce space, and some sections of nonce space are effectively left aside for future capacity they might add. Maybe the miner tries every possible nonce eventually, but there’s an ordering thing where they’ll tend to try some parts of nonce space last, the ones corresponding to the nonces you less often see in blocks. I don’t know a ton about the block format, but if you can break out which blocks were generated by specific software or mining companies, you might be able to spot whose blocks have the strongest bias.
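To make the node-ID idea concrete, here is a minimal sketch of how a partitioned nonce space would skew nonce bits even with a perfectly uniform hash. Everything in it is hypothetical: the 4-bit node ID, the 16 slots with only 10 in use, and the function name `id_bit_frequencies` are illustration, not a description of any real miner. It assumes each active node scans an equal-sized slice, so a winning nonce is equally likely to come from any active node.

```python
from fractions import Fraction


def id_bit_frequencies(active_ids, id_bits=4):
    """Fraction of scanned nonces that have each node-ID bit set (LSB first).

    Assumption: every active node scans an equal-sized slice of nonce space,
    and the hash is uniform, so a winning nonce is equally likely to carry
    any of the active node IDs.
    """
    active = list(active_ids)
    return [
        Fraction(sum((node_id >> bit) & 1 for node_id in active), len(active))
        for bit in range(id_bits)
    ]


# Hypothetical cluster: 16 ID slots reserved, only IDs 0..9 in use.
freqs = id_bit_frequencies(range(10))
print([float(f) for f in freqs])  # [0.5, 0.4, 0.4, 0.2]
```

With all 16 slots in use every ID bit comes out at exactly 50%. Leaving slots 10 to 15 idle is enough to push three of the four bits away from 50%, with no help from the hash function.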
If there were a SHA256 bias that strong (a couple bits set 55% of the time instead of the expected 50%!) and that simple (basically linear, and showing up even without a sophisticated search for linear characteristics), and visible through 2xSHA256’s 128 rounds, not just SHA256’s 64, then SHA256 itself should have profound biases you’d expect to show up when cryptographers search for linear characteristics. And to test this directly, you wouldn’t need 2^80 effort: find a few blocks at difficulty >1000 searching through nonces randomly, and you should be able to see the 55%-versus-50% difference with much less effort than the Bitcoin miners have expended. Dunno whether that would be near try-this-at-home levels, but it would be a lot less than 2^80.
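A rough sketch of that direct test, scaled down to something a laptop can run: mine toy "blocks" with uniformly random nonces, then check how often each nonce bit is set among the winners. The header here is a hypothetical 76-byte stand-in rather than a real Bitcoin header, the target is a toy one (8 leading zero bits, far below difficulty 1000), and the names `mine_random_nonce`, `nonce_bit_frequencies`, `z_score` and `run_experiment` are made up for illustration.

```python
import hashlib
import math
import random
import struct


def double_sha256(data: bytes) -> bytes:
    """SHA256 applied twice, as Bitcoin does for block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine_random_nonce(header_prefix: bytes, zero_bits: int, rng: random.Random) -> int:
    """Try uniformly random 32-bit nonces until the hash is below a toy target."""
    target = 1 << (256 - zero_bits)
    while True:
        nonce = rng.getrandbits(32)
        digest = double_sha256(header_prefix + struct.pack("<I", nonce))
        # The digest is read as a little-endian integer for the comparison.
        if int.from_bytes(digest, "little") < target:
            return nonce


def nonce_bit_frequencies(nonces):
    """Fraction of nonces with each of the 32 bits set, LSB first."""
    n = len(nonces)
    return [sum((x >> bit) & 1 for x in nonces) / n for bit in range(32)]


def z_score(freq: float, n: int) -> float:
    """Standard deviations between an observed bit frequency and 50%."""
    return (freq - 0.5) / math.sqrt(0.25 / n)


def run_experiment(blocks: int = 200, zero_bits: int = 8, seed: int = 0):
    """Mine `blocks` toy blocks, each with a different hypothetical header."""
    rng = random.Random(seed)
    nonces = []
    for i in range(blocks):
        # Hypothetical stand-in for the 76 header bytes that precede the nonce.
        prefix = struct.pack("<I", i).ljust(76, b"\x00")
        nonces.append(mine_random_nonce(prefix, zero_bits, rng))
    return nonces
```

Since `z_score(0.55, 1000)` is about 3.16, roughly a thousand winning nonces would already put a 55% bit three standard deviations from 50%. At difficulty 1000 a block takes on the order of 1000 × 2^32 ≈ 2^42 hashes, so a thousand such blocks is around 2^52 hashes: not something for this Python loop, but far short of 2^80.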
I can see how, if it all looks like a black box to you, a problem in SHA256 would be an enticing theory. But the least surprising explanation by far is that it’s an artifact of miners searching nonrandomly, not of the hash being nonrandom in such a strong way.
Thanks a lot for the explanations.
I don’t have any feeling for the inner workings of cryptographic hash functions, only for reductions. I assumed Bitcoin abuses hashing enough, in the way it applies it, that a mild bias in this case wouldn’t contradict any well-studied properties of SHA256. I was wrong.
I was obviously wrong to put a summary only slightly less enthusiastic than the title instead of much less enthusiastic.
I didn’t believe that they found anything that would be an actual problem with SHA256 as a hash, but could we please have something non-catastrophic but noticeable (5% bias in Bitcoin mining would be nice, but not this time) to make people stop using random primitives as if they were perfect random oracles? I guess not this time. Sigh.