Parallel processing sometimes requires different intuitions than the (now canonical) analysis of (sequential) algorithms can provide. Often the most relevant constraints have to do with locality and communication rather than raw “number of operations”. We knew this way back in the early seventies. Then, computing grew tremendously during a long era of mostly sequential architecture, and much knowledge was lost. Now, we’re all working with parallel computers at multiple scales, from SIMD instructions and speculative execution to multi-core CPUs to datacenters… but mostly still using the calcified methodologies and intuitions appropriate to a simpler era, as though they were immutable physical laws rather than handy heuristics. No wonder we’re constantly over-complicating things.
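To make the locality point concrete, here is a minimal sketch (the names and sizes are illustrative, not from any particular system): two loops that perform exactly the same number of additions over the same 2D array, differing only in memory-access order. On compiled languages over flat arrays the gap comes from cache behavior; in CPython some of it also comes from interpreter indexing overhead, so treat the timing as a demonstration of the idea rather than a rigorous benchmark.

```python
import time

N = 2000
grid = [[1] * N for _ in range(N)]  # N*N elements, rows contiguous in memory

def sum_row_major(g):
    # Visits elements in the order they are laid out: good locality.
    total = 0
    for row in g:
        for x in row:
            total += x
    return total

def sum_col_major(g):
    # The same N*N additions, but striding across rows: poor locality
    # (and, in CPython, extra per-element index lookups).
    total = 0
    for j in range(len(g[0])):
        for i in range(len(g)):
            total += g[i][j]
    return total

t0 = time.perf_counter()
a = sum_row_major(grid)
t1 = time.perf_counter()
b = sum_col_major(grid)
t2 = time.perf_counter()

# Identical operation count, identical answer, different cost.
assert a == b == N * N
print(f"row-major: {t1 - t0:.3f}s  col-major: {t2 - t1:.3f}s")
```

A sequential cost model that only counts operations rates these two loops as identical; a model that accounts for the memory hierarchy does not, and the same kind of gap reappears at every scale the paragraph lists, from cache lines up to cross-datacenter links.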