I think, fundamentally, Hadoop really isn’t intended for scientific HPC. It’s designed to do dead-simple, embarrassingly parallel analysis over huge volumes of data.
If you want to do complex analysis, as most scientists do, MapReduce is a terrible framework to write it in. Anything beyond the most basic aggregation is painful in the extreme, and you can forget about complex modeling entirely.
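For a sense of the ceremony involved, here’s roughly the canonical “hello world” of MapReduce, a word count written against the standard org.apache.hadoop.mapreduce API (a sketch along the lines of the stock Hadoop tutorial example):

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
      // Mapper: emit (word, 1) for every token in the input split.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reducer: sum the counts emitted for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      // Driver: wire the job together and block until it finishes.
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Fifty-odd lines of boilerplate to count words. Now imagine expressing something iterative, say k-means or gradient descent, as a chain of jobs like this.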
If your data volume is under a couple of terabytes, Hadoop’s latency sacrifices completely outweigh its throughput gains.
It is built, very specifically, to serve the sort of high-volume, analytically uninteresting workloads that non-R&D corporations run. It might be possible to shoehorn it into some scientific problem domains, but in general I’m pretty sure it’s not a good fit.
Hadoop’s serialization boundaries also impose a phenomenal cost on multistage jobs: every stage serializes its entire output to HDFS, and the next stage has to read it all back before it can start. Not exactly ideal for high-performance computing.
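To illustrate, here’s a skeletal two-stage driver (the class name, paths, and identity stages are mine, purely to show the boundary; a real pipeline would set its own mappers and reducers):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TwoStagePipeline {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);
        Path intermediate = new Path(args[1]);  // materialized between stages
        Path output = new Path(args[2]);

        // Stage 1: an identity job standing in for real analysis. Its entire
        // output is serialized to HDFS before stage 2 can begin.
        Job stage1 = Job.getInstance(conf, "stage 1");
        stage1.setJarByClass(TwoStagePipeline.class);
        stage1.setOutputKeyClass(LongWritable.class);  // default identity map/reduce
        stage1.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(stage1, input);
        FileOutputFormat.setOutputPath(stage1, intermediate);  // spill to disk
        if (!stage1.waitForCompletion(true)) System.exit(1);

        // Stage 2: deserializes everything stage 1 just wrote. With N stages
        // you pay this write-then-read round trip N-1 times; there is no way
        // to hand records between stages in memory.
        Job stage2 = Job.getInstance(conf, "stage 2");
        stage2.setJarByClass(TwoStagePipeline.class);
        stage2.setOutputKeyClass(LongWritable.class);
        stage2.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(stage2, intermediate);  // read it all back
        FileOutputFormat.setOutputPath(stage2, output);
        System.exit(stage2.waitForCompletion(true) ? 0 : 1);
      }
    }

Every arrow in your dataflow graph that crosses a job boundary becomes a full write-then-read round trip through the distributed filesystem.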