1. 7

  2. 5

    I think, fundamentally, Hadoop really isn’t intended for scientific HPC. It’s designed to do incredibly stupid analysis over huge volumes of data.

    • If you want to do complex analysis, like most scientists do, MapReduce is a terrible framework for writing it. Anything beyond the most basic analysis is painful in the extreme. You can just forget about complex modeling.
    • If your data volumes are under a couple of TB, Hadoop’s latency sacrifices completely outweigh its throughput gains.
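
    To make the first point concrete, here’s a toy sketch (hand-rolled, not the real Hadoop API) of the map/shuffle/reduce shape that even a trivial statistic like a mean forces on you. Anything iterative, like k-means or regression, needs a whole *chain* of jobs of this shape:

```python
# Toy sketch of the MapReduce shape -- not real Hadoop APIs.
from collections import defaultdict

def map_phase(records):
    # Emit a ("mean", value) pair for every input record.
    for x in records:
        yield ("mean", float(x))

def shuffle(pairs):
    # Group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # One reducer call per key: average the grouped values.
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

result = reduce_phase(shuffle(map_phase(["1", "2", "3", "4"])))
# A one-line mean becomes three functions and a shuffle.
```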

    It is, very much, built specifically to serve the sort of high-volume, completely uninteresting analysis that non-R&D corporations do. It might be possible to shoehorn it into some scientific problem domains, but I’m pretty sure that in general it’s not a good fit.

    1. 3

      Hadoop’s serialization boundaries also impose a phenomenal cost on multistage jobs: not exactly ideal for high-performance computing.
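
      As a toy illustration (hand-waved, not a benchmark): between every pair of chained stages, intermediate data gets serialized, written out, read back, and deserialized, rather than staying in memory. A sketch of that per-stage tax:

```python
# Toy illustration of the per-stage serialization tax -- the pickle
# round trip stands in for the materialization boundary between
# chained MapReduce jobs; not a model of real Hadoop internals.
import pickle

def stage(data, f):
    # Each "stage" pays a full serialize/deserialize round trip on
    # its input before doing any actual work.
    data = pickle.loads(pickle.dumps(data))  # the serialization tax
    return [f(x) for x in data]

data = list(range(5))
out = stage(stage(data, lambda x: x + 1), lambda x: x * 2)
# Two stages -> two round trips; an in-memory pipeline would pay none.
```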