Hadoop is amazingly fun to work with for a use case that’s well suited to it. You feel like a badass when you bend terabytes of RAM and thousands of cores to your will. (Yeah, I bet this statement won’t age well…) That allure pushes people to use it for things it’s not meant for.
Hadoop requires a lot of discipline to not fall back on the habits of traditional RDBMS tools. Hive looks an awful lot like regular SQL to end users, and they probably won’t even bother to tell you about their amazing plan to store their 10,000-row table with constant single-row updates, inserts, and deletions in Hive. That is, they won’t bother to tell you until they’re complaining that it’s performing like garbage because it has 1–3 rows stored on each node of your multi-hundred-node cluster.
Many people I’ve met who are now saying “I want to put my app in AWS” were saying “I want to put my app in Hadoop” a few years ago, with an equal understanding of what either means: zero. The excitement over the shininess of the technology completely overwhelms the practicality of using it. A key part of my job is to show people how the sausage is made, and to give them the slightly more polite version of this talk.