I’m impressed that he was able to get anything out of Spark+HDFS on so little RAM. I spent a lot of the latter half of my undergrad trying to convince four machines with 16GB RAM to do anything useful with Spark… usually, even with huge amounts of data, processing it took about as long as it did on my regular desktop.
I guess all I can do is chalk it up to operator error. Maybe I’ll take a look at it for doing computing stuff in the future.
If the data you are processing is not really big, using something like Spark is overkill, and it will be slower than processing it on a single machine.
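To put rough numbers on that, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption made up for illustration, not a measurement: the function names, the 1 GB/s per-machine throughput, the 30 seconds of fixed cluster overhead (job startup, scheduling, shuffling), and the idea that the work splits perfectly across workers.

```python
def single_machine_seconds(data_gb, gb_per_second=1.0):
    """Time to process everything on one machine (assumed throughput)."""
    return data_gb / gb_per_second


def cluster_seconds(data_gb, workers=4, gb_per_second=1.0, fixed_overhead_s=30.0):
    """Time on a cluster: a fixed overhead plus perfectly parallel work.

    Real clusters scale worse than this, so treat it as an optimistic case.
    """
    return fixed_overhead_s + data_gb / (gb_per_second * workers)


def break_even_gb(workers=4, gb_per_second=1.0, fixed_overhead_s=30.0):
    """Data size at which both approaches take the same time.

    Solves data/g = overhead + data/(g*w) for data; needs workers > 1.
    """
    return fixed_overhead_s * gb_per_second * workers / (workers - 1)


if __name__ == "__main__":
    for size_gb in (10, 100):
        print(size_gb, single_machine_seconds(size_gb), cluster_seconds(size_gb))
    print("break-even (GB):", break_even_gb())
```

With these made-up numbers, the four-machine cluster only starts to win once the data is bigger than the break-even size; below that, the fixed overhead dominates and the single machine finishes first.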