Short article, but decent substance and useful links. I’d heard of Storm, but not Spark, and it looks like it might be worthwhile for something I’m working on.
Spark is implemented in Scala, but you don’t have to know Scala in order to use it. Right now, you can write Spark jobs in Scala, Java, or Python (source: https://spark.incubator.apache.org/). I assume more languages will be supported later.
The interesting part to me is how little complexity it adds. It doesn’t deal with data replication - it lets HDFS do that. It doesn’t replicate intermediate state during calculations - if a server fails, it just recomputes from the last good checkpoint. It seems to add a minimal amount of complexity on top of existing technologies. I also think Scala is part of that. If you use Scala well, you can significantly reduce the complexity. (Granted, you can also massively complect a simple idea, so there’s that too.) This is probably why we’re seeing a big increase in major Apache projects written in Scala.
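To make the recompute-instead-of-replicate idea concrete, here’s a minimal sketch in plain Python (no Spark involved; all the names are illustrative, not Spark’s actual API). A derived dataset remembers the transformations used to build it from a durable source, so lost results can simply be rebuilt by replaying that lineage rather than being replicated up front:

```python
class Dataset:
    """Toy sketch of lineage-based recomputation (not Spark's real API)."""

    def __init__(self, source, transforms=()):
        self.source = source          # durable input (the role HDFS plays)
        self.transforms = transforms  # lineage: how to rebuild from source

    def map(self, fn):
        # Record the transformation instead of materializing a copy.
        return Dataset(self.source, self.transforms + (fn,))

    def compute(self):
        # (Re)build the result from the durable source by replaying lineage.
        data = list(self.source)
        for fn in self.transforms:
            data = [fn(x) for x in data]
        return data


numbers = Dataset(source=[1, 2, 3, 4])
squares = numbers.map(lambda x: x * x)

result = squares.compute()     # first computation
recovered = squares.compute()  # after a simulated failure: replay lineage
assert result == recovered == [1, 4, 9, 16]
```

The point of the sketch: nothing intermediate needs to be replicated, because the source plus the recorded transformations are enough to reconstruct any lost result.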