Why Scala?
In general, data science and analytics are done "in the small," using R, Python, MATLAB, etc.
If your dataset grows too large to fit into memory, these languages/frameworks won't let you scale up; you have to reimplement everything in some other language or system.
Now, however, the industry is shifting towards data-oriented decision making, and many applications call for data science "in the large."
By using a language like Scala, it's easier to scale your small problem to the large with Spark, whose API is almost one-to-one with Scala's collections. That is, by working in Scala, in a functional style, you can quickly scale your problem from one node to tens, hundreds, or even thousands by leveraging Spark, a successful and performant large-scale data processing framework that looks and feels a lot like Scala's collections!
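To make that "almost one-to-one" claim concrete, here is a minimal word-count sketch: the same pipeline written first against plain Scala collections, then against Spark's RDD API. It assumes spark-core is on the classpath; the local[*] master, the sample data, and all names are illustrative, not from the original.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CollectionsVsSpark {
  def main(args: Array[String]): Unit = {
    val lines = List("spark looks like scala", "scala scales to spark")

    // Plain Scala collections: word count on a single node.
    val localCounts = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

    // The same computation with Spark's RDD API. The code is nearly
    // identical, but it can run across a whole cluster.
    val conf = new SparkConf().setAppName("CollectionsVsSpark").setMaster("local[*]")
    val sc   = new SparkContext(conf)
    val distributedCounts = sc.parallelize(lines)
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)     // Spark's key-aware analogue of the groupBy/sum above

    println(localCounts)
    println(distributedCounts.collect().toMap)
    sc.stop()
  }
}
```

The first three transformations (parallelize aside) are literally the same method calls; only the final aggregation step differs, because Spark exposes key-aware operations like reduceByKey that also cut down on data shuffled across the network.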
Why Spark?
Spark is…
#More expressive – APIs modeled after Scala collections, so programs look like operations on functional lists! Richer, more composable operations are possible than in MapReduce.
#Performant – Not only performant in terms of running time, but also in terms of developer productivity: Spark is interactive!
#Good for data science – Not just because of performance, but because it enables iteration, which most algorithms in a data scientist's toolbox require (see the sketch after this list).
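Here is a minimal sketch of what "enables iteration" means in practice. It assumes an already-built RDD[Double] named points (a hypothetical input, not from the original); cache() keeps the dataset in memory, so every pass after the first reads from RAM instead of re-reading the input, which is exactly what a multi-pass MapReduce job cannot do.

```scala
import org.apache.spark.rdd.RDD

// Iteratively re-estimate a "center" from the points close to the current
// estimate. The 10.0 radius and all names are illustrative assumptions.
def refineCenter(points: RDD[Double], iterations: Int): Double = {
  points.cache()                 // keep the data in memory across passes
  var center = points.mean()     // initial estimate: the global mean
  for (_ <- 1 to iterations) {
    val nearby = points.filter(p => math.abs(p - center) < 10.0)
    if (!nearby.isEmpty())       // guard against an empty neighborhood
      center = nearby.mean()     // each pass reuses the cached RDD
  }
  center
}
```

Most of a data scientist's toolbox (k-means, logistic regression, PageRank, and so on) has exactly this shape: a loop that repeatedly scans the same dataset, which is why in-memory caching matters so much.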
And… Spark and Scala are currently among the top-paying technologies in the industry, according to recent Stack Overflow survey results.