Spark

Why Scala? Why Spark?

Why Scala?

In general, data science and analytics are done "in the small" using R, Python, MATLAB and similar tools.
If your dataset grows too large to fit into memory, these languages and frameworks do not scale easily on their own. You often have to reimplement everything in some other language or system.

Now the industry is shifting towards data-driven decision making, and many applications are data science "in the large".

By using a language like Scala, it’s easier to scale your small problem to the large with Spark, whose API is almost 1-to-1 with Scala’s collections.
That is, by working in Scala in a functional style, you can quickly scale your problem from one node to tens, hundreds, or even thousands by leveraging Spark, a successful and performant large-scale data processing framework that looks and feels a lot like Scala collections!
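
To make the "almost 1-to-1" claim concrete, here is a minimal sketch of a word count written against plain Scala collections, so it runs without a cluster. The object name WordCountSketch, the function wordCount, and the sample sentences are made up for illustration. The comments point out where the Spark RDD version would differ, assuming a SparkContext named sc is available.

```scala
object WordCountSketch {
  // Counts words in local memory using only Scala collections.
  // Hypothetical helper for illustration; not part of Spark or the Scala library.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))        // split every line into words
      .filter(_.nonEmpty)              // drop empty tokens from leading whitespace
      .map(word => (word, 1))          // pair each word with a count of 1
      .groupBy(_._1)                   // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  // With Spark, assuming a SparkContext `sc`, the same pipeline would read roughly:
  //   sc.parallelize(lines)
  //     .flatMap(_.split("\\s+"))
  //     .filter(_.nonEmpty)
  //     .map(word => (word, 1))
  //     .reduceByKey(_ + _)           // replaces the groupBy + sum step
  // Only the entry point and the final aggregation change.

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "spark looks like scala",
      "scala collections scale with spark"
    )
    // Sort by word so the printed order is stable.
    wordCount(lines).toSeq.sortBy(_._1).foreach { case (word, n) =>
      println(s"$word $n")
    }
  }
}
```

The flatMap, filter and map steps are identical in both versions. The main difference is that Spark distributes the data across nodes and offers reduceByKey to aggregate per key without first collecting each group.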

Why Spark?

Spark is…

#More expressive – Its APIs are modeled after Scala collections and look like functional lists! They allow richer, more composable operations than MapReduce.

#Performant – Not only in terms of running time, but also in terms of developer productivity, because you can work with it interactively!

#Good for data science – Not just because of performance, but because it makes iteration practical, and iteration is required by most algorithms in a data scientist’s toolbox.

And… Spark and Scala are among the top-paying technologies in the industry, according to recent Stack Overflow survey results.

An ambivert, music lover, enthusiast, artist, designer, coder, gamer, and content writer. He is a professional software developer with hands-on experience in Spark, Kafka, Scala, Python, Hadoop, Hive, Sqoop, Pig, PHP, HTML, and CSS. Learn more about him at www.saikumar.me
