How to Create a Spark RDD?


RDDs can be created in two ways:
1) Transforming an existing RDD.
2) From a SparkContext or SparkSession object.

– Transforming an existing RDD:
When map is called on a List, it returns a new List. Similarly, many higher-order functions defined on an RDD return a new RDD.
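This can be sketched in Scala as follows (a minimal illustration, assuming a SparkContext `sc` is already available; the sample data is made up):

```scala
// Assume sc: org.apache.spark.SparkContext is already available
val numbers = sc.parallelize(Seq(1, 2, 3, 4))   // an existing RDD[Int]

// map is a higher-order function: it does not modify `numbers`,
// it returns a new RDD with the function applied lazily
val doubled = numbers.map(_ * 2)                // a new RDD[Int]

doubled.collect()                               // Array(2, 4, 6, 8)
```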

– From a SparkContext (or SparkSession) object:
The SparkContext object (accessible as spark.sparkContext on a SparkSession since Spark 2.x) can be thought of as your handle to the Spark cluster. It represents the connection between the Spark cluster and your running application. It defines a handful of methods which can be used to create and populate a new RDD:

a) parallelize: convert a local Scala collection to an RDD.
ex:- val rdd = sc.parallelize(Seq("1", "2", "3"))
b) textFile: read a text file from HDFS or a local file system and return an RDD of String.
ex:- val rdd = sc.textFile("/users/guest/read.txt")
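Putting both methods together, here is a minimal end-to-end sketch, assuming Spark 2.x or later where the SparkContext is obtained from a SparkSession (the app name, master setting, and file path are illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("rdd-creation")      // hypothetical app name
  .master("local[*]")           // run locally for illustration
  .getOrCreate()

val sc = spark.sparkContext     // handle to the Spark cluster

// a) parallelize: local collection -> RDD
val fromSeq = sc.parallelize(Seq("1", "2", "3"))

// b) textFile: file on HDFS or local FS -> RDD[String]
val fromFile = sc.textFile("/users/guest/read.txt")

// either RDD can then be transformed into new RDDs
val ints = fromSeq.map(_.toInt)

spark.stop()
```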




