RDDs can be created in two ways:
1) Transforming an existing RDD.
2) From a SparkContext or SparkSession object.
– Transforming an existing RDD:
When map is called on a List, it returns a new List. Similarly, many higher-order functions defined on RDD (such as map and filter) return a new RDD.
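The parallel between List and RDD can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes Spark is on the classpath and runs in local mode, and the names TransformExample and doubled are hypothetical, chosen only for this example.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object TransformExample {
  // Hypothetical helper: map on an RDD[Int] returns a new RDD[Int];
  // the input RDD is left unchanged.
  def doubled(numbers: RDD[Int]): RDD[Int] = numbers.map(_ * 2)

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, so no real cluster is needed to try this out.
    val spark = SparkSession.builder()
      .appName("TransformExample")
      .master("local[*]")
      .getOrCreate()
    try {
      val list = List(1, 2, 3)
      val doubledList = list.map(_ * 2)        // a new List; `list` is unchanged

      val rdd = spark.sparkContext.parallelize(list)
      val doubledRdd = doubled(rdd)            // a new RDD; `rdd` is unchanged

      println(doubledList)
      println(doubledRdd.collect().toList)     // collect brings the elements back to the driver
    } finally {
      spark.stop()
    }
  }
}
```

In both cases the original collection is untouched and the result is a new value of the same kind.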
– From a SparkContext (or SparkSession) object:
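The SparkContext object can be thought of as your handle to the Spark cluster. It represents the connection between the Spark cluster and your running application. Since Spark 2.0, SparkSession is the unified entry point; it was not a rename, but a wrapper around a SparkContext, which remains available as the session's sparkContext field. SparkContext defines a handful of methods which can be used to create and populate a new RDD: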
a) parallelize: convert a local Scala collection to an RDD.
b) textFile: read a text file from HDFS or a local file system and return an RDD of String.
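Both methods can be tried out with a short sketch like the one below. It assumes Spark is on the classpath and uses local mode; the object name CreateRddExample, the helpers fromCollection and fromTextFile, and the temporary file are hypothetical and exist only for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object CreateRddExample {
  // Hypothetical helper: turn a local Scala collection into an RDD.
  def fromCollection(sc: SparkContext, words: Seq[String]): RDD[String] =
    sc.parallelize(words)

  // Hypothetical helper: read a text file; each line becomes one String element.
  def fromTextFile(sc: SparkContext, path: String): RDD[String] =
    sc.textFile(path)

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, so no real cluster or HDFS is required.
    val spark = SparkSession.builder()
      .appName("CreateRddExample")
      .master("local[*]")
      .getOrCreate()
    try {
      val sc = spark.sparkContext  // the SparkContext wrapped by the session

      val words = fromCollection(sc, Seq("spark", "rdd", "scala"))

      // Write a small temporary file so the example has something to read.
      val tmp = Files.createTempFile("rdd-example", ".txt")
      Files.write(tmp, "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8))
      val lines = fromTextFile(sc, tmp.toUri.toString)

      println(words.count())
      println(lines.collect().toList)
    } finally {
      spark.stop()
    }
  }
}
```

On a real cluster the path passed to textFile would typically be an HDFS URI rather than a local temporary file.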