
How to Create a Spark DataFrame on an HBase Table [Code Snippets]


Spark has no built-in data source for HBase comparable to the way Spark SQL reads Hive tables.
This post shows one way to create a DataFrame on top of an HBase table.

You need to add the hbase-client dependency to your project to do this. It is available here:
https://mvnrepository.com/artifact/org.apache.hbase/hbase-client/2.1.0
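
For an sbt project, the dependency declaration might look like the sketch below. Only hbase-client 2.1.0 comes from the link above; the Spark version and the extra hbase-common and hbase-mapreduce artifacts are assumptions (in HBase 2.x the TableInputFormat class lives in hbase-mapreduce), so adjust them to match your cluster.

```scala
// build.sbt (sketch): everything except hbase-client 2.1.0 is an assumption
libraryDependencies ++= Seq(
  // Spark is usually supplied by the cluster, hence "provided"
  "org.apache.spark" %% "spark-sql"       % "2.4.0" % "provided",
  "org.apache.hbase" %  "hbase-client"    % "2.1.0",
  "org.apache.hbase" %  "hbase-common"    % "2.1.0",
  // In HBase 2.x, TableInputFormat is packaged in hbase-mapreduce
  "org.apache.hbase" %  "hbase-mapreduce" % "2.1.0"
)
```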

Let's say the HBase table is 'emp', with 'empID' as the row key and the columns 'name' and 'city' under a column family named 'metadata'. The case class EmpRow gives the DataFrame its structure.
newAPIHadoopRDD is the Spark API for creating an RDD on an HBase table; the HBase configuration (table name, column family and so on) must be passed to it.
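
A minimal sketch of that step, assuming HBase 2.x and a ZooKeeper quorum on localhost (replace it with your own, or rely on an hbase-site.xml on the classpath). The object and method names (EmpHBaseRead, empRDD) are made up for illustration; the table and column-family names come from the example above.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object EmpHBaseRead {

  // Returns one (rowKey, Result) pair per row of the 'emp' table
  def empRDD(spark: SparkSession): RDD[(ImmutableBytesWritable, Result)] = {
    val conf = HBaseConfiguration.create()

    // Assumption: ZooKeeper runs locally; change this for a real cluster
    conf.set("hbase.zookeeper.quorum", "localhost")

    // Table to scan, restricted to the 'metadata' column family
    conf.set(TableInputFormat.INPUT_TABLE, "emp")
    conf.set(TableInputFormat.SCAN_COLUMN_FAMILY, "metadata")

    spark.sparkContext.newAPIHadoopRDD(
      conf,
      classOf[TableInputFormat],       // input format that scans HBase
      classOf[ImmutableBytesWritable], // key type: the row key
      classOf[Result]                  // value type: the cells of the row
    )
  }
}
```

Calling EmpHBaseRead.empRDD(spark).count() is a quick way to check that the connection and table name are right before moving on.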

The DataFrame is created by mapping each record of this RDD to the case class.
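
The mapping could look roughly like this. It assumes the cell values and row key were written as UTF-8 strings (for example with Bytes.toBytes on a String); the helper names (EmpRowParser, parse, toDF) are hypothetical, and the field layout of EmpRow is inferred from the table description above.

```scala
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Structure of the DataFrame: row key plus the two 'metadata' columns
case class EmpRow(empID: String, name: String, city: String)

object EmpRowParser {
  private val cf = Bytes.toBytes("metadata")

  // Converts one HBase Result into an EmpRow.
  // A column missing from the row ends up as null in the case class.
  def parse(result: Result): EmpRow = {
    val empID = Bytes.toString(result.getRow)
    val name  = Bytes.toString(result.getValue(cf, Bytes.toBytes("name")))
    val city  = Bytes.toString(result.getValue(cf, Bytes.toBytes("city")))
    EmpRow(empID, name, city)
  }

  // Maps every (rowKey, Result) pair to an EmpRow and builds the DataFrame
  def toDF(spark: SparkSession,
           rdd: RDD[(ImmutableBytesWritable, Result)]): DataFrame = {
    import spark.implicits._ // brings the encoder for EmpRow into scope
    rdd.map { case (_, result) => parse(result) }.toDF()
  }
}
```

Given an RDD produced by newAPIHadoopRDD, EmpRowParser.toDF(spark, rdd).show() should print the empID, name and city columns, and the resulting DataFrame can be registered as a temporary view and queried with Spark SQL.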

 


An ambivert, music lover, enthusiast, artist, designer, coder, gamer, and content writer. He is a professional software developer with hands-on experience in Spark, Kafka, Scala, Python, Hadoop, Hive, Sqoop, Pig, PHP, HTML, and CSS. Learn more about him at www.saikumar.me
