Introduction to Hive – When, What, Why


  • At Facebook, data grew from gigabytes (2006) to 1 TB/day (2007), and today it exceeds 500 TB per day
  • Such rapidly growing data volumes made traditional data warehousing expensive
  • Scaling up vertically (buying ever-larger servers) is very expensive
  • Hadoop offers an alternative for storing and processing large data sets
  • But MapReduce is very low-level and requires custom code
  • Facebook developed Hive as a solution
  • Sept 2008 – Hive becomes a Hadoop subproject

What is Hive –

  • Hive is a Data Warehouse solution built on Hadoop
  • It is a system for querying, managing and storing structured data on Hadoop
  • An infrastructure on Hadoop for summarization and analysis of data
  • Provides an SQL dialect called HiveQL to process data on a Hadoop cluster
  • Hive translates HiveQL queries into MapReduce jobs that run on the cluster
  • Hive is not a full database
  • In its original design, it does not provide record-level insert, delete or update (Hive 0.14 later added these for ACID tables)
  • Hive originally does not provide transactions either
  • Hive queries have high latency, even for small data sets, because each query launches MapReduce jobs
  • Best suited for moving traditional data warehouse applications onto Hadoop
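To make the translation from HiveQL to MapReduce concrete, here is a minimal Python simulation of how a simple aggregate query could map onto the map, shuffle and reduce stages. It is a sketch under simplifying assumptions: the `employees` table, its columns and the helper functions are hypothetical, everything runs in memory on one machine, and Hive's real query compiler produces far more elaborate execution plans than this.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical table rows: (name, dept). Stands in for a Hive table stored on HDFS.
EMPLOYEES = [
    ("alice", "eng"),
    ("bob", "sales"),
    ("carol", "eng"),
    ("dave", "hr"),
    ("erin", "eng"),
]

# The HiveQL query being simulated (not executed here):
#   SELECT dept, COUNT(*) FROM employees GROUP BY dept;


def map_phase(rows: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Emit (group-by key, 1) for every input row, like a mapper would."""
    return [(dept, 1) for _name, dept in rows]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapper output by key and sort by key, like the shuffle/sort step."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the 1s for each key, which yields COUNT(*) per dept."""
    return {key: sum(values) for key, values in groups.items()}


def run_group_by_count(rows: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Chain the three stages, as a compiled GROUP BY query would."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(run_group_by_count(EMPLOYEES))  # {'eng': 3, 'hr': 1, 'sales': 1}
```

The grouping column becomes the mapper's output key, so the framework's shuffle does the actual grouping and the reducer only has to aggregate.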

Why Hive –

  • Jobs written directly against the Hadoop Java APIs require managing many low-level details
  • MapReduce programming is suited to experienced Java programmers
  • Hive provides a familiar, SQL-like programming model
  • It eliminates the need to write complex Java code
  • Hive code is simple, and one need not be an experienced programmer to write it
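The contrast between hand-managed jobs and SQL can be sketched in Python under stated assumptions. The page-view data, `NUM_REDUCERS` and the helper names are hypothetical, `zlib.crc32` is swapped in for Hadoop's hash partitioner, and Python's built-in `sqlite3` module plays the role of the SQL engine because the point is the declarative style, not Hive itself. The hand-written version has to choose a reducer count, partition keys, sort within each reducer and merge outputs, while the SQL version only states the result it wants.

```python
import sqlite3
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical page-view log: (page, views).
PAGE_VIEWS = [
    ("home", 3), ("about", 1), ("home", 2),
    ("blog", 5), ("about", 4), ("blog", 1),
]

NUM_REDUCERS = 2  # a job setting the developer must choose by hand


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key. CRC32 stands in for Hadoop's hash partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def hand_written_job(rows: List[Tuple[str, int]]) -> Dict[str, int]:
    # Map + partition: route every (key, value) pair to one reducer's bucket.
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(NUM_REDUCERS)]
    for page, views in rows:
        buckets[partition(page, NUM_REDUCERS)][page].append(views)
    # Reduce: each reducer walks its own keys in sorted order and sums their values.
    result: Dict[str, int] = {}
    for bucket in buckets:
        for page in sorted(bucket):
            result[page] = sum(bucket[page])
    # Merge the per-reducer outputs into one result sorted by key.
    return dict(sorted(result.items()))


def declarative_query(rows: List[Tuple[str, int]]) -> Dict[str, int]:
    # sqlite3 is only a stand-in SQL engine here; it is not Hive.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE page_views (page TEXT, views INTEGER)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT page, SUM(views) FROM page_views GROUP BY page ORDER BY page"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(hand_written_job(PAGE_VIEWS))   # {'about': 5, 'blog': 6, 'home': 5}
    print(declarative_query(PAGE_VIEWS))  # {'about': 5, 'blog': 6, 'home': 5}
```

Both functions return the same totals. In HiveQL the declarative version would be essentially the same `SELECT ... GROUP BY` statement, with Hive planning the partitioning, sorting and merging on the cluster.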


An ambivert, music lover, enthusiast, artist, designer, coder, gamer and content writer. He is a professional software developer with hands-on experience in Spark, Kafka, Scala, Python, Hadoop, Hive, Sqoop, Pig, PHP, HTML and CSS. Know more about him at



24 Tutorials