Data Model and Datatypes in Hive

Data in Hive is organised into –

  • Databases – Namespaces that keep tables and other data units separate and avoid naming conflicts
  • Tables – Homogeneous collections of data sharing the same schema
  • Partitions – Divisions of a table's data based on the values of its partition key columns
  • Buckets – Divisions of a partition (or of an unpartitioned table) based on the hash value of a particular column
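
The four levels above can be seen together in a single table definition. The following is a minimal HiveQL sketch, not a recommended layout: the names sales_db, orders, and every column are hypothetical, and the bucket count of 8 and the ORC storage format are arbitrary choices made for illustration.

```sql
-- Hypothetical names throughout: sales_db, orders, and all columns are illustrative.

-- Database: a namespace for the tables created inside it.
CREATE DATABASE IF NOT EXISTS sales_db;

-- Table: every row shares the same schema.
CREATE TABLE IF NOT EXISTS sales_db.orders (
  order_id    BIGINT,
  customer_id INT,
  amount      DOUBLE
)
-- Partitions: one division of the data per distinct order_date value.
PARTITIONED BY (order_date STRING)
-- Buckets: within each partition, rows are assigned by the hash of customer_id.
CLUSTERED BY (customer_id) INTO 8 BUCKETS
STORED AS ORC;
```

With a layout like this, a query that filters on order_date only needs to read the matching partitions rather than the whole table.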

Hive Data Types:

    • Hive supports primitive data types and four collection types.
    • Primitive types include –
      tinyint,   smallint, int,
      bigint,    boolean,  string,
      timestamp, float,    double,  binary
    • Collection Types –

1. Struct
address struct<city:STRING, state:STRING>
– Eg: struct('Bengaluru', 'Karnataka') and address.city = 'Bengaluru'

2. Array
names array<STRING>
– Eg: array('Hari', 'Sai') and names[1] = 'Sai' (indexing starts at 0)

3. Map
name map<STRING, STRING>
– Eg: map('first', 'Mahendra', 'last', 'Dhoni') and name['first'] = 'Mahendra'

4. Union
value uniontype<INT, STRING>
– Eg: holds a value of exactly one of its declared types at a time
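
The struct, array, and map columns described above might be declared and queried roughly as follows. This is a sketch under assumptions: the table name players is hypothetical, and the column names simply reuse the ones from the examples.

```sql
-- Hypothetical table; column names follow the examples in the text.
CREATE TABLE IF NOT EXISTS players (
  address STRUCT<city:STRING, state:STRING>,
  names   ARRAY<STRING>,
  name    MAP<STRING, STRING>
);

SELECT
  address.city,    -- struct field, accessed with dot notation
  names[1],        -- second array element (arrays are zero-indexed)
  name['first']    -- map value, looked up by key
FROM players;

-- The constructor functions can also be used directly in an expression.
SELECT
  array('Hari', 'Sai')[1],                                  -- 'Sai'
  map('first', 'Mahendra', 'last', 'Dhoni')['first'];       -- 'Mahendra'
```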

  • All data types are implemented in Java
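  • Type conversion between data types behaves much as in Java: narrower numeric types are implicitly widened to broader ones, and explicit conversion is done with CAST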
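
As a small illustration of explicit and implicit conversion, the following HiveQL sketch uses only literal values, so no table is assumed. The column aliases are made up for readability.

```sql
SELECT
  CAST('42' AS INT)   AS str_to_int,   -- explicit cast from string to integer
  CAST('abc' AS INT)  AS failed_cast,  -- a conversion that cannot succeed yields NULL
  1 + 2.5             AS widened;      -- the INT operand is implicitly converted to the wider numeric type
```

Note that, unlike Java, a failed CAST in Hive does not raise an error; it returns NULL, so results of casts on untrusted input are worth checking.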

