Data in Hive is organised into –
- Databases – Namespaces that keep tables and other data units separate
- Tables – Homogeneous collections of data that share the same schema
- Partitions – Divisions of a table's data based on the value of a partition key
- Buckets – Divisions within a partition (or an unpartitioned table) based on the hash value of a particular column
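
The hierarchy above can be sketched in HiveQL as follows. This is a minimal illustration, not a recommended layout: the database name sales_db, the table orders, its columns, the bucket count of 8 and the ORC storage format are all assumptions chosen for the example.

```sql
-- Hypothetical names throughout: sales_db, orders and every column are illustrative.

-- Database: a namespace for tables.
CREATE DATABASE IF NOT EXISTS sales_db;

-- Table: rows sharing one schema, partitioned and then bucketed.
CREATE TABLE IF NOT EXISTS sales_db.orders (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (order_date STRING)         -- one partition per distinct order_date value
CLUSTERED BY (customer_id) INTO 8 BUCKETS  -- rows are assigned to a bucket by hashing customer_id
STORED AS ORC;

-- Filtering on the partition key lets Hive read only the matching partition.
SELECT order_id, amount
FROM sales_db.orders
WHERE order_date = '2024-01-15';
```

Note that the partition column is declared in PARTITIONED BY rather than in the main column list, while the bucketing column must be one of the regular columns.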
Hive Data Types:
- Hive supports primitive data types and four collection (complex) types.
- Primitive types –
tinyint, smallint, int,
bigint, boolean, string,
timestamp, float, double, binary
- Collection Types –
1. Struct
address struct<city:STRING, state:STRING>
– Eg: named_struct('city', 'Bengaluru', 'state', 'Karnataka') and address.city = 'Bengaluru'
2. Array
names array<STRING>
– Eg: array('Hari', 'Sai') and names[1] = 'Sai' (indexes start at 0)
3. Map
name map<STRING, STRING>
– Eg: map('first', 'Mahendra', 'last', 'Dhoni') and name['first'] = 'Mahendra'
4. Union
misc uniontype<INT, STRING>
– Eg: a value holds exactly one of the declared types at a time
- All data types are implemented in Java
- Type casting between data types is available, much as in Java (explicitly via CAST(expr AS type))
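
As a rough sketch of how the collection types and casting look in practice, the HiveQL below declares one column of each collection type and reads values back. The table people and the columns id and misc are hypothetical names introduced for illustration; address, names and name follow the examples above.

```sql
-- Hypothetical table: people, id and misc are illustrative names.
CREATE TABLE IF NOT EXISTS people (
  id      INT,
  address STRUCT<city:STRING, state:STRING>,
  names   ARRAY<STRING>,
  name    MAP<STRING, STRING>,
  misc    UNIONTYPE<INT, STRING>
);

-- Accessing values inside each collection type.
SELECT
  address.city,        -- struct field, dot notation
  names[1],            -- array element; indexes start at 0, so this is the second entry
  name['first'],       -- map lookup by key
  CAST(id AS STRING)   -- explicit type cast
FROM people;

-- Built-in constructor functions produce collection values directly.
SELECT
  named_struct('city', 'Bengaluru', 'state', 'Karnataka'),
  array('Hari', 'Sai'),
  map('first', 'Mahendra', 'last', 'Dhoni'),
  CAST('42' AS INT);
```

The plain struct() constructor also exists, but it names its fields col1, col2 and so on, which is why named_struct is used here to get city and state.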