Below are the major components of the Hive architecture:
UI – The user interface through which users submit queries and other operations to the system. As of 2011, the system had a command line interface, and a web-based GUI was being developed.
Driver – Receives Hive queries and coordinates their compilation, optimization and execution.
Compiler – The component that parses the query, does semantic analysis on the different query blocks and query expressions and eventually generates an execution plan with the help of the table and partition metadata looked up from the metastore.
Metastore – System catalog which contains metadata about table schemas and other system schemas.
– The metadata is stored in a separate database, such as MySQL
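To make the idea of a system catalog kept in a separate database concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for MySQL. The table names (meta_tables, meta_columns), the sample table page_views and the helper lookup_schema are all hypothetical and are not the real metastore schema; the only point is that a schema lookup is an ordinary query against a relational store.

```python
import sqlite3

# In-memory SQLite stands in for the external database (e.g. MySQL).
conn = sqlite3.connect(":memory:")

# Hypothetical catalog layout, NOT the actual Hive metastore schema.
conn.executescript("""
    CREATE TABLE meta_tables  (table_id INTEGER PRIMARY KEY, table_name TEXT UNIQUE);
    CREATE TABLE meta_columns (table_id INTEGER, position INTEGER,
                               col_name TEXT, col_type TEXT);
""")
conn.execute("INSERT INTO meta_tables VALUES (1, 'page_views')")
conn.executemany(
    "INSERT INTO meta_columns VALUES (1, ?, ?, ?)",
    [(0, "user_id", "bigint"), (1, "url", "string"), (2, "view_time", "int")],
)


def lookup_schema(conn, table_name):
    """Return [(col_name, col_type), ...] in column order for one table."""
    cur = conn.execute(
        """SELECT c.col_name, c.col_type
           FROM meta_columns c
           JOIN meta_tables t ON t.table_id = c.table_id
           WHERE t.table_name = ?
           ORDER BY c.position""",
        (table_name,),
    )
    return cur.fetchall()


schema = lookup_schema(conn, "page_views")
print(schema)
```

A compiler-like component would call something like lookup_schema to learn column names and types before it builds a plan.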
Execution Engine – The component which executes the execution plan created by the compiler. The plan is a DAG of stages. The execution engine manages the dependencies between these different stages of the plan and executes these stages on the appropriate system components.
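The dependency handling described for the execution engine can be sketched as a topological walk over the stage DAG. This is only an illustration in Python using the standard library's graphlib module, not Hive's actual (Java) implementation; the stage names and the run_plan helper are made up for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each stage maps to the set of stages it depends on.
plan = {
    "scan_orders": set(),
    "scan_users": set(),
    "join": {"scan_orders", "scan_users"},
    "aggregate": {"join"},
    "write_output": {"aggregate"},
}


def run_plan(plan, run_stage):
    """Run every stage only after all of its dependencies have finished."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    executed = []
    while sorter.is_active():
        # get_ready() returns the stages whose dependencies are all done.
        for stage in sorter.get_ready():
            run_stage(stage)
            executed.append(stage)
            sorter.done(stage)  # unblocks stages that were waiting on this one
    return executed


order = run_plan(plan, lambda stage: print("running", stage))
```

The two scan stages have no dependencies, so a real engine could run them in parallel; this sketch runs them one after another for simplicity.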
Thrift Server – Allows clients to access Hive from languages such as C++, Java and Ruby.
– This optional Thrift server is known as HiveServer or HiveThrift
Serializers/Deserializers (SerDe)
– SerDe is short for “Serializer and Deserializer”
– Table rows in Hive are read and written through a SerDe
– Contains framework libraries that let users develop serializers and deserializers for their own data formats
– Also provides some built-in serializers and deserializers
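As a rough illustration of what a SerDe does, the sketch below converts between a stored line of delimited text and an in-memory row. Real Hive SerDes are Java classes with a different interface; the DelimitedSerDe class, its method names and the sample schema here are assumptions made purely for the example.

```python
class DelimitedSerDe:
    """Toy SerDe for rows stored as delimited text (illustrative, not Hive's API)."""

    def __init__(self, columns, delimiter=","):
        # columns: list of (name, python_type) pairs describing the table schema
        self.columns = columns
        self.delimiter = delimiter

    def deserialize(self, line):
        """Turn one stored line into a row dict keyed by column name."""
        fields = line.rstrip("\n").split(self.delimiter)
        if len(fields) != len(self.columns):
            raise ValueError(
                f"expected {len(self.columns)} fields, got {len(fields)}"
            )
        return {name: typ(raw) for (name, typ), raw in zip(self.columns, fields)}

    def serialize(self, row):
        """Turn a row dict back into one stored line (without a newline)."""
        return self.delimiter.join(str(row[name]) for name, _ in self.columns)


# Hypothetical table schema: (id int, name string, score double)
serde = DelimitedSerDe([("id", int), ("name", str), ("score", float)])
row = serde.deserialize("7,alice,3.5\n")
line = serde.serialize(row)
print(row, line)
```

Supporting a new data format then amounts to supplying a different pair of serialize and deserialize routines while the rest of the query machinery keeps working on rows.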
Query Processor – The processing framework that translates HQL into map/reduce jobs.
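To give a feel for what translating HQL into a map/reduce job means, consider a query such as SELECT dept, COUNT(*) FROM employees GROUP BY dept. The sketch below hand-writes the map, shuffle and reduce steps for that one query in plain Python. It is not what Hive generates; the table, its rows and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical input rows for a table employees(name, dept).
employees = [
    ("ann", "eng"),
    ("bob", "ops"),
    ("cid", "eng"),
    ("dee", "eng"),
]


def map_phase(row):
    """Emit (group key, 1) for each input row; the GROUP BY column is the key."""
    _name, dept = row
    yield dept, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Compute COUNT(*) for one group by summing the emitted ones."""
    return key, sum(values)


mapped = [pair for row in employees for pair in map_phase(row)]
result = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(result)
```

The GROUP BY column becomes the map output key and the aggregate becomes the reduce function, which is the general shape of a grouped aggregation on map/reduce.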