File Formats in Hive

  • File format specifies how records are encoded in files
  • Record format specifies how the stream of bytes for a given record is encoded
  • The default file format is TEXTFILE, in which each record is a line in the file
  • Hive uses control characters as the default delimiters in text files
    • ^A (octal 001) between fields, ^B (octal 002) between collection items, ^C (octal 003) between map keys and values, and \n between records
  • The keyword FIELDS is used when overriding the default field delimiter, as in ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  • Delimited text files such as CSV and TSV are supported
  • TextFile can contain JSON or XML documents.
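
To make the default delimiters concrete, here is a minimal Python sketch that splits one TEXTFILE line the way the control characters above imply. The schema (name STRING, skills ARRAY<STRING>, scores MAP<STRING,INT>), the function name parse_record, and the sample record are all assumptions made up for illustration; this is not how Hive itself is implemented.

```python
# Hive's default TEXTFILE delimiters (octal 001, 002, 003).
FIELD_DELIM = "\x01"       # ^A: separates fields (columns)
COLLECTION_DELIM = "\x02"  # ^B: separates items of an array, struct, or map
MAP_KEY_DELIM = "\x03"     # ^C: separates a map key from its value


def parse_record(line):
    """Parse one line for the hypothetical schema
    (name STRING, skills ARRAY<STRING>, scores MAP<STRING,INT>)."""
    name, skills_raw, scores_raw = line.rstrip("\n").split(FIELD_DELIM)
    skills = skills_raw.split(COLLECTION_DELIM)
    scores = {}
    for pair in scores_raw.split(COLLECTION_DELIM):
        key, value = pair.split(MAP_KEY_DELIM)
        scores[key] = int(value)
    return name, skills, scores


# Build a made-up record using the default delimiters; "\n" ends the record.
sample = FIELD_DELIM.join([
    "alice",
    COLLECTION_DELIM.join(["hive", "spark"]),
    COLLECTION_DELIM.join(["sql" + MAP_KEY_DELIM + "9",
                           "scala" + MAP_KEY_DELIM + "7"]),
]) + "\n"

print(parse_record(sample))
```

Because these control characters rarely occur in real data, values seldom need escaping, which is one reason they are used as defaults instead of commas or tabs.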

Commonly used file formats:

  1. TextFile format
    • Suitable for sharing data with other tools
    • Can be viewed/edited manually
  2. SequenceFile
    • Flat files that store binary key-value pairs
    • SequenceFile provides Reader, Writer, and Sorter classes for reading, writing, and sorting, respectively
    • Supports three formats: uncompressed, record-compressed (only values are compressed), and block-compressed (both keys and values are compressed)
  3. RCFile
    • RCFile (Record Columnar File) splits a table into row groups and stores the columns of each row group together
  4. ORC
  5. AVRO
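
The record-columnar idea behind RCFile (item 3 above) can be sketched in a few lines of Python. This is a toy illustration only, not the real on-disk format: the function name to_row_groups and the sample rows are hypothetical, and the sketch sizes row groups by row count, whereas RCFile sizes them in bytes.

```python
def to_row_groups(rows, group_size):
    """Partition rows horizontally into row groups, then lay out each
    group column by column (a list of columns rather than a list of rows)."""
    groups = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        # zip(*group) transposes the rows of this group into columns.
        columns = [list(col) for col in zip(*group)]
        groups.append(columns)
    return groups


# Made-up table with three columns: id, code, amount.
rows = [
    (1, "a", 10.0),
    (2, "b", 20.0),
    (3, "c", 30.0),
]

layout = to_row_groups(rows, group_size=2)
print(layout)
```

Keeping the values of one column next to each other within a row group lets a query read only the columns it needs and tends to compress better, while every field of a given row still lives in the same row group.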
