Serializer and Deserializer in Hive
What is SerDe?

Record parsing for a Hive table is handled by a serializer/deserializer, or SerDe for short. Hive uses the SerDe interface for IO. The deserializer converts a record (string or binary) into a Java object that Hive can process (modify). The serializer then takes this Java object and converts it into a format that can be stored in HDFS. In short, a SerDe converts record bytes into something Hive can use, and back again.

Reading:
HDFS files -> InputFileFormat -> <key, value> -> Deserializer -> Row object (Java object)

Writing:
Row object (Java object) -> Serializer -> <key, value> -> OutputFileFormat -> HDFS files

Why do we need SerDe?

A SerDe is useful when we need to handle or load semi-structured data. If we have unstructured data, we can use the RegEx SerDe, which will instruct hiv...
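
The two conversion directions described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class name `SerDeSketch`, the comma delimiter, and the representation of a row as a list of strings are all hypothetical, and the code does not implement Hive's actual SerDe interface.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a hypothetical class, not Hive's real SerDe interface.
public class SerDeSketch {
    // Assumed field delimiter for this sketch.
    static final String DELIMITER = ",";

    // Deserializer direction: stored record bytes -> row object (list of column values).
    static List<String> deserialize(byte[] record) {
        String line = new String(record, StandardCharsets.UTF_8);
        // A limit of -1 keeps trailing empty columns instead of dropping them.
        return new ArrayList<>(Arrays.asList(line.split(DELIMITER, -1)));
    }

    // Serializer direction: row object -> record bytes ready to be written out.
    static byte[] serialize(List<String> row) {
        return String.join(DELIMITER, row).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] stored = "1,alice,42".getBytes(StandardCharsets.UTF_8);
        List<String> row = deserialize(stored);   // read path
        row.set(2, "43");                         // the row object is processed (modified)
        byte[] written = serialize(row);          // write path
        System.out.println(new String(written, StandardCharsets.UTF_8)); // prints 1,alice,43
    }
}
```

In this sketch the file format layer (InputFileFormat and OutputFileFormat) is left out; the byte array stands in for the value of the <key, value> pair that the file format hands to the deserializer or receives from the serializer.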