Question-116: If you run Hive as a server, what mechanisms are available for connecting to it from an application?

Answer: You can connect to the Hive Server in the following ways:

  1. Thrift Client: Using Thrift, you can call Hive commands from various programming languages, e.g. C++, Java, PHP, Python and Ruby.
  2. JDBC Driver: It supports the Type 4 (pure Java) JDBC driver.
  3. ODBC Driver: It supports the ODBC protocol.

Question-117: What is SerDe in Apache Hive?

Answer: SerDe is short for Serializer/Deserializer. Hive uses a SerDe (together with a FileFormat) to read data from and write data to tables. An important concept behind Hive is that it DOES NOT own the format of the data stored in the Hadoop Distributed File System (HDFS). Users can write files to HDFS with whatever tools or mechanisms they prefer, expose them to Hive (for example with "CREATE EXTERNAL TABLE" or "LOAD DATA INPATH"), and use Hive to correctly "parse" that file format so the data can be queried. A SerDe is the powerful (and customizable) mechanism that Hive uses to "parse" data stored in HDFS.
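To make the idea concrete, here is a minimal Python sketch, assuming a simplified serialize/deserialize interface that is NOT Hive's real Java SerDe API (the class names DelimitedSerDe and JsonLineSerDe are hypothetical). It shows the point above: the same logical row can live on disk in two different formats, and each SerDe knows how to turn its own format into a row and back. Control-A (ASCII 0x01) is used because it is Hive's default field delimiter.

```python
import json
from typing import List, Optional


class DelimitedSerDe:
    """Hypothetical SerDe for rows stored as delimited text (default: Control-A)."""

    def __init__(self, delimiter: str = "\x01") -> None:
        self.delimiter = delimiter

    def serialize(self, row: List[str]) -> str:
        # Row -> on-disk text line.
        return self.delimiter.join(row)

    def deserialize(self, line: str) -> List[str]:
        # On-disk text line -> row.
        return line.split(self.delimiter)


class JsonLineSerDe:
    """Hypothetical SerDe for rows stored as one JSON object per line."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns

    def serialize(self, row: List[str]) -> str:
        return json.dumps(dict(zip(self.columns, row)))

    def deserialize(self, line: str) -> List[Optional[str]]:
        record = json.loads(line)
        # Keys absent from the JSON come back as None, roughly like NULL columns.
        return [record.get(name) for name in self.columns]


# The same logical row, stored in two different file formats.
row = ["42", "alice"]
text_serde = DelimitedSerDe()
json_serde = JsonLineSerDe(["id", "name"])

text_line = text_serde.serialize(row)
json_line = json_serde.serialize(row)
print(repr(text_line))  # '42\x01alice'
print(json_line)        # {"id": "42", "name": "alice"}

# Each SerDe parses its own format back into the same row.
assert text_serde.deserialize(text_line) == row
assert json_serde.deserialize(json_line) == row
```

The table's rows stay the same; only the SerDe changes with the file format, which is why Hive does not need to own that format.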

Question-118: Which classes does Hive use to read and write HDFS files?

Answer: Hive uses the following classes to read and write HDFS files:

  • TextInputFormat/HiveIgnoreKeyTextOutputFormat: These two classes read/write data in plain text file format.
  • SequenceFileInputFormat/SequenceFileOutputFormat: These two classes read/write data in Hadoop SequenceFile format.
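The key/value behavior behind the plain-text pair can be sketched in Python. This assumes the usual Hadoop convention that TextInputFormat hands each line to the reader as a (byte offset, line) pair, and that HiveIgnoreKeyTextOutputFormat, as its name says, writes only the value and ignores the key. The function names are hypothetical, this is not Hadoop's API, and SequenceFile's binary container format is not modeled.

```python
from typing import Iterable, Iterator, Tuple


def read_text_records(data: bytes) -> Iterator[Tuple[int, str]]:
    """Yield (byte offset, line) pairs, mimicking TextInputFormat's key/value."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        # Value: the line without its terminator. Key: the byte offset where it starts.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


def write_text_records(records: Iterable[Tuple[int, str]]) -> bytes:
    """Write only the values, one per line, discarding the keys."""
    return b"".join(value.encode("utf-8") + b"\n" for _key, value in records)


data = b"1\x01alice\n2\x01bob\n"
records = list(read_text_records(data))
print(records)  # [(0, '1\x01alice'), (8, '2\x01bob')]

# Because the keys are dropped on write, reading and writing round-trips the text.
assert write_text_records(records) == data
```

Hive only cares about the line itself, so discarding the offset on output keeps the written file identical to a plain text table.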

Question-119: What are some examples of the SerDe classes that Hive uses to serialize and deserialize data?

Answer: Hive currently uses these SerDe classes to serialize and deserialize data:

  • MetadataTypedColumnsetSerDe: This SerDe is used to read/write delimited records such as CSV, tab-separated, or Control-A-separated records (quoting is not supported yet).
  • ThriftSerDe: This SerDe is used to read/write Thrift-serialized objects. The class file for the Thrift object must be loaded first.
  • DynamicSerDe: This SerDe also reads/writes Thrift-serialized objects, but it understands Thrift DDL, so the schema of the object can be provided at runtime. It also supports many different protocols, including TBinaryProtocol, TJSONProtocol and TCTLSeparatedProtocol (which writes data as delimited records).
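The practical effect of "quoting is not supported" for delimited records can be shown with a short Python sketch. The helper names are hypothetical and this is not MetadataTypedColumnsetSerDe's code; it only assumes that a SerDe without quote support splits each record on the delimiter and treats quote characters as ordinary data. Python's standard csv module is used purely for comparison.

```python
import csv
from typing import List


def split_without_quotes(line: str, delimiter: str = ",") -> List[str]:
    # Plain split: quote characters are ordinary data, as with no quote support.
    return line.split(delimiter)


def split_with_quotes(line: str, delimiter: str = ",") -> List[str]:
    # csv.reader honours double quotes, shown only for comparison.
    return next(csv.reader([line], delimiter=delimiter))


record = '7,"Smith, John",NY'
print(split_without_quotes(record))  # ['7', '"Smith', ' John"', 'NY']
print(split_with_quotes(record))     # ['7', 'Smith, John', 'NY']

# Tab- and Control-A-separated records split the same way.
print(split_without_quotes("7\tJohn\tNY", "\t"))      # ['7', 'John', 'NY']
print(split_without_quotes("7\x01John\x01NY", "\x01"))  # ['7', 'John', 'NY']
```

Without quote handling, a delimiter inside a field produces extra columns, which is why Control-A (rarely found in real text) is a safer choice than commas.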

Question-120: How do you write your own custom SerDe?

Answer: In most cases, users want to write a Deserializer instead of a full SerDe, because they only need to read their own data format, not write it.

For example, the RegexDeserializer deserializes the data using the configuration parameter 'regex' and, optionally, a list of column names.
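The regex-plus-column-names idea can be sketched in Python. This is not Hive's RegexDeserializer; the class name RegexDeserializerSketch, the use of a full-line match, returning None for lines that do not match, and the positional default names (col0, col1, ...) are all assumptions of this illustration. Each capture group in the regex becomes one column.

```python
import re
from typing import Dict, List, Optional


class RegexDeserializerSketch:
    """Hypothetical read-only deserializer: one regex capture group per column."""

    def __init__(self, regex: str, columns: Optional[List[str]] = None) -> None:
        self.pattern = re.compile(regex)
        self.columns = columns

    def deserialize(self, line: str) -> Optional[Dict[str, Optional[str]]]:
        match = self.pattern.fullmatch(line)
        if match is None:
            return None  # Unparseable line; a real deserializer might emit NULLs instead.
        groups = match.groups()
        # Fall back to positional names when no column list is configured.
        names = self.columns or [f"col{i}" for i in range(len(groups))]
        if len(names) != len(groups):
            raise ValueError("number of columns must match number of capture groups")
        return dict(zip(names, groups))


d = RegexDeserializerSketch(r"(\S+) (\S+) (\S+) (\d{3})", ["host", "method", "path", "status"])
parsed = d.deserialize("127.0.0.1 GET /index.html 200")
print(parsed)  # {'host': '127.0.0.1', 'method': 'GET', 'path': '/index.html', 'status': '200'}
```

Because only deserialize is implemented, this matches the read-only case described above: the data format is parsed, never written.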

If your SerDe supports DDL (basically, a SerDe with parameterized columns and column types), you probably want to implement a Protocol based on DynamicSerDe rather than write a SerDe from scratch. The reason is that the framework passes DDL to the SerDe in "Thrift DDL" format, and writing a "Thrift DDL" parser is non-trivial.