Question-11: What is Hive Metastore?

Answer: The metastore is the metadata catalog that Hive requires in order to work. It is backed by an RDBMS such as MySQL, PostgreSQL, or Oracle. The metastore typically stores the following information (metadata):

  • Names of the tables
  • Columns in each table
  • Partition information
  • Hadoop-specific information, e.g. the HDFS location of each table's data files (the block locations themselves are tracked by the HDFS NameNode, not by the metastore).
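
To make the list above concrete, here is a minimal sketch of the kind of record the metastore keeps for one table. The class names, field names, and the example HDFS path are hypothetical and chosen only for illustration; this is not the actual metastore schema. The one real convention it relies on is Hive's default partition directory layout, `key=value`, under the table location.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    type: str


@dataclass
class TableMetadata:
    """Illustrative record only; not the real metastore schema."""
    name: str
    columns: List[Column]
    partition_keys: List[Column] = field(default_factory=list)
    location: str = ""  # HDFS directory that holds the table's data files

    def partition_path(self, values: List[str]) -> str:
        """Build the directory of one partition using the key=value layout."""
        if len(values) != len(self.partition_keys):
            raise ValueError("expected one value per partition key")
        parts = [f"{key.name}={value}"
                 for key, value in zip(self.partition_keys, values)]
        return "/".join([self.location.rstrip("/")] + parts)


# Hypothetical table used only for this example.
sales = TableMetadata(
    name="sales",
    columns=[Column("id", "bigint"), Column("amount", "double")],
    partition_keys=[Column("dt", "string")],
    location="hdfs://namenode:8020/warehouse/sales",
)
print(sales.partition_path(["2024-01-01"]))
```

Note that the record holds only names, types, and a directory location; the data itself stays in files on HDFS.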

 

Question-12: Can the Hive metastore be used by other Hadoop components?

Answer: Yes. The Hive metastore holds metadata about the data stored on HDFS, so other Hadoop components such as Impala can use it as well. Even if you do not run Hive itself, those components may still need the metastore.

 

Question-13: What is meant by the remote mode of the metastore?

Answer: In remote mode the metastore runs as a service in its own JVM process. Any other process that needs to connect to it, for example HiveServer2, HCatalog, or Impala, does so through the Thrift network API.
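
As a rough sketch of what a client needs in order to reach a remote metastore: clients are pointed at the service through the `hive.metastore.uris` property, whose value is one or more `thrift://host:port` URIs separated by commas. The helper name and the host names below are hypothetical; port 9083 is the usual default but should be treated as an assumption for any given cluster.

```python
import xml.etree.ElementTree as ET
from typing import List


def metastore_client_config(hosts: List[str], port: int = 9083) -> str:
    """Return a hive-site.xml style fragment pointing at remote metastores."""
    uris = ",".join(f"thrift://{host}:{port}" for host in hosts)
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "hive.metastore.uris"
    ET.SubElement(prop, "value").text = uris
    return ET.tostring(configuration, encoding="unicode")


# Hypothetical host names used only for this example.
config_xml = metastore_client_config(
    ["metastore-1.example.com", "metastore-2.example.com"]
)
print(config_xml)
```

Listing more than one URI gives clients an alternative metastore instance to connect to if one is unavailable.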

 

Question-14: What is HiveServer2?

Answer: HiveServer2 is a server-side interface; you can think of it as a container for the Hive execution engine. For each client connection it creates a new execution context for the Hive SQL requests submitted by that client. It supports both JDBC and ODBC clients, which communicate with it through the Thrift API.
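
For illustration, a JDBC client addresses HiveServer2 with a URL of the form `jdbc:hive2://host:port/database`. The small helper below only assembles that string; the function name and host are hypothetical, and port 10000 is the common default rather than a guarantee for a particular installation.

```python
def hiveserver2_jdbc_url(host: str, port: int = 10000,
                         database: str = "default") -> str:
    """Assemble a basic HiveServer2 JDBC URL (no authentication options)."""
    if not host:
        raise ValueError("host must not be empty")
    return f"jdbc:hive2://{host}:{port}/{database}"


# Hypothetical host name used only for this example.
url = hiveserver2_jdbc_url("hs2.example.com", database="sales")
print(url)
```

Each client that connects with such a URL gets its own session on the server, which is the per-connection execution context described above.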

 

Question-15: Can Hive use Apache Spark as a computation engine?

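Answer: Yes. Traditionally Hive used MapReduce as its computation engine, but Spark is generally much faster than MapReduce, so Hive can be configured to run its queries on Spark instead. Whether this option is available depends on the Hive version and distribution, and Apache Tez is another commonly used engine.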
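
As a small sketch of how the engine is chosen: Hive reads it from the `hive.execution.engine` property, with `mr`, `tez`, and `spark` as the documented values. The helper below is hypothetical and only builds the corresponding `SET` statement for a session; which of these values a particular installation actually accepts depends on its Hive version.

```python
SUPPORTED_ENGINES = ("mr", "tez", "spark")


def set_engine_statement(engine: str) -> str:
    """Return the HiveQL statement that selects an execution engine."""
    normalized = engine.strip().lower()
    if normalized not in SUPPORTED_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    return f"SET hive.execution.engine={normalized};"


statement = set_engine_statement("Spark")
print(statement)
```

The statement applies to the current session only; a cluster-wide default would instead be set in the Hive configuration.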