Question-21: What are all the possible clients for Impala?

Answer: The following components can act as Impala clients to query or administer an Impala environment:

  • Hue (Web Interface for querying)
  • Impala Shell
  • ODBC
  • JDBC

 

Question-22: Can Impala use the Hive Metastore?

Answer: Yes. Impala can use the Hive Metastore, which holds information about the available data and lets Impala know the structure of that data: schemas, table names, column names, etc.

 

Question-23: Can you please give me a basic overview of how queries are executed in Impala?

Answer: An Impala daemon process (impalad) runs on each DataNode of the HDFS cluster and is responsible for executing and coordinating queries. Each impalad instance can receive, plan, and coordinate queries from an Impala client. The instance that receives a query acts as its coordinator: it distributes the query among the Impala nodes, which then act as workers and execute their parts of the query in parallel.
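
The coordinator/worker flow described above can be sketched as a toy simulation. This is only an illustration of the scatter-gather idea, not Impala's actual code: the node names, the NODE_DATA table, and the run_fragment and coordinate functions are all hypothetical, and threads stand in for the separate Impala nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each "node" holds the rows it stores locally.
NODE_DATA = {
    "node1": [3, 8, 15],
    "node2": [4, 16],
    "node3": [23, 42, 7],
}


def run_fragment(rows, threshold):
    """Worker step: scan local rows, return a partial (count, sum)."""
    matching = [r for r in rows if r > threshold]
    return len(matching), sum(matching)


def coordinate(threshold):
    """Coordinator step: send the fragment to every node, then merge."""
    with ThreadPoolExecutor(max_workers=len(NODE_DATA)) as pool:
        # Each node processes its own rows in parallel with the others.
        partials = list(
            pool.map(lambda rows: run_fragment(rows, threshold),
                     NODE_DATA.values())
        )
    # Merge the partial results into the final answer for the client.
    total_count = sum(count for count, _ in partials)
    total_sum = sum(part_sum for _, part_sum in partials)
    return total_count, total_sum


if __name__ == "__main__":
    # Roughly: SELECT COUNT(*), SUM(x) FROM t WHERE x > 7
    print(coordinate(threshold=7))
```

Here coordinate plays the role of the instance that received the query, and each run_fragment call plays the role of a worker scanning only the data stored on its own node.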

 

Question-24: What is Apache Kudu?

Answer: Apache Kudu is a columnar storage manager developed for the Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications, listed below:

  • Runs on commodity hardware
  • Horizontally scalable
  • Highly available operations

 

Question-25: Can you please tell me some benefits of Apache Kudu?

Answer: The following are a few benefits of Kudu:

  • Fast processing of OLAP workloads
  • Easy integration with MapReduce, Spark, Flume, and other Hadoop components
  • Tight integration with Impala
  • Strong but flexible consistency model, e.g. consistency requirements can be chosen on a per-request basis
  • Highly performant for running sequential and random workloads simultaneously. 
  • Can be managed using Cloudera Manager
  • Structured data model
  • Highly available