Question-66: How are Impala queries and Hive queries related?
Answer: There are only minor differences between Impala queries and Hive queries. Impala SQL is largely a subset of HiveQL, so Impala queries can generally be executed in Hive as well.
Question-67: How does it affect Impala queries if data is already loaded in HBase or Hive?
Answer: It does not matter. The only requirement is that Impala can access the Hive metastore. Keep in mind that impalad runs as the impala user by default, so you might need to adjust some file permissions, depending on how strict your permissions currently are.
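One practical detail: Impala caches table metadata, so changes made through Hive usually become visible only after a REFRESH or INVALIDATE METADATA statement. The sketch below only builds the statement text. The function name and the table name are hypothetical, and running the statement against a cluster is left as a commented assumption.

```python
import re

# Accepts "table" or "db.table" made of plain identifier characters only.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def metadata_sync_statement(table, is_new_table):
    """Build the Impala statement that picks up changes made through Hive.

    is_new_table=True  -> the table was created in Hive: INVALIDATE METADATA
    is_new_table=False -> data was added to an existing table: REFRESH
    """
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if is_new_table:
        return f"INVALIDATE METADATA {table}"
    return f"REFRESH {table}"


# Assumed usage with any DB-API style cursor connected to an impalad:
#   cursor.execute(metadata_sync_statement("sales.orders", is_new_table=False))
print(metadata_sync_statement("sales.orders", is_new_table=False))
```

REFRESH is the lighter of the two statements, so it is the usual choice when only new data files were added to a table Impala already knows about.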
Question-68: Is Hive required to run Impala?
Answer: The Hive metastore is required, because Impala shares the same metastore database as Hive, which lets Hive and Impala access the same tables transparently. Hive itself is optional and does not need to be installed on the same nodes as Impala. The two complement each other: Impala offers a wider variety of read queries than write operations, while Hive provides more options for inserting data into tables.
Question-69: Can Impala query a table that has trillions of rows?
Answer: Yes, many Cloudera customers have achieved this.
Question-70: Can I configure Impala for High Availability?
Answer: Yes. Set up a proxy server to relay requests back and forth to the Impala servers, for load balancing and high availability. For the Hive metastore, enable HDFS HA.
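To illustrate what such a proxy does for each new connection, here is a minimal sketch of round-robin selection with failover across several impalad hosts. It is not a replacement for a real proxy. The host names are hypothetical, and port 21050 is assumed to be the port the impalad daemons listen on for client connections.

```python
import socket


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_backend(backends, start, check=is_reachable):
    """Pick the first healthy backend, scanning round-robin from index `start`.

    backends: list of (host, port) tuples, one per impalad (hypothetical names).
    Returns the chosen (host, port), or None if no backend passes the check.
    """
    count = len(backends)
    for offset in range(count):
        host, port = backends[(start + offset) % count]
        if check(host, port):
            return (host, port)
    return None


# Hypothetical cluster; the port is an assumption, adjust it to your setup.
IMPALADS = [("impalad1", 21050), ("impalad2", 21050), ("impalad3", 21050)]
```

A caller would increment `start` after each connection so that load spreads across the daemons, while the health check skips any impalad that is down.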