Question-61: Impala does not cache data, so why is a subsequent run of the same query faster?

Answer: Impala does not cache data itself. A subsequent run of the query is faster because the data read by the first run is still held in the operating system's buffer cache, so it is served from memory instead of disk. Impala does not control this cache explicitly.

In addition, Impala can take advantage of the HDFS caching feature in CDH. When creating or altering a table, you can designate which tables or partitions are cached explicitly through the CACHED IN and UNCACHED clauses.
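To make the clause syntax concrete, here is a small sketch that only builds statement text, assuming Impala's documented form `ALTER TABLE t [PARTITION (...)] SET CACHED IN 'pool' | SET UNCACHED`. The helper `cache_clause_ddl`, the table `sales`, and the pool `four_gig_pool` are hypothetical names used for illustration; the statements would still have to be run in impala-shell or another client.

```python
from typing import Dict, Optional, Union

PartitionValue = Union[int, str]


def cache_clause_ddl(table: str,
                     pool: Optional[str] = None,
                     partition: Optional[Dict[str, PartitionValue]] = None) -> str:
    """Return an Impala ALTER TABLE statement that caches or uncaches data.

    Hypothetical helper: pool=None produces SET UNCACHED, otherwise
    SET CACHED IN '<pool>'. No identifier escaping is done (illustration only).
    """
    stmt = f"ALTER TABLE {table}"
    if partition:
        # Render each key=value pair; string values become SQL string literals.
        parts = []
        for key, value in partition.items():
            rendered = f"'{value}'" if isinstance(value, str) else str(value)
            parts.append(f"{key}={rendered}")
        stmt += f" PARTITION ({', '.join(parts)})"
    if pool is None:
        return f"{stmt} SET UNCACHED"
    return f"{stmt} SET CACHED IN '{pool}'"


if __name__ == "__main__":
    # Cache the whole table, then a single partition, then uncache the table.
    print(cache_clause_ddl("sales", "four_gig_pool"))
    print(cache_clause_ddl("sales", "four_gig_pool", {"year": 2024}))
    print(cache_clause_ddl("sales"))
```

Caching at the partition level lets you pin only the hot slice of a large table (for example, the most recent year) instead of the whole table.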

HDFS Cache: Impala can also take advantage of data that is pinned in the HDFS cache through the hdfs cacheadmin command.
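As a rough sketch of pinning data with `hdfs cacheadmin`, the snippet below builds the usual three steps: create a cache pool, add a cache directive for a directory, and list the directives in that pool. The pool name and the warehouse path `/user/hive/warehouse/sales` are hypothetical, and by default the script only prints the commands (a dry run); executing them assumes the `hdfs` CLI is on the PATH and the user has the required HDFS privileges.

```python
import shlex
import subprocess
from typing import List


def cacheadmin_commands(pool: str, path: str) -> List[List[str]]:
    """Build hdfs cacheadmin invocations that create a pool, pin a path
    into it, and list the resulting cache directives."""
    return [
        ["hdfs", "cacheadmin", "-addPool", pool],
        ["hdfs", "cacheadmin", "-addDirective", "-path", path, "-pool", pool],
        ["hdfs", "cacheadmin", "-listDirectives", "-pool", pool],
    ]


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            # Only show the shell-quoted command; nothing is executed.
            print(shlex.join(cmd))
        else:
            # Runs the real CLI; raises CalledProcessError on a non-zero exit.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_commands(cacheadmin_commands("four_gig_pool", "/user/hive/warehouse/sales"))
```

Passing the argument list directly to `subprocess.run` (instead of a single shell string) avoids shell quoting problems with unusual path names.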

 

Question-62: When would you prefer Impala over Hive or MapReduce?

Answer: Impala is well suited to running SQL queries for interactive, exploratory analytics on large datasets, where low query latency matters. Hive and MapReduce are more appropriate for long-running, batch-oriented tasks such as ETL.

 

Question-63: Does Impala use MapReduce as its underlying processing engine?

Answer: No, Impala does not use MapReduce. Even if you stop the MapReduce service, Impala continues to work.

 

Question-64: Can we use the Impala for Stream Processing?

Answer: Stream processing and complex event processing are not well suited to Impala, because Impala most closely resembles a relational database that queries data already stored in HDFS or HBase.

 

Question-65: How does Impala compare with Hive and Pig?

Answer: Impala differs from Pig and Hive because it executes queries with its own daemons, which are spread across the cluster. Pig and Hive run their work as MapReduce jobs; Impala does not, which avoids the MapReduce job startup overhead and allows Impala to return results with low, near real-time latency.