Question-56: Does each worker host need access to a repository to install the software?
Answer: No. If you are using parcels, only the Cloudera Manager Server requires access to the Cloudera public repositories; it then distributes the parcels to the worker hosts. If you are using traditional packages, each host requires access only to the installation files.
Question-57: Which feature prevents the text of SQL queries from appearing in the logs?
Answer: You can use the log redaction feature to obfuscate sensitive information in the Impala log files.
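To make the idea of redaction concrete, here is a minimal Python sketch of pattern-based obfuscation applied to log lines. It is only an illustration of the concept, not Impala's actual implementation or configuration format: the rule list `REDACTION_RULES`, the `redact` function, and the two example patterns are all hypothetical.

```python
import re

# Hypothetical rules for illustration: each pairs a regular expression
# with the text that should replace any match in a log line.
REDACTION_RULES = [
    # Mask values shaped like a U.S. Social Security number.
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "XXX-XX-XXXX"),
    # Mask a quoted literal assigned to a "password" field.
    (re.compile(r"(password\s*=\s*)'[^']*'", re.IGNORECASE), r"\1'***'"),
]


def redact(line: str) -> str:
    """Apply every redaction rule to a single log line, in order."""
    for pattern, replacement in REDACTION_RULES:
        line = pattern.sub(replacement, line)
    return line


if __name__ == "__main__":
    print(redact("SELECT * FROM t WHERE ssn = '123-45-6789'"))
    print(redact("SELECT * FROM users WHERE password = 'hunter2'"))
```

The query text is still logged, but the sensitive literals inside it are replaced before the line is written, which is the general effect a redaction feature aims for.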
Question-58: Is it required to install Impala on all the nodes in the cluster?
Answer: Yes, it is important to install Impala on all the DataNodes in the cluster. Data locality is an important aspect of Impala performance, and if some DataNodes do not run Impala, other nodes must do remote reads to retrieve data that is not available locally. Impala performance also generally increases as the number of nodes increases.
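The data locality argument can be illustrated with a toy model in Python. This is a simplified sketch, not how Impala schedules work: the function `remote_read_blocks`, the block identifiers, and the host names are all hypothetical. It simply lists the blocks that have no replica on any host running Impala, which are the blocks that could only be read remotely.

```python
def remote_read_blocks(block_replicas, impala_hosts):
    """Return the ids of blocks with no replica on any host running Impala.

    block_replicas maps a block id to the list of hosts holding a replica.
    impala_hosts is the collection of hosts where Impala is installed.
    """
    impala_hosts = set(impala_hosts)
    return sorted(
        block_id
        for block_id, hosts in block_replicas.items()
        if not impala_hosts & set(hosts)
    )


# Hypothetical six-node cluster with three blocks, three replicas each.
block_replicas = {
    "blk_1": ["node1", "node2", "node3"],
    "blk_2": ["node2", "node3", "node4"],
    "blk_3": ["node4", "node5", "node6"],
}

if __name__ == "__main__":
    # Impala on only two nodes: blk_3 has no local replica anywhere.
    print(remote_read_blocks(block_replicas, ["node1", "node2"]))
    # Impala on every DataNode: every block can be read locally.
    all_nodes = ["node1", "node2", "node3", "node4", "node5", "node6"]
    print(remote_read_blocks(block_replicas, all_nodes))
```

In this model, installing Impala on every DataNode leaves no block that requires a remote read.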
Question-59: Does Impala reduce the HDFS block size during query processing to improve query performance?
Answer: No, Impala does not change the HDFS block size, nor does it change the size of HBase datasets.
Question-60: Does Impala use caching to return query results faster?
Answer: No, Impala does not explicitly cache the data. It caches only some of the metadata for files and tables.