Hadoop Administrator (Cloudera) Interview Questions-17

Question-81: What happens to a query whose data size exceeds the available memory?

Answer: As of now, if the memory required to process intermediate results on a node exceeds the memory available to the Impala process on that node, the query is cancelled. You can adjust the memory available on each individual node and fine-tune the strategy accordingly. The query is cancelled rather than spilled to disk because external joins and sorts are not currently supported in Impala.
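
The rule above can be illustrated with a toy model. This is only a sketch of the decision described in the answer, not Impala's actual memory-tracking logic; the function name `query_outcome`, the host names, and the byte figures are all hypothetical.

```python
from typing import Dict

GIB = 1024 ** 3


def query_outcome(required: Dict[str, int], available: Dict[str, int]) -> str:
    """Toy model: the query is cancelled if ANY node needs more memory
    for intermediate results than its Impala process has available.

    Both arguments map a (hypothetical) host name to a size in bytes.
    """
    for host, needed in required.items():
        if needed > available[host]:
            return f"cancelled: {host} needs more memory than is available"
    return "ok"


# Illustrative numbers only: node2 needs 12 GiB but has 8 GiB available.
required = {"node1": 4 * GIB, "node2": 12 * GIB}
available = {"node1": 8 * GIB, "node2": 8 * GIB}
print(query_outcome(required, available))
```

In this model a single overloaded node is enough to cancel the whole query, which is why the per-node memory setting matters.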

Question-82: Why do I see high memory usage by Impala even when no query is running?

Answer: Impala allocates memory and, once it is allocated, keeps it reserved for future use instead of returning it right away. The memory allocator is tcmalloc, which is optimized for high concurrency. In addition, if you are a programmer using JDBC or ODBC, call the appropriate close method when you are done with a query. Otherwise, some memory associated with the query will not be freed.
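
As a minimal sketch of the close-when-done advice, the Python snippet below wraps a connection and cursor in `contextlib.closing` so that `close()` runs even if the query fails. It assumes a Python DB-API 2.0 style driver rather than JDBC/ODBC directly; the helper `run_query` and the stand-in `FakeConnection` and `FakeCursor` classes are hypothetical and exist only so the example runs without a cluster.

```python
from contextlib import closing


def run_query(connect, sql):
    """Run one query and always close the cursor and the connection.

    `connect` is any zero-argument callable returning a DB-API style
    connection (an assumption; substitute your driver's connect call).
    """
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(sql)
            return cur.fetchall()


class FakeCursor:
    """Stand-in cursor that records whether close() was called."""

    def __init__(self):
        self.closed = False

    def execute(self, sql):
        self.sql = sql

    def fetchall(self):
        return [(1,)]

    def close(self):
        self.closed = True


class FakeConnection:
    """Stand-in connection that records whether close() was called."""

    def __init__(self):
        self.closed = False
        self.last_cursor = None

    def cursor(self):
        self.last_cursor = FakeCursor()
        return self.last_cursor

    def close(self):
        self.closed = True


conn = FakeConnection()
rows = run_query(lambda: conn, "SELECT 1")
print(rows, conn.closed, conn.last_cursor.closed)
```

With JDBC, the equivalent is closing the `ResultSet`, `Statement`, and `Connection`, for example through try-with-resources.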

Question-83: Does Impala support UDFs?

Answer: Yes. Impala supports user-defined functions (UDFs) and user-defined aggregate functions (UDAs), which need to be written in C++. Existing UDFs from Hive can also be used.

Question-84: Why is disk space not freed after a managed table is dropped?

Answer: When you drop a managed table, its data is moved to a trash directory rather than deleted immediately, so the disk space is not freed until a configured interval (for example, 6 hours) has passed.
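
To make the timing concrete, here is a rough sketch of when the space could be reclaimed. It assumes the trash in question is the HDFS trash governed by the `fs.trash.interval` property (expressed in minutes), which the answer does not state explicitly. The helper name and the timestamp are hypothetical, and the actual purge time also depends on trash checkpointing, so treat the result as an earliest estimate.

```python
from datetime import datetime, timedelta


def earliest_purge_time(dropped_at: datetime, trash_interval_minutes: int) -> datetime:
    """Earliest time the dropped table's data could leave the trash.

    Assumes the retention period is configured in minutes, as with the
    HDFS `fs.trash.interval` property. A value of 0 means trash is
    disabled, so the data is deleted immediately.
    """
    return dropped_at + timedelta(minutes=trash_interval_minutes)


# The "6 hours" from the answer corresponds to 360 minutes.
dropped_at = datetime(2021, 4, 24, 9, 0)
print(earliest_purge_time(dropped_at, 360))
```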

Question-85: What kind of data is the best fit for an HBase database?

Answer: HBase is a good fit when you need to store key-value data and your queries fetch only a few rows from the table, using the = or IN operator.

Avoid HBase if your query needs to fetch more than a few thousand rows. A full table scan with a WHERE clause is the worst case for an HBase table. HBase tables are often wide and sparse, so many of the values in a row may be NULL.
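
The contrast between the two access patterns can be sketched with a toy in-memory model. This is not the HBase client API; the dictionary `table`, the row keys, and the column names are hypothetical. The point is only that a lookup by row key touches a few rows, while a filter on a non-key column has to examine every row.

```python
# Toy model of a wide, sparse table: row key -> {column: value}.
# Columns missing from a row play the role of NULL.
table = {
    "user#001": {"name": "Ann", "city": "Oslo"},
    "user#002": {"name": "Bob"},  # no city stored for this row
    "user#003": {"name": "Cy", "phone": "555-0100"},
}


def get_rows(keys):
    """Point lookup (like key = ... or key IN (...)): one probe per key."""
    return {k: table[k] for k in keys if k in table}


def scan_where(column, value):
    """Full scan with a filter on a non-key column: examines every row."""
    examined = 0
    matches = {}
    for key, row in table.items():
        examined += 1
        # row.get() yields None when the column is absent from this row.
        if row.get(column) == value:
            matches[key] = row
    return matches, examined


print(get_rows(["user#001", "user#003"]))
matches, examined = scan_where("city", "Oslo")
print(matches, "rows examined:", examined)
```

Here the key lookup probes two rows, while the filtered scan examines all three to return one match; on a real table with millions of rows that gap is what makes the scan expensive.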

Details: Category: Hadoop Administrator; Last Updated: 24 April 2021
