Question-26: For what kinds of applications is Kudu the best fit?

Answer: The following kinds of applications are difficult to implement with the other currently available Hadoop technologies, and Kudu can help with them:

  • Reporting applications: where new data must be immediately available to end users.
  • Time-series applications: which must support both queries across large amounts of historical data and granular queries on an individual entity.
  • Predictive models: applications that use predictive models to make real-time decisions, with the models refreshed periodically based on historical data.

 

Question-27: What is Apache Sentry?

Answer: Apache Sentry is a granular, role-based authorization module for Hadoop. It plugs into Hadoop components as their authorization engine. With it, we can define authorization rules that validate a user’s or application’s access requests for Hadoop resources.
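
To make the idea of role-based authorization rules concrete, here is a minimal in-memory sketch in Python. It is not Sentry's API: the class names, the resource string "sales.orders", and the group and role names are all hypothetical, chosen only to illustrate how privileges granted to roles, and roles assigned to groups, can be used to validate an access request.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    resource: str  # hypothetical resource name, e.g. "sales.orders"
    action: str    # e.g. "select", "insert", or "all"


@dataclass
class RoleBasedPolicy:
    # group name -> set of role names
    group_roles: dict = field(default_factory=dict)
    # role name -> set of Privilege objects
    role_privileges: dict = field(default_factory=dict)

    def grant(self, role, privilege):
        """Grant a privilege to a role."""
        self.role_privileges.setdefault(role, set()).add(privilege)

    def assign(self, group, role):
        """Assign a role to a group of users."""
        self.group_roles.setdefault(group, set()).add(role)

    def is_allowed(self, user_groups, resource, action):
        """Return True if any role held by the user's groups permits the request."""
        for group in user_groups:
            for role in self.group_roles.get(group, ()):
                for priv in self.role_privileges.get(role, ()):
                    if priv.resource == resource and priv.action in (action, "all"):
                        return True
        return False


policy = RoleBasedPolicy()
policy.grant("analyst_role", Privilege("sales.orders", "select"))
policy.assign("analysts", "analyst_role")

print(policy.is_allowed(["analysts"], "sales.orders", "select"))  # True
print(policy.is_allowed(["analysts"], "sales.orders", "insert"))  # False
```

The point of the sketch is that users never receive privileges directly: access is decided through the roles attached to their groups, which is what keeps the rules manageable as the number of users grows.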

 

Question-28: On Cloudera CDH6, which cluster managers are supported for Apache Spark?

Answer: As of CDH6, the Spark Standalone cluster manager is no longer supported, so you have to use YARN as the cluster manager. Spark 1.6 is also not supported on CDH6.
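
In practice, this means a Spark job on CDH6 is submitted with YARN as the master. The Python sketch below only assembles a spark-submit command line and rejects any other master. The helper name build_spark_submit and the file name my_job.py are hypothetical; the --master and --deploy-mode options are standard spark-submit flags.

```python
def build_spark_submit(app_path, deploy_mode="cluster", master="yarn"):
    """Build a spark-submit argument list, allowing only YARN as the master.

    Hypothetical helper for illustration: it does not run anything,
    it only returns the command as a list of strings.
    """
    if master != "yarn":
        # Standalone masters (spark://host:port) are rejected in this sketch,
        # mirroring the CDH6 restriction described above.
        raise ValueError(f"unsupported master {master!r}: use 'yarn'")
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode {deploy_mode!r}")
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        app_path,
    ]


cmd = build_spark_submit("my_job.py")
print(" ".join(cmd))
# spark-submit --master yarn --deploy-mode cluster my_job.py
```

The resulting list could be passed to subprocess.run on a gateway host where spark-submit is installed; that step is left out here so the example runs anywhere.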

 

Question-29: What is Cloudera Manager used for?

Answer: Cloudera Manager is an end-to-end application for managing CDH clusters. With it, we can easily deploy and centrally operate the complete CDH stack and other managed services.

 

Question-30: Can you explain how Cloudera Manager, hosts, and clusters are associated?

Answer: In Cloudera Manager, a cluster is a logical entity that contains a set of hosts.

  • Only a single version of CDH is installed on the hosts of a cluster, e.g. either CDH-5 or CDH-6.
  • Services and role instances run on the hosts, e.g. an HDFS instance.
  • A single host can belong to only one cluster.
  • Cloudera Manager can manage more than one cluster.
  • A single cluster can be associated with only one Cloudera Manager.
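
The associations listed above can be sketched as a small Python data model. This is only an illustration of the relationships, not the Cloudera Manager API: the class names, the host name node1.example.com, and the cluster names are all hypothetical.

```python
class Host:
    def __init__(self, hostname):
        self.hostname = hostname
        self.cluster = None  # a host belongs to at most one cluster
        self.roles = []      # role instances running on this host


class Cluster:
    def __init__(self, name, cdh_version):
        self.name = name
        self.cdh_version = cdh_version  # one CDH version for all hosts
        self.manager = None             # at most one Cloudera Manager
        self.hosts = []

    def add_host(self, host):
        """Attach a host, refusing hosts that already belong to another cluster."""
        if host.cluster is not None and host.cluster is not self:
            raise ValueError(
                f"{host.hostname} already belongs to cluster {host.cluster.name}"
            )
        if host.cluster is None:
            host.cluster = self
            self.hosts.append(host)


class ClouderaManager:
    def __init__(self, name):
        self.name = name
        self.clusters = []  # one manager may manage several clusters

    def manage(self, cluster):
        """Take over a cluster, refusing clusters managed by another instance."""
        if cluster.manager is not None and cluster.manager is not self:
            raise ValueError(
                f"{cluster.name} is already managed by {cluster.manager.name}"
            )
        if cluster.manager is None:
            cluster.manager = self
            self.clusters.append(cluster)


cm = ClouderaManager("cm-1")
prod = Cluster("prod", "CDH-6")
dev = Cluster("dev", "CDH-5")
cm.manage(prod)
cm.manage(dev)  # one Cloudera Manager, two clusters

node = Host("node1.example.com")
prod.add_host(node)
node.roles.append("HDFS DataNode")  # a role instance running on the host

print(len(cm.clusters))   # 2
print(node.cluster.name)  # prod
```

Trying dev.add_host(node) at this point raises ValueError, because the host already belongs to the prod cluster. Likewise, a second ClouderaManager instance cannot take over prod.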