cluster and with continuously running jobs, you want to monitor the overall cluster and see graphical reports for it. Which of the following is a correct option?
1. You can use Oozie for that
2. You can use Tableau for that
3. You can use Hive Display for that
4. You can use Ganglia for that
5. You can use GraphX of Spark for that
Correct Answer: 4
Explanation: We need a tool that can monitor the entire cluster without impacting the cluster's performance. Let's go through the options one by one.
Option-1: Oozie is a workflow solution for creating or chaining various Big Data jobs, which can include MapReduce jobs, Hive queries, and/or Spark jobs. It is not a monitoring solution but a workflow solution.
Hence, this option is out.
Option-2: Tableau provides a dashboard-based solution for business analytics and cannot be used to monitor the cluster. It offers visualization and reporting, but you have to provide the data to it
explicitly. This tool cannot collect monitoring data from the various nodes in the EMR cluster. Hence, this cannot be a correct solution.
Option-3: There is no tool named Hive Display. Hive itself is a tool for querying data stored in HDFS using an SQL-like query language. It is a data warehouse solution, not a
monitoring tool. Hence, this option is also out.
Option-4: The Ganglia open source project is a scalable, distributed system designed to monitor clusters and grids while minimizing the impact on their performance. When you enable Ganglia on your cluster, you can
generate reports and view the performance of the cluster as a whole, as well as inspect the performance of individual node instances. Ganglia is also configured to ingest and visualize Hadoop and Spark metrics. This
is the correct option.
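To make the "cluster as a whole" versus "individual node" distinction concrete, here is a minimal Python sketch that reads per-host metrics from a Ganglia-style XML dump and computes a cluster-wide average. It assumes a simplified XML shape (CLUSTER, HOST, and METRIC elements with NAME and VAL attributes) modeled on what the Ganglia monitoring daemon reports. The sample document, host names, values, and the helper names per_host_metric and cluster_average are all hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample modeled on Ganglia's XML metric dump.
# Host names and values are invented for illustration only.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.7.2" SOURCE="gmond">
  <CLUSTER NAME="example-cluster">
    <HOST NAME="node-1" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.50" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_user" VAL="12.0" TYPE="float" UNITS="%"/>
    </HOST>
    <HOST NAME="node-2" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="1.50" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def per_host_metric(xml_text, metric_name):
    """Return {host name: value} for one metric (the per-node view)."""
    root = ET.fromstring(xml_text)
    values = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                values[host.get("NAME")] = float(metric.get("VAL"))
    return values


def cluster_average(xml_text, metric_name):
    """Average one metric over the hosts that report it (the cluster view)."""
    values = per_host_metric(xml_text, metric_name)
    if not values:
        return None  # no host reported this metric
    return sum(values.values()) / len(values)


load_by_host = per_host_metric(SAMPLE_XML, "load_one")
average_load = cluster_average(SAMPLE_XML, "load_one")
```

In a real deployment the Ganglia web front end draws these graphs for you; the sketch only shows the kind of aggregation behind a cluster-level report.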
Option-5: Spark GraphX is an engine for processing graph-based data, which is generally difficult to store and process with an RDBMS-based engine. It serves no cluster-monitoring purpose. Hence, it cannot
be a correct option.