Question-96: What do you mean by rack awareness in a Hadoop cluster?
Answer: Rack Awareness means the Hadoop cluster knows on which rack in the datacenter each of its nodes resides. A single Hadoop cluster can span more than one datacenter, and one datacenter can have many racks. Knowing in advance which rack and datacenter a node resides in helps improve performance significantly.
Question-97: What are the advantages of having Rack Awareness in the Hadoop ecosystem?
Answer: Rack awareness makes the location of each node in the cluster known in advance, which increases the availability of data blocks and improves cluster performance. Co-locating replicas of a data block on one physical rack speeds up the replication process. Similarly, the HDFS balancer and DataNode decommissioning are rack-aware processes. However, keep in mind that Rack Awareness is not enabled by default.
Question-98: How can we set up Rack Awareness in the cluster?
Answer: When the cluster is first created, Rack Awareness is not set; we need to configure it explicitly, which can be done in the following two ways.
- Set the rack ID using Ambari: In this case Ambari is responsible for passing the rack information to HDFS using a topology script. HDFS uses this topology script to get the rack information for each DataNode.
- Set the rack ID using a custom topology script: You would use this option when you don’t want Ambari to manage the rack information. In that case you have to provide your own topology script and manage the distribution of the script to all the hosts.
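The custom topology script mentioned above can be sketched as a small shell script. HDFS invokes such a script (configured via the `net.topology.script.file.name` property in core-site.xml) with one or more host names or IP addresses and expects one rack path per argument on stdout. The subnet ranges and rack paths below are purely hypothetical; adapt them to your own datacenter layout.

```shell
#!/bin/sh
# Minimal topology script sketch. The subnet-to-rack mapping here is
# illustrative only, not a real cluster layout.
resolve_rack() {
  case "$1" in
    10.1.1.*) echo "/dc1/rack1" ;;
    10.1.2.*) echo "/dc1/rack2" ;;
    *)        echo "/default-rack" ;;  # fallback for unknown hosts
  esac
}

# HDFS may pass several hosts in one invocation; answer each in order.
for host in "$@"; do
  resolve_rack "$host"
done
```

The script must be executable on the NameNode host, and it should always print a fallback such as `/default-rack` for unknown hosts rather than producing empty output, since HDFS expects exactly one rack path per argument.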
Question-99: I know the Spark service has its own native web UIs, such as the History Server and Job UI. How can we access these from the Ambari UI?
Answer: Some services have their own native user interfaces, such as the Spark History Server. Ambari provides links to these native UIs, and you can find them under the Quick Links menu of that service.
Question-100: What do you mean by Rolling Start?
Answer: Rolling start can be used to start multiple components together while their dependencies are maintained by Ambari itself. If one component depends on another, a rolling start makes sure the first component is started before any component that depends on it.
So whenever we need to start multiple components, we can use Rolling Start, which also distributes the starting tasks. A rolling restart first stops and then starts multiple components, in parallel where there are no dependencies; for example, in the case of HDFS, the DataNodes, NodeManagers, RegionServers, and Supervisors would be started using a batch sequence.
By default, a rolling start of a component in a three-node cluster restarts one component at a time and waits 2 minutes between restarts. This wait period (and the batch size) can be changed in the Ambari UI.
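The batch sequence described above can be illustrated with a short shell sketch. Here `restart_component` is only a placeholder stub (Ambari performs the real restarts through its agents), and the host names and wait time are illustrative.

```shell
#!/bin/sh
# Illustrative sketch of rolling-restart batching: restart one host
# at a time and pause between batches (Ambari's default wait is 120s).
restart_component() {
  # Stub standing in for the real Ambari restart operation.
  echo "restarting $1"
}

restart_in_batches() {
  wait_seconds="$1"; shift
  for host in "$@"; do
    restart_component "$host"
    sleep "$wait_seconds"   # wait between batches
  done
}

# Example: three-node cluster; zero wait used here for demonstration.
restart_in_batches 0 node1 node2 node3
```

The point of the pause between batches is to let the restarted component rejoin the cluster and become healthy before the next one is taken down, so the service as a whole stays available throughout the rolling operation.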