Question-106: What do you mean by HDFS Federation?
Answer: Using HDFS federation, you can scale a cluster horizontally by configuring multiple namespaces and NameNodes. The DataNodes in the cluster are available as a common pool of block storage for every NameNode in the federation.
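As an illustration, a federated cluster lists its nameservices in hdfs-site.xml and maps each one to a NameNode address. The nameservice IDs "ns1"/"ns2" and the hostnames below are hypothetical:

```xml
<!-- Illustrative hdfs-site.xml fragment for a two-namespace federation.
     "ns1", "ns2" and the hostnames are assumed example values. -->
<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <value>nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns2</name>
  <value>nn2.example.com:8020</value>
</property>
```

Every DataNode registers with both NameNodes, so each namespace sees the same underlying block storage.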
Question-107: What is Java CMS (Concurrent Mark and Sweep) and how does it affect HDFS?
Answer: You can get access to Java Hands-On Programming training here; it is always good to know the basics of Java, because of its ubiquitous nature and because many administration components are built using it.
Java has its own garbage collection (GC) mechanism, and there are various algorithms for it. One of the popular ones, also used by HDFS, is the CMS (Concurrent Mark and Sweep) GC algorithm. It includes various heuristics (it decides things itself) to determine when to trigger garbage collection, which makes GC less predictable: it tends to delay many GC operations and wait for capacity to be reached, and this can lead to a Full GC run (pausing the application during that time, which is not good at all). Ambari sets default parameter values for many properties during cluster deployment, and below are the two parameters which can affect the CMS GC process:
- UseCMSInitiatingOccupancyOnly: prevents the use of GC heuristics.
- CMSInitiatingOccupancyFraction: defines, as a percentage of heap occupancy, when to run garbage collection instead of waiting for full capacity to be reached. The default value of this parameter is 50.
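For reference, the two Ambari properties above correspond to standard HotSpot JVM flags, which could be applied to the NameNode process roughly as follows (a sketch for hadoop-env.sh; the 50 matches the default stated above):

```shell
# Illustrative hadoop-env.sh snippet: enable CMS, disable its heuristics,
# and start concurrent collection once the old generation is 50% full.
export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS} \
  -XX:+UseConcMarkSweepGC \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:CMSInitiatingOccupancyFraction=50"
```

With these flags, collections start predictably at the configured occupancy rather than whenever the heuristics decide, reducing the chance of a stop-the-world Full GC.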
Question-108: Can I change the home directory for each user in HDFS?
Answer: By default, each user's home directory is created under /user/<User or Service Name>. If you want to change it to somewhere else, set the “dfs.user.home.base.dir” property to the desired value.
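For example, using the property named in the answer above (the target path "/data/home" is a hypothetical value):

```xml
<!-- Illustrative hdfs-site.xml fragment: base path under which per-user
     home directories are created; "/data/home" is an assumed example. -->
<property>
  <name>dfs.user.home.base.dir</name>
  <value>/data/home</value>
</property>
```

After this change, a user "alice" would get /data/home/alice instead of /user/alice.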
Question-109: What all things we should keep in mind when we are planning for the HDFS federation?
Answer: Using HDFS federation, we allow a cluster to scale horizontally by configuring multiple namespaces and NameNodes. The DataNodes in the cluster are available as a common pool of block storage for every NameNode in the federation (this is explained in our Hadoop Admin and Professional training as well).
So, whenever you want to configure HDFS federation, make sure:
- This should be done only during a maintenance window, because all cluster services will be restarted during the HDFS federation configuration.
- You must have HA configured for every NameNode you want to include in the federation.
Question-110: What is the limit for NameNode and Namespace for the HDFS federation?
Answer: You must associate every NameNode you want to include in a federation with a namespace. You can configure at most four namespaces in a federated environment.