Question-21: Which object can be used to get the progress of a particular job?
Answer: The Context object. It is passed to the map() and reduce() methods and exposes progress reporting and retrieval (for example, progress() and getProgress()).
Question-22: What is the next step after the Mapper or MapTask?
Answer: The output of the Mapper is sorted and then partitioned. The number of partitions equals the number of Reducers configured for the job.
Question-23: How can we control which keys go to a specific Reducer?
Answer: Users can control which keys or records go to which Reducer by implementing a custom Partitioner.
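As a standalone sketch of the idea (plain Java, without the Hadoop classes), the default hash-based routing that a custom Partitioner would replace looks like this; getPartition here mirrors the logic of Hadoop's default HashPartitioner:

```java
public class PartitionSketch {
    // Mirrors the logic of Hadoop's default HashPartitioner:
    // mask off the sign bit of the hash, then take it modulo the reducer count.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 3;
        // The same key always maps to the same partition, so all records
        // for that key end up in the same Reducer.
        for (String key : new String[] {"apple", "banana", "apple"}) {
            System.out.println(key + " -> reducer " + getPartition(key, reducers));
        }
    }
}
```

A custom Partitioner overrides exactly this decision, for example routing a known set of hot keys to dedicated Reducers.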
Question-24: What is the use of Combiner?
Answer: The Combiner is an optional class that can be specified via
Job.setCombinerClass(ClassName)
to perform local aggregation of the intermediate outputs, which helps to cut down the amount of data transferred from the Mapper to the Reducer.
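To illustrate the effect with a standalone sketch (plain Java, not the Hadoop API), local aggregation collapses repeated keys in one map task's intermediate output before anything is sent over the network, just as a word-count Combiner would:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinerSketch {
    // Sums the counts of repeated keys, as a word-count Combiner does
    // with a single map task's intermediate (word, 1) pairs.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : mapOutput) {
            combined.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // One mapper emitted five (word, 1) pairs...
        List<Map.Entry<String, Integer>> out = List.of(
            Map.entry("the", 1), Map.entry("cat", 1), Map.entry("the", 1),
            Map.entry("the", 1), Map.entry("cat", 1));
        // ...but only two records cross the network after combining.
        System.out.println(combine(out)); // prints {the=3, cat=2}
    }
}
```

This only works because word-count's reduce function is associative and commutative; a Combiner must not change the final result when applied zero, one, or many times.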
Question-25: How many maps are there in a particular Job?
Answer: The number of maps is usually driven by the total size of the input, that is, the total number of blocks of the input files. Generally it is around 10-100 maps per node. Task setup takes a while, so it is best if each map takes at least a minute to execute. For example, with 10TB of input data and a block size of 128MB, you end up with about 82,000 maps (10TB / 128MB = 81,920). To influence the number of maps you can set the
mapreduce.job.maps
parameter, although it only provides a hint to the framework. Ultimately, the number of map tasks is determined by the number of splits returned by the InputFormat.getSplits() method, which you can override.
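The arithmetic behind that figure can be checked directly (a sketch assuming one map per block and ignoring any partial last block):

```java
public class SplitCount {
    public static void main(String[] args) {
        long inputBytes = 10L * 1024 * 1024 * 1024 * 1024; // 10 TB of input
        long blockSize  = 128L * 1024 * 1024;              // 128 MB block size
        long maps = inputBytes / blockSize;                // one map per block
        System.out.println(maps + " maps");                // prints "81920 maps"
    }
}
```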