Question-31: Explain the Reducer’s reduce phase.

Answer: In this phase the reduce(MapOutKeyType, Iterable, Context) method is called once for each <key, (collection of values)> pair in the grouped inputs. The output of the reduce task is typically written to the FileSystem via Context.write(ReduceOutKeyType, ReduceOutValType).

Applications can use the Context to report progress, set application-level status messages and update Counters, or just indicate that they are alive. The output of the Reducer is not sorted. 
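The "one reduce call per grouped key" behavior can be illustrated outside Hadoop. The sketch below is a plain-Python analogy, not the Hadoop API itself: `run_reduce_phase` plays the role of the framework, and `reduce_sum` is a hypothetical example reducer (a word-count-style sum).

```python
from itertools import groupby
from operator import itemgetter

def reduce_sum(key, values):
    # Hypothetical example reducer: sum all values for a key (word-count style).
    return key, sum(values)

def run_reduce_phase(sorted_pairs, reducer):
    # Mimics the framework: one reducer call per distinct key,
    # passing an iterable over that key's grouped values.
    out = []
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        out.append(reducer(key, (v for _, v in group)))
    return out

pairs = [("cat", 1), ("cat", 2), ("dog", 5)]  # already sorted by key, as after the shuffle
print(run_reduce_phase(pairs, reduce_sum))  # [('cat', 3), ('dog', 5)]
```

As in Hadoop, the input must already be sorted by key for the grouping to work; that is what the framework's shuffle-and-sort stage guarantees before the reduce phase begins.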

Question-32: How many Reducers should be configured?

Answer: The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapreduce.tasktracker.reduce.tasks.maximum). With 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes will finish their first round of reduces and launch a second wave of reduces, doing a much better job of load balancing. Increasing the number of reducers increases framework overhead, but also improves load balancing and lowers the cost of failures.
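The two rules of thumb are just arithmetic on the cluster's total reduce-slot capacity. The numbers below assume a hypothetical 10-node cluster with mapreduce.tasktracker.reduce.tasks.maximum set to 2:

```python
nodes = 10             # hypothetical cluster size
max_reduce_slots = 2   # assumed value of mapreduce.tasktracker.reduce.tasks.maximum

total_slots = nodes * max_reduce_slots       # 20 reduce slots cluster-wide
one_wave = int(0.95 * total_slots)           # all reduces fit in a single wave
two_waves = int(1.75 * total_slots)          # faster nodes pick up a second wave

print(one_wave, two_waves)  # 19 35
```

With 19 reducers every reduce can start at once; with 35 the reduces run in roughly two waves, so nodes that finish early immediately take on more work.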

Question-33: Is it possible for a Job to have 0 reducers?

Answer: It is legal to set the number of reduce-tasks to zero if no reduction is desired. 

Question-34: What happens if the number of reducers is 0?

Answer: In this case the outputs of the map-tasks go directly to the FileSystem, at the output path set by setOutputPath(Path). The framework does not sort the map-outputs before writing them out to the FileSystem.
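A map-only job can be requested from the command line by setting the reducer count to zero. This is a sketch assuming a generic job driver that honors -D options; `myjob.jar`, `MyJob`, and the paths are hypothetical placeholders, while mapreduce.job.reduces is the standard property name:

```shell
# Request a map-only job: map outputs are written straight to /output, unsorted.
hadoop jar myjob.jar MyJob -D mapreduce.job.reduces=0 /input /output
```

Equivalently, a Java driver can call job.setNumReduceTasks(0) before submitting the job.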

Question-35: How many instances of JobTracker can run on a Hadoop Cluster?

Answer: Only one. A Hadoop cluster runs a single JobTracker process, which coordinates all jobs on the cluster; running more than one is not supported.