Question-26: What is the Reducer used for?

Answer: Reducer reduces a set of intermediate values which share a key to a (usually smaller) set of values. 

The number of reduce tasks for the job is set by the user via Job.setNumReduceTasks(int).
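
As a hedged illustration (the WordCountDriver class name, the commented-out Mapper and Reducer classes, and the path arguments are hypothetical, not part of the answer above), a minimal driver sketch that sets the number of reduce tasks could look like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        // Hypothetical Mapper and Reducer classes; substitute your own implementations.
        // job.setMapperClass(TokenizerMapper.class);
        // job.setReducerClass(IntSumReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // The user decides how many reduce tasks the job runs.
        job.setNumReduceTasks(4);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}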

Question-27: Explain the core methods of the Reducer?

Answer: The API of Reducer is very similar to that of Mapper: there is a run() method that receives a Context containing the job's configuration, as well as interfacing methods that return data from the Reducer back to the framework. The run() method calls setup() once, reduce() once for each key associated with the reduce task, and cleanup() once at the end. Each of these methods can access the job's configuration data by using Context.getConfiguration().

As in Mapper, any or all of these methods can be overridden with custom implementations. If none of them are overridden, the default Reducer operation is the identity function: values are passed through without further processing.

The heart of Reducer is its reduce() method. This is called once per key. The second argument is an Iterable which returns all the values associated with that key. 
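
As a hedged sketch (the IntSumReducer name and the summation logic are illustrative assumptions, not taken from the answer above), a simple Reducer that overrides all three methods could look like this:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Called once before any reduce() call; the job's configuration is reachable here.
        String jobName = context.getConfiguration().get("mapreduce.job.name", "unknown");
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Called once per key; the Iterable supplies every value associated with that key.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Called once at the end of the reduce task.
    }
}

Overriding setup() and cleanup() is optional; their defaults do nothing, and the default reduce() simply writes each (key, value) pair straight through, which is the identity behavior described above.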

Question-28: What are the primary phases of the Reducer?

Answer: Shuffle, Sort and Reduce 

Question-29: Explain the shuffle phase?

Answer: Input to the Reducer is the sorted output of the mappers. In this phase, the framework fetches the relevant partition of the output of all the mappers via HTTP.

Question-30: Explain the Reducer’s Sort phase? 

Answer: In this stage, the framework groups Reducer inputs by key (since different mappers may have output the same key). The shuffle and sort phases occur simultaneously: while map outputs are being fetched, they are merged (the process is similar to a merge sort).
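
As a hedged illustration of how this grouping step can be influenced (the case-insensitive comparator below is a hypothetical example, not part of the answer above), an application can supply its own comparator to decide which keys are merged into the same reduce() call:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Hypothetical comparator that treats keys differing only in letter case as equal.
public class CaseInsensitiveTextComparator extends WritableComparator {

    public CaseInsensitiveTextComparator() {
        super(Text.class, true); // true: deserialize keys so the object-level compare() below is used
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        return a.toString().compareToIgnoreCase(b.toString());
    }
}

In the driver, job.setSortComparatorClass(CaseInsensitiveTextComparator.class) would control the merge order of the fetched map outputs, and job.setGroupingComparatorClass(CaseInsensitiveTextComparator.class) would control which keys are grouped into a single reduce() call; a matching custom Partitioner would also be needed so that keys differing only in case reach the same reduce task.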