Question-6: What are the basic components of a Hadoop application?
Answer: Minimally, a Hadoop application has the following components:
- Input location of the data
- Output location of the processed data
- A map task
- A reduce task
- Job configuration
The Hadoop job client then submits the job (typically an executable JAR) and its configuration to the JobTracker, which assumes responsibility for distributing the JAR and configuration to the slave nodes, scheduling the tasks, and monitoring them, while regularly providing status and diagnostic information back to the client that submitted the job.
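As an illustration, here is a minimal driver sketch using the new org.apache.hadoop.mapreduce API; the class name WordCountDriver and the WordCountMapper/WordCountReducer classes (defined in the Question-9 sketch below) are hypothetical, and the input/output paths are taken from the command line:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();            // job configuration
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);            // JAR distributed to the nodes

        job.setMapperClass(WordCountMapper.class);           // the map task
        job.setCombinerClass(WordCountReducer.class);        // optional local aggregation
        job.setReducerClass(WordCountReducer.class);         // the reduce task

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));    // input location of data
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output location of data

        System.exit(job.waitForCompletion(true) ? 0 : 1);    // submit the job and wait
    }
}
```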
Question-7: Explain the input and output data formats of the Hadoop framework.
Answer: The MapReduce framework operates exclusively on <key, value> pairs. The framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types. This is the flow:
(input) <k1, v1> -> map -> <k2, v2> -> combine/sort -> <k2, v2> -> reduce -> <k3, v3> (output)
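In the new API these pair types appear as the generic parameters of the Mapper and Reducer classes. The following declaration-only sketch just shows how the types line up (the class names are hypothetical and the method bodies are omitted; see the WordCount sketch under Question-9 for a working example):

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper<K1, V1, K2, V2>: consumes input pairs <k1, v1> (here: byte offset,
// line of text) and emits intermediate pairs <k2, v2> (here: word, count).
class LineMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // map(K1, V1) would emit <K2, V2> pairs via context.write(...)
}

// Reducer<K2, V2, K3, V3>: receives <k2, list(v2)> after the shuffle/sort
// and emits output pairs <k3, v3>.
class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // reduce(K2, Iterable<V2>) would emit <K3, V3> pairs via context.write(...)
}
```

Note that the mapper's output types (K2, V2) must match the reducer's input types.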
Question-8: What restrictions apply to the key and value classes for a MapReduce job?
Answer: The key and value classes have to be serializable by the framework; for this, Hadoop provides the Writable interface. As you may know from Java itself, map keys must also be comparable, so the key has to implement one more interface, WritableComparable, which makes it serializable as well as comparable. A comparable key lets Hadoop sort the data by key, and a Writable key can be serialized and transferred between nodes over the network. A sketch of a custom key type follows.
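For example, a custom key type would look roughly like this minimal sketch (the class YearKey and its field are hypothetical, chosen only to illustrate the interface):

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// A hypothetical custom key: Writable (serializable) and Comparable (sortable).
public class YearKey implements WritableComparable<YearKey> {
    private int year;

    public YearKey() { }                          // Hadoop requires a no-arg constructor
    public YearKey(int year) { this.year = year; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(year);                       // serialize before transfer across nodes
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        year = in.readInt();                      // deserialize on the receiving node
    }

    @Override
    public int compareTo(YearKey other) {
        return Integer.compare(year, other.year); // drives the framework's sort by key
    }

    @Override
    public int hashCode() { return year; }        // used by the default hash partitioner

    @Override
    public boolean equals(Object o) {
        return o instanceof YearKey && ((YearKey) o).year == year;
    }
}
```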
Question-9: Explain the WordCount implementation in the Hadoop framework.
Answer: We count the words across all input files. The flow is as below (a code sketch of the Mapper and Reducer follows the walkthrough):
- Input: Assume there are two files, each containing one sentence
- File-1: Hello World Hello World
- File-2: Hello World Hello World
- Mapper: There is one mapper per file
For the given sample input, the output of the first map is:
<Hello, 1>
<World, 1>
<Hello, 1>
<World, 1>
The output of the second map is:
<Hello, 1>
<World, 1>
<Hello, 1>
<World, 1>
Combining and sorting are then done for each individual map, producing the output below.
The output of the first map:
<Hello, 2>
<World, 2>
The output of the second map:
<Hello, 2>
<World, 2>
- Reducer output: The reducer sums the above outputs and generates:
<Hello, 4>
<World, 4>
The final output would look like:
- Hello 4 times
- World 4 times
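A sketch of the corresponding Mapper and Reducer, closely following the classic WordCount example from the Hadoop tutorial (the class names match the hypothetical driver sketched under Question-6):

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Emits <word, 1> for every word in the input line.
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);                // e.g. <Hello, 1>
        }
    }
}

// Sums all counts for a word; reused as the combiner for the per-map totals above.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();                      // add up the 1s (or combined counts)
        }
        context.write(word, new IntWritable(sum));   // e.g. <Hello, 4>
    }
}
```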
Question-10: Which interfaces need to be implemented to create a Mapper and a Reducer for Hadoop?
Answer: Below are the respective types for the map and reduce tasks. Note that in the current org.apache.hadoop.mapreduce API these are classes you extend rather than interfaces you implement (the older org.apache.hadoop.mapred API defined them as interfaces); see the WordCount sketch under Question-9 for an example.
org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Reducer