Question-1: What is Hadoop framework?
Answer:Hadoop is an open source framework which is written in java by apache software foundation. This framework is used to write software application which requires to process vast amount of data and It could handle multi tera bytes of data. It works in-parallel on large clusters which could have 1000 of computers or you can say Nodes on the clusters. It also processes data very reliably and fault-tolerant manner.
Question-2: On What concept the Hadoop framework works?
Answer:It works on MapReduce, and it is devised by the Google.
Question-3: What is MapReduce?
Answer:Map reduce is an algorithm or concept to process Huge amount of data in a faster way. As per its name you can divide processing steps in two phase which are Map and Reduce.
The main MapReduce job usually splits the input data-set into independent chunks. You can say Huge data volume is splitted in the multiple small datasets. These are the below tasks which usually happens using MapReduce.
MapTask:will process these chunks in a completely parallel manner at a time one node can process one or more chunks or split.
Sorting: The framework sorts the outputs of the maps.
Reduce Task: Output generated in the above Map and Sorting steps will be the input for the reduce tasks, which would generate the final result as per you reduce logic.
You would be writing business logic in the MapTask and ReducTask. Typically, both the input and the output of the job are stored in a file-system sometime No-SQL database as well. It is frameworks responsibility for taking care of scheduling tasks, monitoring them and re-executes the failed tasks.
Question-4: What is compute and Storage nodes?
Answer:
Compute Node:This is the computer or machine where your actual business logic will be executed.
Storage Node:This is the computer or machine where your file system resides to store the processing data.
In most of the cases compute node and storage node would be on the same machine. This gives you advantages of data locality.
Question-5: How does master slave architecture in the Hadoop?
Answer:The MapReduce framework consists of a single master Job Tracker and multiple slaves, each cluster-node will have one Task Tracker.
The master is responsible for scheduling the jobs' component called tasks on the slaves, and then monitoring them and re-executing the failed tasks. The slaves execute the tasks as directed by the master.
 
											