increase in future. To improve the time you need more nodes, like if you increase your nodes count to 20, then overall job time can be reduced to 2 hours and you are fine with that. Hence, you decided to use AWS EMR
where you can increase the number of nodes as per your need. You have written all the MapReduce application using Python programming language. Which of the following would help in running your Python based MapReduce
job on the EMR cluster?
1. You will be using streaming step of the EMR cluster to run the MapReduce jobs.
2. You will be converting your Mapper and Reducer to Lambda function and then you can execute this on the EMR cluster.
3. You need to convert this Python based MapReduce job to SQL queries.
4. You need to convert this Python based MapReduce into Java based MapReduce application.
5. You have to create OOzie workflow, instead of running Python based MapReduce application.
Correct Answer : 1 Exp : Hadoop support application which is written using other than Java programming as well. Hence, while migrating from In-house Hadoop cluster to EMR will not impact your existing code and you can
use it as it is. An EMR Streaming application reads input from standard input and then runs a script or executable (called a mapper) against each input. The result from each of the inputs is saved locally, typically
on a Hadoop Distributed File System (HDFS) partition. After all the input is processed by the mapper, a second script or executable (called a reducer) processes the mapper results. The results from the reducer are
sent to standard output. You can chain together a series of Streaming steps, where the output of one step becomes the input of another step.
The mapper and the reducer can each be referenced as a file or you can supply a Java class. You can implement the mapper and reducer in any of the supported languages, including Ruby, Perl, Python, PHP, or Bash. Based
on that we can say that option -1 is correct.
1