Question 13: You have an in-house 5-node Hadoop cluster based on open-source Hadoop. Your data volume is increasing day by day. The daily run of the Hadoop job takes more than 10 hours, which is quite high and may increase in the future. To reduce the run time you need more nodes: for example, if you increase the node count to 20, the overall job time can be reduced to 2 hours, which is acceptable to you. Hence, you have decided to use AWS EMR, where you can increase the number of nodes as needed. You have written all of your MapReduce applications in the Python programming language. Which of the following would help in running your Python-based MapReduce job on the EMR cluster?

1. You will use a Streaming step on the EMR cluster to run the MapReduce jobs.

2. You will convert your mapper and reducer to Lambda functions and then execute them on the EMR cluster.

3. You need to convert the Python-based MapReduce job to SQL queries.

4. You need to convert the Python-based MapReduce job into a Java-based MapReduce application.

5. You have to create an Oozie workflow instead of running the Python-based MapReduce application.

Correct Answer: 1 Exp: Hadoop also supports applications written in languages other than Java, through Hadoop Streaming. Hence, migrating from the in-house Hadoop cluster to EMR does not affect your existing code, and you can use it as is. An EMR Streaming application reads input from standard input and then runs a script or executable (called a mapper) against each input. The result from each of the inputs is saved locally, typically on a Hadoop Distributed File System (HDFS) partition. After all the input is processed by the mapper, a second script or executable (called a reducer) processes the mapper results. The results from the reducer are sent to standard output. You can chain together a series of Streaming steps, where the output of one step becomes the input of another step.
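The standard-input/standard-output contract described above can be illustrated with a minimal word-count sketch. This is not the job from the question: the function names `map_lines`, `reduce_lines`, and `run`, and the tab-separated record layout, are assumptions chosen for illustration. The reducer relies on its input arriving sorted by key, which the Hadoop shuffle phase provides.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one tab-separated (word, 1) record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each key.

    Assumes the records are already sorted by key, as they are when
    Hadoop hands shuffled mapper output to a reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run(role, stdin=sys.stdin, stdout=sys.stdout):
    """Hypothetical entry point: role is "map" or "reduce".

    A real streaming script would call run(sys.argv[1]) so that it
    reads records from stdin and writes results to stdout.
    """
    step = map_lines if role == "map" else reduce_lines
    for record in step(stdin):
        stdout.write(record + "\n")
```

Locally, the same pipeline can be approximated by feeding the sorted mapper output into the reducer, which mimics what the shuffle does between the two phases.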

The mapper and the reducer can each be referenced as a file, or you can supply a Java class. You can implement the mapper and reducer in any of the supported languages, including Ruby, Perl, Python, PHP, or Bash. Based on that, option 1 is correct.
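As a rough sketch of what referencing the mapper and reducer as files can look like, the helper below builds a Streaming step definition in the shape accepted by the `add_job_flow_steps` call of the boto3 EMR client. It assumes an EMR release that runs streaming jobs through `command-runner.jar` with the `hadoop-streaming` command; the helper name `build_streaming_step` and all S3 paths are hypothetical.

```python
def build_streaming_step(name, mapper_uri, reducer_uri, input_uri, output_uri):
    """Return a step definition dict for a Hadoop Streaming job on EMR.

    The scripts are shipped to the cluster with -files and then referred
    to by their bare file names in -mapper and -reducer.
    """
    mapper_file = mapper_uri.rsplit("/", 1)[-1]
    reducer_file = reducer_uri.rsplit("/", 1)[-1]
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hadoop-streaming",
                "-files", f"{mapper_uri},{reducer_uri}",
                "-mapper", mapper_file,
                "-reducer", reducer_file,
                "-input", input_uri,
                "-output", output_uri,
            ],
        },
    }


# Hypothetical bucket and paths, for illustration only.
step = build_streaming_step(
    "python-wordcount",
    "s3://example-bucket/code/mapper.py",
    "s3://example-bucket/code/reducer.py",
    "s3://example-bucket/input/",
    "s3://example-bucket/output/",
)

# Submitting the step would need boto3 and a running cluster, e.g.:
#   boto3.client("emr").add_job_flow_steps(JobFlowId="<cluster-id>", Steps=[step])
```

Keeping the step as plain data makes it easy to inspect before submission, and to chain steps by pointing the next step's input at this step's output location.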

Details: Category: AWS Certified Big Data - Specialty; Last Updated: 30 November -0001
