Question 53: You work at an e-commerce company that stores its log data in an S3 bucket. Your analytics team already runs MapReduce jobs on this data using an EMR cluster. The team creates new logic and models daily, and each one requires writing a new MapReduce job, even though they always analyze the same fixed fields from the logs. Which of the following is the ideal solution so that the data stored in S3 can be queried efficiently?

A. You will be using AWS Lambda service

B. You will be using Apache Hive

C. You will be using AWS Athena

D. You will be using SparkSQL

E. You will be using Zeppelin notebooks

1. A,B

2. B,C

3. A,C

4. D,E

5. A,E

Correct Answer: 3

Exp: The question states that the data is stored in S3 and that EMR currently runs MapReduce jobs on it. The analytics team uses only fixed fields from the log files. So you can extract those fixed fields with a Lambda function, write them to a CSV file, and save that file back to S3 (in a separate bucket if you like). You can then use the AWS Athena service to run SQL queries on this data. Hence, options A and C are correct.
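The Lambda-then-Athena flow described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation. The question does not give the log format, so the sketch assumes space-delimited log lines with the fixed fields at known positions. The field names, the `FIELDS` mapping, the destination bucket `example-extracted-logs`, and the table name are all hypothetical.

```python
import csv
import io
from urllib.parse import unquote_plus

# ASSUMPTION: hypothetical positions of the fixed fields in a space-delimited log line.
FIELDS = {"ts": 0, "client_ip": 1, "http_method": 2, "url_path": 3, "status_code": 4}


def logs_to_csv(log_text, fields=FIELDS):
    """Keep only the fixed fields from each log line and return CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields.keys())  # header row
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) <= max(fields.values()):
            continue  # skip blank or malformed lines
        writer.writerow(parts[i] for i in fields.values())
    return out.getvalue()


def athena_table_ddl(table, location, fields=FIELDS):
    """Build an Athena DDL statement over the CSV output (all columns as strings)."""
    columns = ",\n  ".join(f"{name} string" for name in fields)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {columns}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


def lambda_handler(event, context):
    """S3-triggered entry point: read the new log object and write a CSV copy."""
    import boto3  # provided by the AWS Lambda Python runtime

    s3 = boto3.client("s3")
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = unquote_plus(record["object"]["key"])  # keys arrive URL-encoded in S3 events
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    s3.put_object(
        Bucket="example-extracted-logs",  # ASSUMPTION: hypothetical destination bucket
        Key=key + ".csv",
        Body=logs_to_csv(body).encode("utf-8"),
    )
```

Running the statement returned by `athena_table_ddl("logs_csv", "s3://example-extracted-logs/")` once in Athena would let the team query the extracted fields with plain SQL instead of writing a new MapReduce job each time. If field values can themselves contain commas, a CSV SerDe such as `OpenCSVSerde` would likely be a better fit than the plain delimited row format used here.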

Apache Hive: You have to define the schema before you can query the data. However, none of the options says how the schema would be created automatically in the Hive Metastore. So we do not choose this option, and we always give priority to an AWS service when one is available.

SparkSQL: It also needs the data in a structured format before you can query it. In addition, SparkSQL requires an EMR cluster to be set up for execution, and we prefer a serverless architecture.

Zeppelin notebooks: They are meant for interactive analytics, not for running batch queries. Hence, we cannot consider this option.
