Question 6: You are working with an e-commerce company that sells millions of products daily and wants to increase sales. To do so, it plans to recommend products based on each user's previous purchases and product searches. This requires that real-time web clickstream data be available in the EMR cluster and that SQL queries can be run on that data. You already have a solution that streams this data using Kinesis Data Streams. Which of the following is the ideal solution for running SQL queries on the real-time data?

1. You will use Spark on EMR, which will consume the real-time data from the Kinesis data stream.

2. You will use the Kinesis Producer Library and the Kinesis Client Library, and run the SQL queries through the client library.

3. You will use EMR and Hive.

4. You will use EMR and Sqoop.

Correct Answer: 1 Exp: Because we need to run SQL queries on streaming data, we can use Spark Streaming. With Spark Streaming, you can buffer the real-time data into small batches and run a SQL query on each batch. Spark streaming jobs can be run on AWS EMR. Hence, option 1 is correct.
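The idea of buffering a stream into small batches and running SQL on each batch can be sketched without a cluster. The following is a minimal illustration only, not Spark code: it uses Python's standard-library sqlite3 as a stand-in for the SQL engine, and the event fields (user_id, product_id, action), the helper names micro_batches and top_products, and the sample records are all hypothetical. On EMR, Spark Streaming would take the place of both the batching loop and the SQL engine, reading from the Kinesis stream instead of a list.

```python
import sqlite3
from itertools import islice


def micro_batches(events, batch_size):
    """Yield successive lists of at most batch_size events (hypothetical helper)."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def top_products(batch):
    """Load one micro-batch into an in-memory table and query it with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE clicks (user_id TEXT, product_id TEXT, action TEXT)"
        )
        conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", batch)
        # Count events per product within this batch only.
        rows = conn.execute(
            "SELECT product_id, COUNT(*) AS n FROM clicks "
            "GROUP BY product_id ORDER BY n DESC, product_id ASC"
        ).fetchall()
    finally:
        conn.close()
    return rows


# Hypothetical clickstream records standing in for data read from a stream.
events = [
    ("u1", "p1", "view"),
    ("u2", "p1", "purchase"),
    ("u1", "p2", "view"),
    ("u3", "p3", "view"),
    ("u2", "p3", "view"),
    ("u3", "p3", "purchase"),
]

# One SQL result set per micro-batch of three events.
results = [top_products(batch) for batch in micro_batches(events, 3)]
for batch_number, rows in enumerate(results, start=1):
    print(batch_number, rows)
```

Each batch is queried independently, which mirrors why a batch-oriented SQL engine can be applied to a stream once the stream is cut into small intervals.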
