Question 7: You have a large farm of EC2 instances running the web servers for an e-commerce web application, www.buykart.com. All the web servers generate application logs as well as other logs. The requirement is to apply machine learning to these log files for targeted advertising, and for that purpose the logs are ingested into an EMR cluster. Which of the following components is suitable for applying machine learning to the log files and can run on the EMR cluster?

1. Hive

2. Presto

3. Oozie

4. Tez

5. Spark

Correct Answer : 5 Exp : The question asks which of the listed components can be used to apply machine learning algorithms to the log data ingested into the EMR cluster.

Hive: It is a data warehouse solution that can impose a schema on your log data and lets you run SQL-like queries on it. However, Hive itself does not provide a machine learning library.

Presto: This is also a query engine and not a machine learning solution.

Oozie: It is a workflow engine. You can use it to schedule a workflow that runs a machine learning algorithm, but Oozie itself is not a machine learning solution.

Tez: It is an execution engine for running data processing jobs, in the same role as MapReduce. It is not a machine learning solution by itself, although jobs that implement a machine learning algorithm can run on top of it.

Spark: It is a collection of libraries built on top of the core Spark engine. One of them is MLlib, which you can use to run machine learning algorithms. Hence, it is the correct option.
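To make the Spark option concrete, here is a minimal PySpark sketch of how web server logs might be fed into MLlib. It is an illustration under assumptions, not part of the original question: the log layout (common log format), the helper names `parse_log_line` and `cluster_visitors`, the chosen features, and the use of KMeans are all hypothetical choices made for this example.

```python
import re

# Hypothetical log layout (common log format); the real web server logs may differ.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_log_line(line):
    """Return (ip, path, status, size), or None if the line does not match."""
    m = LOG_PATTERN.match(line.strip())
    if m is None:
        return None
    size = 0 if m.group("size") == "-" else int(m.group("size"))
    return (m.group("ip"), m.group("path"), int(m.group("status")), size)


def cluster_visitors(spark, log_path, k=3):
    """Group visitors by browsing behaviour using MLlib KMeans (illustrative)."""
    # Imported lazily so the parser above can be used without a Spark install.
    from pyspark.sql import functions as F
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.clustering import KMeans

    # Read raw log lines, parse them, and drop lines that do not match.
    parsed = (
        spark.sparkContext.textFile(log_path)
        .map(parse_log_line)
        .filter(lambda row: row is not None)
    )
    df = parsed.toDF(["ip", "path", "status", "size"])

    # One row per visitor with a few simple numeric features.
    per_visitor = df.groupBy("ip").agg(
        F.count("*").alias("requests"),
        F.countDistinct("path").alias("distinct_pages"),
        F.avg("size").alias("avg_bytes"),
    )

    # MLlib estimators expect a single vector column of features.
    features = VectorAssembler(
        inputCols=["requests", "distinct_pages", "avg_bytes"],
        outputCol="features",
    ).transform(per_visitor)

    model = KMeans(k=k, seed=42, featuresCol="features").fit(features)
    return model.transform(features).select("ip", "prediction")
```

On an EMR cluster with Spark installed, `cluster_visitors` would be called with an existing `SparkSession` and the location of the ingested logs (for example an S3 or HDFS path); the resulting visitor segments could then serve as one input to ad targeting.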
