analytics team wants to run complex Hive queries for data analytics. However, they cannot do this directly on the MySQL database. What do you have to do so that your analytics team can run Hive queries on
this data?
1. You will copy this data to an S3 bucket.
2. You will use Sqoop to transfer this data to HDFS.
3. You will use Oozie to transfer this data to HDFS.
4. You will use the Hive Metastore to transfer this data to HDFS.
5. You will use Hue to transfer this data to HDFS.
Correct Answer : 2 Exp : In this question, we want the data stored in the MySQL database to be queryable using Apache Hive. For that, the data needs to be on HDFS. Sqoop is a utility that can
transfer MySQL data into the Hive warehouse directory on HDFS. Once the data is in the Hive warehouse directory, you can define a schema for it and then query it. Hence, option 2 is the correct option.
Option-1 : Copying the data to an S3 bucket could be a solution if we were using EMRFS. However, this is not mentioned anywhere in the question, so we do not consider it a correct answer.
Option-3 : Apache Oozie is a workflow solution for Big Data. It is not meant for querying or transferring data by itself. However, you can define a workflow that uses the Sqoop tool to transfer data from MySQL to HDFS.
Even then, the main job would be done by Sqoop.
Option-4 : The Hive Metastore is a repository, backed by an RDBMS, that stores schema information for the data stored in HDFS, so that Hive can understand the data's format and structure. It does not transfer data.
Option-5 : Hue is a web UI for Hadoop-based Big Data solutions.
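
As a rough illustration of the Sqoop transfer described for option 2, the sketch below only builds and prints a `sqoop import` command line; it does not run Sqoop. The JDBC URL, user name, password-file path, and table names are hypothetical placeholders, not values from the question, and the sketch assumes a Sqoop 1 client with Hive integration is available on the cluster.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, table, hive_table, num_mappers=1):
    """Return the argument list for a `sqoop import` that loads one MySQL table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC connection string of the source MySQL database
        "--username", username,
        "--password-file", password_file,   # file holding the password, instead of passing it inline
        "--table", table,                   # source table in MySQL
        "--hive-import",                    # load the imported data into Hive
        "--hive-table", hive_table,         # target Hive table
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


# All values below are hypothetical placeholders for illustration only.
cmd = build_sqoop_import(
    jdbc_url="jdbc:mysql://mysql-host.example.com/sales",
    username="analytics",
    password_file="/user/analytics/.mysql-password",
    table="orders",
    hive_table="sales.orders",
)

# Print the command as a single shell-safe string; nothing is executed here.
print(shlex.join(cmd))
```

Once such an import has completed, the analytics team would query the resulting Hive table instead of the MySQL database.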