
HadoopExam Learning Resources

Question 18: The web analytics team uses Hadoop to process access logs. They now want to correlate this
data with structured user data residing in a production single-instance JDBC database. They collaborate with the production team to import the data into Hadoop. Which tool should they use?

1.  Chukwa
2.  Sqoop
3.  Pig
4.  Flume

Correct Answer: 2

Explanation: Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data with Hadoop MapReduce, and then export the data back into an RDBMS.
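As an illustration of that import step, a minimal sketch of pulling a table from a MySQL database into HDFS with the Sqoop command-line tool might look like the following; the connection string, credentials file, table name, and target directory are hypothetical placeholders:

sqoop import \
  --connect jdbc:mysql://dbhost/webapp \
  --username analytics \
  --password-file /user/analytics/.dbpass \
  --table users \
  --target-dir /data/users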

Sqoop automates most of this process, relying on the database to describe the schema for the data to be imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.
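Because the transfer runs as a MapReduce job, the degree of parallelism can be tuned with the --num-mappers option, and processed results can be pushed back into the database with sqoop export. A hedged sketch, again with hypothetical database, table, and directory names:

sqoop export \
  --connect jdbc:mysql://dbhost/webapp \
  --username analytics \
  --password-file /user/analytics/.dbpass \
  --table user_summary \
  --export-dir /data/user_summary \
  --num-mappers 4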

The Sqoop documentation describes how to get started using Sqoop to move data between databases and Hadoop and provides reference information for the Sqoop command-line tool suite. It is intended for:

System and application programmers
System administrators
Database administrators
Data analysts
Data engineers


