Question 64: You are having some e-commerce data stored in S3 bucket which is in the comma separated format as well as in XML and JSON format. You are using AWS Athena to create table using this data and run the SQL

query on this data for various data analysis. Which of the following can help you create separate tables with each format of the data?

1. You will be using data format clause while creating the data.

2. AWS Athena will automatically find the data format and create table accordingly.

3. You will be using SerDe (Sterilizer and De-serilaizer) .

4. Athena supports only CSV format

Correct Answer : 3 Exp : A SerDe (Serializer/Deserializer) is a way in which Athena interacts with data in various formats.

It is the SerDe you specify, and not the DDL, that defines the table schema. In other words, the SerDe can override the DDL configuration that you specify in Athena when you create your table.

To Use a SerDe in Queries

To use a SerDe when creating a table in Athena, use one of the following methods:

Use DDL statements to describe how to read and write data to the table and do not specify a ROW FORMAT, as in this example. This omits listing the actual SerDe type and the native LazySimpleSerDe is used by default.

In general, Athena uses the LazySimpleSerDe if you do not specify a ROW FORMAT, or if you specify ROW FORMAT DELIMITED.

ROW FORMAT

DELIMITED FIELDS TERMINATED BY ,

ESCAPED BY \

COLLECTION ITEMS TERMINATED BY |

MAP KEYS TERMINATED BY :

Explicitly specify the type of SerDe Athena should use when it reads and writes data to the table. Also, specify additional properties in SERDEPROPERTIES, as in this example.

ROW FORMAT SERDE org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

WITH SERDEPROPERTIES (

serialization.format = ,,

field.delim = ,,

collection.delim = |,

mapkey.delim = :,

escape.delim = \

)

3