Question 55: You have a large volume of CSV files in an S3 bucket and use Amazon Athena to query the data. Some queries take much longer than expected, and you need to improve their performance without using resources that increase cost. Assuming the queries run on a regular basis, which of the following solutions are ideal for this requirement?

A. Create a unique key on the data.

B. Create a primary key on the data.

C. Split the data according to what each individual query requires.

D. Use an Athena CTAS (CREATE TABLE AS SELECT) statement.

E. Use proper partition keys.

1. A,B

2. B,C

3. C,D

4. D,E

5. A,E

Correct Answer: 3

Explanation: Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there's no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.

Athena is integrated out of the box with the AWS Glue Data Catalog, allowing you to create a unified metadata repository across various services, crawl data sources to discover schemas, populate your Catalog with new and modified table and partition definitions, and maintain schema versioning. You can also use Glue's fully managed ETL capabilities to transform data or convert it into columnar formats to optimize cost and improve performance.

The familiar CREATE TABLE statement creates an empty table. In contrast, the CTAS statement creates a new table containing the result of a SELECT query. The new table's metadata is automatically added to the AWS Glue Data Catalog. The data files are stored in Amazon S3 at the designated location. When creating new tables using CTAS, you can include a WITH clause to define table-specific parameters, such as file format, compression, and partition columns.
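As an illustration of the CTAS form described above, here is a minimal Python sketch that assembles such a statement as a string. The helper `build_ctas`, the table names, the column names, and the bucket path are all hypothetical; the WITH properties shown (`format`, `write_compression`, `external_location`, `partitioned_by`) are assumed to be supported by the Athena engine version in use.

```python
def build_ctas(new_table, source_table, columns, where, location,
               file_format="PARQUET", compression="SNAPPY", partitioned_by=()):
    """Return an Athena CTAS statement as a string (illustrative helper)."""
    # Athena expects partition columns to come last in the SELECT list.
    select_cols = [c for c in columns if c not in partitioned_by]
    select_cols += list(partitioned_by)

    # Table-specific parameters that go inside the WITH clause.
    props = [
        f"format = '{file_format}'",
        f"write_compression = '{compression}'",
        f"external_location = '{location}'",
    ]
    if partitioned_by:
        quoted = ", ".join(f"'{c}'" for c in partitioned_by)
        props.append(f"partitioned_by = ARRAY[{quoted}]")

    lines = [
        f"CREATE TABLE {new_table}",
        "WITH (",
        ",\n".join(f"  {p}" for p in props),
        ")",
        f"AS SELECT {', '.join(select_cols)}",
        f"FROM {source_table}",
        f"WHERE {where}",
    ]
    return "\n".join(lines)


# Hypothetical example: a smaller table holding only one year of data.
sql = build_ctas(
    new_table="sales_2023",
    source_table="sales_csv",
    columns=["order_id", "region", "amount"],
    where="year = 2023",
    location="s3://example-bucket/sales_2023/",
    partitioned_by=("region",),
)
print(sql)
```

The resulting string could then be run from the Athena console or submitted programmatically, for example with the boto3 Athena client's `start_query_execution` call.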

Hence, with CTAS you can create smaller tables from the larger ones and run the regular queries against those smaller tables, so that each query reads less data.

Similarly, you can split the larger files in the S3 bucket into smaller files and create smaller tables over the split data, which can also improve query performance.
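One way to picture splitting the data as individual queries require is to group CSV rows by the column a recurring query filters on. The sketch below uses only the Python standard library and works in memory; the function name `split_csv_by_column` and the sample columns are hypothetical, and uploading each piece to its own S3 prefix is left out.

```python
import csv
import io
from collections import defaultdict


def split_csv_by_column(csv_text, column):
    """Group CSV rows by the value of one column.

    Returns a dict mapping each distinct value to a CSV string
    that repeats the original header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    groups = defaultdict(list)
    for row in reader:
        groups[row[column]].append(row)

    result = {}
    for key, rows in groups.items():
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        result[key] = out.getvalue()
    return result


# Hypothetical sample data: queries are assumed to filter on "region".
data = "order_id,region,amount\n1,eu,10\n2,us,20\n3,eu,30\n"
for region, text in split_csv_by_column(data, "region").items():
    print(region)
    print(text)
```

Each piece could then be stored under a separate S3 location and exposed as its own smaller table, so a query that needs only one region scans only that region's files.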

Details: Category: AWS Certified Big Data - Specialty
