Question 2: You are working with an e-commerce company which parse its log data and store it in the

Question 2: You are working with an e-commerce company which parse its log data and store it in the Redshift cluster table. Parsed data has columns like productId, IPaddress, username, orderValue, timestamp,

productname and orderId. Whenever any analytics is applied it always prefer the most recent order only for the analysis. On the daily basis new data is added to the same table but your analytics team is complaining

the performance of the table, specifying that it is de-grading day by day. How can you improve the performance of this table?

1. You will run the resort command of the table.

2. You will ask analytics team to re-generate the query plan.

3. You will create an index on the table.

4. You will be using the VACUUM command on the table.

5. You will OPTIMIZE command on the table

Correct Answer : 4 Exp : As per the AWS Redshift documentation

To clean up tables after a load or a series of incremental updates, you need to run the VACUUM command, either against the entire database or against individual tables.

Only the table owner or a superuser can effectively vacuum a table. If you do not have owner or superuser privileges for a table, a VACUUM that specifies a single table will fail. If you run a VACUUM of the entire

database, without specifying a table name, the operation completes successfully but has no effect on tables for which you do not have owner or superuser privileges. For this reason, and because vacuuming the entire

database is potentially an expensive operation, we recommend vacuuming individual tables as needed.

When you perform a delete, the rows are marked for deletion, but not removed. Amazon Redshift automatically runs a VACUUM DELETE operation in the background based on the number of deleted rows in database tables.

Amazon Redshift schedules the VACUUM DELETE to run during periods of reduced load and pauses the operation during periods of high load.

For tables with a sort key, the VACUUM command ensures that new data in tables is fully sorted on disk. When data is initially loaded into a table that has a sort key, the data is sorted according to the SORTKEY

specification in the CREATE TABLE statement. However, when you update the table, using COPY, INSERT, or UPDATE statements, new rows are stored in a separate unsorted region on disk, then sorted on demand for queries

as required. If large numbers of rows remain unsorted on disk, query performance might be degraded for operations that rely on sorted data, such as range-restricted scans or merge joins. The VACUUM command merges new

rows with existing sorted rows, so range-restricted scans are more efficient and the execution engine doesnt need to sort rows on demand during query execution.

Details: Category: AWS Certified Big Data - Specialty; Last Updated: 30 November -0001

Related Articles