Question 75: You are working with an e-commerce company which parse its log data and store it in the Redshift cluster tables. Analytics team apply various analytics on these tables. And on daily basis you are adding

more data to the tables and running the VACUUM command as well. However, still there seems to be some performance issues on the table. And you started executing the query plan on the table. While evaluating the query

plan, you find couple of issues regarding the distribution styles, which is used by the tables. Which of the below two you will check?

A. DS_DIST_ALL_NONE

B. DS_DIST_OUTER

C. DS_BCAST_INNER

D. DS_DIST_INNER

E. DS_DIST_ALL_INNER

1. A,B

2. B,C

3. C,D

4. D,E

5. A,E

Correct Answer : 4 Exp : As per the AWS Documentation you must go through below things while checking the query plan for Redshift table

You can use query plans to identify candidates for optimizing the distribution style. After making your initial design decisions, create your tables, load them with data, and test them. Use a test data set that is as

close as possible to the real data. Measure load times to use as a baseline for comparisons.

Evaluate queries that are representative of the most costly queries you expect to execute; specifically, queries that use joins and aggregations. Compare execution times for various design options. When you compare

execution times, do not count the first time the query is executed, because the first run time includes the compilation time.

DS_DIST_NONE : No redistribution is required, because corresponding slices are collocated on the compute nodes. You will typically have only one DS_DIST_NONE step, the join between the fact table and one dimension

table.

DS_DIST_ALL_NONE : No redistribution is required, because the inner join table used DISTSTYLE ALL. The entire table is located on every node.

DS_DIST_INNER : The inner table is redistributed.

DS_DIST_OUTER : The outer table is redistributed.

DS_BCAST_INNER : A copy of the entire inner table is broadcast to all the compute nodes.

DS_DIST_ALL_INNER : The entire inner table is redistributed to a single slice because the outer table uses DISTSTYLE ALL.

DS_DIST_BOTH : Both tables are redistributed.

DS_DIST_NONE and DS_DIST_ALL_NONE are good. They indicate that no distribution was required for that step because all of the joins are collocated.

DS_DIST_INNER means that the step will probably have a relatively high cost because the inner table is being redistributed to the nodes. DS_DIST_INNER indicates that the outer table is already properly distributed on

the join key. Set the inner tables distribution key to the join key to convert this to DS_DIST_NONE. If distributing the inner table on the join key is not possible because the outer table is not distributed on the

join key, evaluate whether to use ALL distribution for the inner table. If the table is relatively slow moving, that is, it is not updated frequently or extensively, and it is large enough to carry a high

redistribution cost, change the distribution style to ALL and test again. ALL distribution causes increased load times, so when you retest, include the load time in your evaluation factors.

DS_DIST_ALL_INNER is not good. It means the entire inner table is redistributed to a single slice because the outer table uses DISTSTYLE ALL, so that a copy of the entire outer table is located on each node. This

results in inefficient serial execution of the join on a single node instead taking advantage of parallel execution using all of the nodes. DISTSTYLE ALL is meant to be used only for the inner join table. Instead,

specify a distribution key or use even distribution for the outer table.

DS_BCAST_INNER and DS_DIST_BOTH are not good. Usually these redistributions occur because the tables are not joined on their distribution keys. If the fact table does not already have a distribution key, specify the

joining column as the distribution key for both tables. If the fact table already has a distribution key on another column, you should evaluate whether changing the distribution key to collocate this join will improve

overall performance. If changing the distribution key of the outer table is not an optimal choice, you can achieve collocation by specifying DISTSTYLE ALL for the inner table.

4