Question 36: You have 1 TB of data in an on-premises data center and you want to load it into a Redshift cluster so that the analytics team can run queries on it. You will first copy the data to an S3 bucket, and from there it will be loaded into the Redshift cluster. Which of the following options is the correct and most efficient way to load the data into a Redshift table?

1. You will compress this data and then use the PARALLEL UPLOAD command to load the data into the Redshift table.

2. You will split this data into 1,000 equal parts and then use the PARALLEL UPLOAD command to load the data into the Redshift table.

3. Using AWS Glue, you will convert this data to Parquet files and then use the PARALLEL UPLOAD command to load the data into the Redshift table.

4. You will split this data into 1,000 equal parts and use the COPY command to load the data into a Redshift table.

5. You will compress this data and then use the COPY command to load the data into the Redshift table.

Correct Answer: 4

Explanation: You can load table data from a single file, or you can split the data for each table into multiple files. The COPY command can load data from multiple files in parallel. You can load multiple files by specifying a common prefix (prefix key) for the set, or by explicitly listing the files in a manifest file.
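
As a rough sketch of the two approaches, assuming a hypothetical table, bucket, and IAM role ARN (replace them with your own):

    -- Load every file in S3 whose key begins with the prefix 'venue/part'
    copy venue
    from 's3://my-data-bucket/venue/part'
    iam_role 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    delimiter '|';

    -- Or list the exact files to load in a manifest file stored in S3
    copy venue
    from 's3://my-data-bucket/venue.manifest'
    iam_role 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    manifest
    delimiter '|';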

We strongly recommend that you divide your data into multiple files to take advantage of parallel processing. Split your data into files so that the number of files is a multiple of the number of slices in your cluster. That way Amazon Redshift can divide the data evenly among the slices. The number of slices per node depends on the node size of the cluster.
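
To pick a suitable file count, you can check how many slices your cluster has by querying the STV_SLICES system table, for example:

    -- STV_SLICES has one row per slice, so the count is the number of slices in the cluster
    select count(*) as slice_count from stv_slices;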

The nodes all participate in parallel query execution, working on data that is distributed as evenly as possible across the slices. If you have a cluster with two DS1.XL nodes, you might split your data into four files or some multiple of four. Amazon Redshift does not take file size into account when dividing the workload, so you need to ensure that the files are roughly the same size, between 1 MB and 1 GB after compression.
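
Splitting and compressing can also be combined, since COPY can read gzip-compressed split files directly. A sketch with placeholder names, assuming the parts are named part-000.gz, part-001.gz, and so on:

    -- Load gzip-compressed split files that share the prefix 'sales/part-'
    copy sales
    from 's3://my-data-bucket/sales/part-'
    iam_role 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    gzip
    delimiter '|';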

You may be given another option, such as splitting the file into 200 equal parts and then using the COPY command to upload the data in parallel, and you may be unsure which option is correct. We recommend choosing the option with the higher split count: with 1 TB of data, 1,000 parts works out to roughly 1 GB per file, which fits the recommended 1 MB to 1 GB range, whereas 200 parts would produce files of about 5 GB each.
