Question 73: You have a huge volume of historical data in CSV format, almost 2 TB. You have uploaded all of this data to an S3 bucket, and some basic analytics has already been done on it. However, your business team very much wants this data loaded into a table in a Redshift cluster. What will you keep in mind so that the data can be copied into the Redshift cluster efficiently and in the minimum time?

A. You will compress the CSV files for faster loading into the cluster.

B. You will create a single CSV file by appending all the CSV files together.

C. You will create CSV files whose count is a multiple of the number of slices per node.

D. You will make sure that you are using a larger instance type.

E. You will make sure that all the CSV files are encrypted.

1. A,B

2. B,C

3. C,D

4. D,E

5. A,E

Correct Answer: 3 Exp: The question does not ask for the Redshift cluster to be encrypted, so the data does not need to be encrypted in the S3 bucket first. We can ignore option E.

To load the data into the Redshift cluster, it is not a requirement to append all the CSV files into a single file; in fact, doing so is a problem. The data should instead be split into multiple files. Hence, option A and option B cannot be correct.

You can load table data from a single file, or you can split the data for each table into multiple files. The COPY command can load data from multiple files in parallel. You can load multiple files by specifying a

common prefix, or prefix key, for the set, or by explicitly listing the files in a manifest file.

We strongly recommend that you divide your data into multiple files to take advantage of parallel processing.

Split your data into files so that the number of files is a multiple of the number of slices in your cluster. That way Amazon Redshift can divide the data evenly among the slices. The number of slices per node depends

on the node size of the cluster.
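As a quick sketch (assuming you have access to the Redshift system tables), the number of slices in a running cluster can be checked from the STV_SLICES system table, which tells you what multiple to aim for when splitting the load files:

-- Count slices per node; the grand total is the number your
-- file count should be a multiple of.
SELECT node, COUNT(slice) AS slices_per_node
FROM stv_slices
GROUP BY node
ORDER BY node;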

Amazon Redshift does not take file size into account when dividing the workload, so you need to ensure that the files are roughly the same size, between 1 MB and 1 GB after compression.

If you intend to use object prefixes to identify the load files, name each file with a common prefix. For example, the he_load.txt file might be split into four files, as follows:

he_load.txt.1

he_load.txt.2

he_load.txt.3

he_load.txt.4
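For illustration, a COPY command that loads all four files by their common prefix might look like the following sketch; the table name, bucket, and IAM role ARN are placeholders, not values from the question:

-- Loads every object whose key begins with the he_load.txt prefix.
COPY history_data
FROM 's3://my-bucket/he_load.txt'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
CSV;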

If you put multiple files in a folder in your bucket, you can specify the folder name as the prefix and COPY will load all of the files in the folder. If you explicitly list the files to be loaded by using a manifest

file, the files can reside in different buckets or folders.
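As a sketch of the manifest approach (bucket, file, and role names are illustrative), the manifest is a JSON file with an "entries" array listing each object to load, and COPY is pointed at it with the MANIFEST keyword:

-- Contents of s3://my-bucket/he_load.manifest (illustrative):
-- {
--   "entries": [
--     {"url": "s3://my-bucket/he_load.txt.1", "mandatory": true},
--     {"url": "s3://other-bucket/he_load.txt.2", "mandatory": true}
--   ]
-- }
COPY history_data
FROM 's3://my-bucket/he_load.manifest'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
CSV
MANIFEST;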
