Question 35: You have 1 TB of data in an S3 bucket that needs to be loaded into a Redshift table. The full load can take up to 3 hours, and it may fail because of a data issue; when it does, your team fixes the data and restarts the load, repeating this until the load succeeds. You have been asked to optimize this process. Which of the options below would you choose?

A. You will split this 1 TB file into 1,000 equal parts.

B. You will further compress this 1 TB file.

C. You will transform the data into Parquet file format using AWS Glue and a defined schema.

D. You will use the COPY command to load the data.

E. You will use the COPY command with the NOLOAD option to load the data.

F. You will use PARALLEL LOAD with the NOERRORS option to load the data.

1. A,B

2. C,D

3. D,E

4. E,F

5. A,E

Correct Answer: 5. Exp: To load a huge volume of data in parallel, you can split it into equal parts, as per option A, so that the parts can be loaded in parallel.
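For illustration, a minimal sketch of such a parallel load (the table name sales, the bucket my-bucket, the key prefix load/part_, and the IAM role ARN below are all hypothetical):

-- The 1 TB file has been split into parts sharing a common key prefix,
-- e.g. s3://my-bucket/load/part_0000 through part_0999.
COPY sales
FROM 's3://my-bucket/load/part_'
IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftLoadRole';
-- COPY loads every object matching the prefix and distributes the files
-- across the cluster's slices, so the parts are loaded in parallel.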

The next point is that the data may have issues that cause the entire load to fail, forcing you to fix the issue and restart the load. That is wasteful; instead, you should be able to find out whether the data has any issues before loading it. For that you can use the NOLOAD option with the COPY command: the data is validated first, any errors are reported, and you can fix them without losing a multi-hour load. Hence, option E is correct.

From the AWS documentation on NOLOAD:

Checks the validity of the data file without actually loading the data. Use the NOLOAD parameter to make sure that your data file will load without any errors before running the actual data load. Running COPY with the

NOLOAD parameter is much faster than loading the data because it only parses the files.
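A minimal dry-run sketch using the same hypothetical names as above: adding NOLOAD makes COPY parse and validate the files without writing any rows.

COPY sales
FROM 's3://my-bucket/load/part_'
IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftLoadRole'
NOLOAD;
-- Parsing errors surface quickly (check the STL_LOAD_ERRORS system table)
-- without waiting on a multi-hour load; once the dry run is clean, rerun
-- the same COPY without NOLOAD to perform the actual load.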
