Question 66: You have data in various formats for an e-commerce application. The data is stored in your corporate data center. You want to apply machine learning algorithms to this data using the AWS Machine Learning service. What do you have to do?

A. You have to convert all the data to XML format.

B. You have to convert all the data to CSV format.

C. You have to convert all the data to JSON format.

D. You have to upload all the data to an S3 bucket.

E. You have to make sure all the data has the same schema.

F. You have to upload all the data to an EBS volume.

1. A,B,C

2. C,D,E

3. D,E,F

4. A,B,E

5. B,D,E

Correct Answer: 5

Explanation: To process your data with AWS Machine Learning, you have to follow certain conventions, described below.

Any data you use as input for Amazon ML must be in .csv format. Each row in the .csv file is a single data record or observation. Each column in the .csv file contains an attribute of the observation.
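As a rough illustration of the row-and-column layout described above, the following Python sketch converts records from another format (JSON lines here) into CSV text. The record fields and the `records_to_csv` helper are hypothetical and chosen only for illustration; real e-commerce data would need its own field mapping.

```python
import csv
import io
import json

# Hypothetical e-commerce records arriving as JSON lines (illustrative data only).
json_lines = [
    '{"order_id": 1, "category": "books", "amount": 12.5}',
    '{"order_id": 2, "category": "toys, games", "amount": 30.0}',
]

def records_to_csv(lines, fieldnames):
    """Write one CSV row per record and one column per attribute."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()  # the first line carries the attribute names
    for line in lines:
        writer.writerow(json.loads(line))
    return buffer.getvalue()

csv_text = records_to_csv(json_lines, ["order_id", "category", "amount"])
print(csv_text)
```

Note that the `csv` module encloses the value `toys, games` in double quotes because it contains the delimiter.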

Attributes: Amazon ML requires names for each attribute. You can specify attribute names by:

- Including the attribute names in the first line (also known as a header line) of the .csv file that you use as your input data

- Including the attribute names in a separate schema file that is located in the same S3 bucket as your input data
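For the second option, a separate schema file might look roughly like the JSON produced by the sketch below. The field names (`version`, `targetAttributeName`, `dataFormat`, `dataFileContainsHeader`, `attributes`) are assumed from the Amazon ML schema format and should be verified against the AWS documentation; the attribute names themselves are hypothetical.

```python
import json

# Assumed layout of an Amazon ML schema file; verify field names against the AWS docs.
# The attribute names (order_id, category, amount, purchased) are hypothetical.
schema = {
    "version": "1.0",
    "targetAttributeName": "purchased",
    "dataFormat": "CSV",
    "dataFileContainsHeader": False,  # names come from this schema, not from a header line
    "attributes": [
        {"attributeName": "order_id", "attributeType": "CATEGORICAL"},
        {"attributeName": "category", "attributeType": "CATEGORICAL"},
        {"attributeName": "amount", "attributeType": "NUMERIC"},
        {"attributeName": "purchased", "attributeType": "BINARY"},
    ],
}

schema_text = json.dumps(schema, indent=2)
print(schema_text)
```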

Input File Format Requirements

The .csv file that contains your input data must meet the following requirements:

- Must be in plain text using a character set such as ASCII, Unicode, or EBCDIC.

- Must consist of observations, one observation per line.

- For each observation, the attribute values must be separated by commas.

- If an attribute value contains a comma (the delimiter), the entire attribute value must be enclosed in double quotes.

- Each observation must be terminated with an end-of-line character, which is a special character or sequence of characters indicating the end of a line.

- Attribute values cannot include end-of-line characters, even if the attribute value is enclosed in double quotes.

- Every observation must have the same number of attributes and sequence of attributes.

- Each observation must be no larger than 100 KB. Amazon ML rejects any observation larger than 100 KB during processing. If Amazon ML rejects more than 10,000 observations, it rejects the entire .csv file.
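Some of the requirements above can be checked locally before uploading. The following Python sketch is a minimal pre-check, not Amazon ML's own validation: the `check_csv` helper is hypothetical, it assumes 100 KB means 100 × 1024 bytes of UTF-8 text, and it only reports rows with a mismatched attribute count, since how Amazon ML treats such rows is not stated here.

```python
import csv

MAX_OBSERVATION_BYTES = 100 * 1024  # assumption: 1 KB = 1024 bytes
MAX_REJECTED = 10_000               # more rejections than this fails the whole file

def check_csv(text, max_bytes=MAX_OBSERVATION_BYTES, max_rejected=MAX_REJECTED):
    """Check CSV text line by line and return a small report dictionary."""
    oversized = []    # line numbers of observations above the size limit
    mismatched = []   # (line number, field count) for rows with the wrong width
    expected_fields = None
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line.encode("utf-8")) > max_bytes:
            oversized.append(number)
            continue
        # Parse a single line; quoted values may contain commas but not line breaks.
        fields = next(csv.reader([line]))
        if expected_fields is None:
            expected_fields = len(fields)  # the first accepted line sets the width
        elif len(fields) != expected_fields:
            mismatched.append((number, len(fields)))
    return {
        "oversized": oversized,
        "mismatched": mismatched,
        "file_rejected": len(oversized) > max_rejected,
    }

sample = 'order_id,category,amount\n1,books,12.5\n2,"toys, games",30.0\n3,music\n'
report = check_csv(sample)
print(report)
```

In this sample, the last row is reported because it has two attributes instead of three, while the quoted value `toys, games` is counted as a single attribute.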
