Question 32: You are working with a company that provides an email service, www.WahooMail.com. The company wants to implement a feature that labels each email with one of the following three categories:

- Important

- Social

- Marketing and announcements

You have just started this project and are creating a training data set for it, so you have selected 100,000 existing emails. You need to assign the proper label to each email in this training data set, so that the emails can be used to train your model, which will then intelligently assign labels to incoming email. Which of the following is an ideal solution for the given requirement?

A. You will ask your team to manually assign a label to each of these 100,000 emails.

B. You will run sentiment analysis using an NLP library to choose among the three given labels.

C. You will use the Amazon Mechanical Turk web service to publish Human Intelligence Tasks that ask Turk workers to label these emails.

D. You will use Monte Carlo simulation to generate and assign labels.

E. You will use MapReduce to classify the emails into the three categories, based on the words that appear in each email.

1. A,B

2. A,C

3. C,D

4. D,E

5. A,E

Correct Answer: 2

Explanation: The question asks which category each individual email falls into, that is, it asks you to assign the appropriate label to each email. It also states that this is a training data set for a new model that will label future emails. The accuracy of the training labels should therefore be as close to 100% as possible: if the training data set itself is wrongly labelled, future emails can be wrongly labelled too. A manual process is the more accurate choice in this case, so options A and C are correct. The other options can introduce errors.
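As a rough illustration of what the manual process in options A and C involves, the sketch below splits the emails into batches for human labelers and checks that every returned label is one of the three categories. The function names, the email ID format, and the batch size of 500 are hypothetical choices made for this example; they are not part of the question and do not use any Mechanical Turk API.

```python
from typing import Dict, Iterator, List

# The three categories named in the question.
ALLOWED_LABELS = {"Important", "Social", "Marketing and announcements"}


def make_batches(email_ids: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of email IDs, one batch per labeling task."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(email_ids), batch_size):
        yield email_ids[start:start + batch_size]


def find_invalid(email_ids: List[str], labels: Dict[str, str]) -> List[str]:
    """Return the IDs that have no label or a label outside the allowed set."""
    return [eid for eid in email_ids if labels.get(eid) not in ALLOWED_LABELS]


if __name__ == "__main__":
    # Hypothetical IDs standing in for the 100,000 selected emails.
    ids = [f"email-{i}" for i in range(100_000)]
    batches = list(make_batches(ids, batch_size=500))
    print(f"{len(batches)} labeling tasks of up to 500 emails each")

    # Hypothetical labels returned by a human labeler for three emails.
    returned = {"email-0": "Social", "email-1": "Spam"}
    print(find_invalid(ids[:3], returned))
```

Any email reported by `find_invalid` would go back to a human labeler rather than into the training set, which keeps wrongly labelled or unlabelled examples out of the model's training data.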
