www.HadoopExam.com

HadoopExam Learning Resources

Question 8 : You have run the association rules algorithm on your data set, and the two rules {banana, apple} => {grape} and {apple, orange}=> {grape} have been found to be relevant. What else must be true?

1.     {grape, apple, orange} must be a frequent itemset.
2.     {banana, apple, grape, orange} must be a frequent itemset.
3.     {grape} => {banana, apple} must be a relevant rule.
4.     {banana, apple} => {orange} must be a relevant rule.

Correct Answer : 1

Exp : Many online service providers such as Amazon and Netflix use recommender systems. Recommender systems can use association rules to discover related products or identify customers who have similar interests. For example, association rules may suggest that those customers who have bought product A have also bought product B, or those customers who have bought products A, B, and C are more similar to this customer. These findings provide opportunities for retailers to cross-sell their products. Association rule mining is primarily focused on finding frequent co-occurring associations among a collection of items.  It is sometimes referred to as "Market Basket Analysis", since that was the original application area of association mining. The goal is to find associations of items that occur together more often than you would expect from a random sampling of all possibilities.  The classic example of this is the famous Beer and Diapers association that is often mentioned in data mining books. The story goes like this: men who go to the store to buy diapers will also tend to buy beer at the same time.  Let us illustrate this with a simple example.  Suppose that a store's retail transactions database includes the following information:

There are 600,000 transactions in total.
7,500 transactions contain diapers (1.25 percent)
60,000 transactions contain beer (10 percent)
6,000 transactions contain both diapers and beer (1.0 percent)
If there was no association between beer and diapers (i.e., they are statistically independent), then we expect only 10% of diaper purchasers to also buy beer (since 10% of all customers buy beer).  However, we discover that 80% (=6000/7500) of diaper purchasers also buy beer.  This is a factor of 8 increase over what was expected - that is called Lift, which is the ratio of the observed frequency of co-occurrence to the expected frequency.  This was determined simply by counting the transactions in the database.  So, in this case, the association rule would state that diaper purchasers will also buy beer with a Lift factor of 8.   In statistics, Lift is simply estimated by the ratio of the joint probability of two items x and y, divided by the product of their individual probabilities:  Lift = P(x,y)/[P(x)P(y)].  If the two items are statistically independent, then P(x,y)=P(x)P(y), corresponding to Lift = 1 in that case.  Note that anti-correlation yields Lift values less than 1, which is also an interesting discovery - corresponding to mutually exclusive items that rarely co-occur together.

You have no rights to post comments

You are here: Home EMC Certification EMC Data Science EMC Data Science Question 8 : You have run the association rules algorithm on your data set, and the two rules {banana, apple} => {grape} and {apple, orange}=> {grape} have been found to be relevant. What else must be true?