Consider the following table showing multiple transactions. Find all frequent itemsets using Apriori, then list all the strong association rules knowing that min_sup count = 2, and min_conf = 60%.
TID | Items |
T1 | A, B, D, E |
T2 | A, B, C |
T3 | C, E |
T4 | B, C |
T5 | A |
T6 | A, B, C |
Without handwriting, please.
The Apriori algorithm finds the itemsets that occur frequently in a set of transactions and then derives association rules from them; each rule is strong or weak depending on whether it meets the minimum confidence.
Given:
min_supcount=2
min_conf=60%
Solution:
Step 1: Create a table containing the support count (supcount) of each item, i.e. the number of transactions that contain it.
Item set | supcount |
A | 4 |
B | 4 |
C | 4 |
D | 1 |
E | 2 |
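The counts in Step 1 can be reproduced with a short Python sketch. The names `transactions`, `MIN_SUPCOUNT`, `item_counts` and `frequent_items` are illustrative choices, not part of the original question.

```python
from collections import Counter

# Transactions copied from the table in the question.
transactions = {
    "T1": {"A", "B", "D", "E"},
    "T2": {"A", "B", "C"},
    "T3": {"C", "E"},
    "T4": {"B", "C"},
    "T5": {"A"},
    "T6": {"A", "B", "C"},
}
MIN_SUPCOUNT = 2

# Count how many transactions contain each single item.
item_counts = Counter(item for items in transactions.values() for item in items)

# Keep only the items that meet the minimum support count.
frequent_items = sorted(i for i, n in item_counts.items() if n >= MIN_SUPCOUNT)

print(dict(sorted(item_counts.items())))
print(frequent_items)
```

Each transaction is stored as a set, so an item is counted at most once per transaction.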
Step 2: Since min_supcount=2, D (supcount 1) is eliminated.
The remaining items are A, B, C and E. Next we pair them up, forming every possible 2-item subset of the remaining items, and build a table counting the transactions in which each pair occurs.
Item set | supcount |
A,B | 3 |
A,C | 2 |
A,E | 1 |
B,C | 3 |
B,E | 1 |
C,E | 1 |
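As a rough check of the pair table, the sketch below counts every 2-item combination of the surviving items. The helper `supcount` and the variable names are assumptions made for this illustration.

```python
from itertools import combinations

# Transactions T1..T6 from the question, in order.
transactions = [
    {"A", "B", "D", "E"},
    {"A", "B", "C"},
    {"C", "E"},
    {"B", "C"},
    {"A"},
    {"A", "B", "C"},
]
MIN_SUPCOUNT = 2
frequent_items = ["A", "B", "C", "E"]  # items that survived the first pass

def supcount(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

# Support count of every candidate pair.
pair_counts = {pair: supcount(pair) for pair in combinations(frequent_items, 2)}

# Pairs that meet the minimum support count.
frequent_pairs = [p for p, n in pair_counts.items() if n >= MIN_SUPCOUNT]

print(pair_counts)
print(frequent_pairs)
```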
Step 3: The pairs (A,E), (B,E) and (C,E) are eliminated because each has a supcount of 1, which is below min_supcount=2.
Now we make triplets from the remaining itemsets (A,B), (A,C), (B,C).
The only possible triplet is (A,B,C), because A, B and C are the only items left in the frequent pairs. It occurs in T2 and T6, so its supcount is 2 and it meets the minimum support condition.
Now we will work out the association rules using the frequent itemset (A,B,C) and min_conf, which is 60%.
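The triplet step can be sketched as a join followed by a prune, which is the usual way Apriori builds larger candidates. The variable names are illustrative, and the join here simply unions pairs rather than using a prefix-based join.

```python
from itertools import combinations

# Transactions T1..T6 from the question, in order.
transactions = [
    {"A", "B", "D", "E"},
    {"A", "B", "C"},
    {"C", "E"},
    {"B", "C"},
    {"A"},
    {"A", "B", "C"},
]
MIN_SUPCOUNT = 2
frequent_pairs = [("A", "B"), ("A", "C"), ("B", "C")]

# Join step: union two frequent pairs whenever the result has exactly three items.
joined = {
    tuple(sorted(set(p) | set(q)))
    for p, q in combinations(frequent_pairs, 2)
    if len(set(p) | set(q)) == 3
}

# Prune step: every 2-item subset of a candidate must itself be frequent.
pair_set = set(frequent_pairs)
candidates = [
    c for c in sorted(joined)
    if all(sub in pair_set for sub in combinations(c, 2))
]

# Count the surviving candidates against the transactions.
triple_counts = {c: sum(1 for t in transactions if set(c) <= t) for c in candidates}
frequent_triples = [c for c, n in triple_counts.items() if n >= MIN_SUPCOUNT]

print(triple_counts)
```

No 4-item candidate can be formed from a single frequent triplet, so the search stops here.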
Confidence(A->B)=supcount (AUB) / supcount (A)
So the rule generation will be:
Itemset (A,B,C)
A^B -> C (read as "A and B imply C"; the rule shows the association among the three items)
Example of how confidence is calculated:
In Confidence(A->B)=supcount (AUB) / supcount (A), replace A with A^B and B with C:
Confidence(A^B->C)= supcount((A^B)UC) / supcount(A^B)
= 2 / 3
= 0.667 or about 66.7%
Here supcount(A,B,C) is 2 and the pair (A,B) has a supcount of 3.
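The same calculation can be written as a small function. This is a minimal sketch; `supcount` and `confidence` are hypothetical helper names chosen for this example.

```python
# Transactions T1..T6 from the question, in order.
transactions = [
    {"A", "B", "D", "E"},
    {"A", "B", "C"},
    {"C", "E"},
    {"B", "C"},
    {"A"},
    {"A", "B", "C"},
]

def supcount(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def confidence(antecedent, consequent):
    """supcount(antecedent U consequent) / supcount(antecedent)."""
    both = set(antecedent) | set(consequent)
    return supcount(both) / supcount(antecedent)

# Confidence of the rule A^B -> C.
conf_ab_c = confidence({"A", "B"}, {"C"})
print(f"{conf_ab_c:.3f}")
```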
Rules | Support count | Confidence |
A^B->C | 2 | supcount((A^B)UC) / supcount(A^B)=2/3=66.7% |
B^C->A | 2 | supcount((B^C)UA) / supcount(B^C)=2/3=66.7% |
C^A->B | 2 | supcount((C^A)UB) / supcount(C^A)=2/2=100% |
C->A^B | 2 | supcount((A^B)UC) / supcount(C)=2/4=50% |
A->B^C | 2 | supcount((A^B)UC) / supcount(A)=2/4=50% |
B->C^A | 2 | supcount((A^B)UC) / supcount(B)=2/4=50% |
The first three rules are strong association rules because their confidence is above min_conf (60%), and the last three are weak association rules because their confidence is below 60%.
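The strong association rules generated from (A,B,C) are:
A^B->C |
B^C->A |
C^A->B |
The same test applied to the frequent pairs gives four more strong rules: A->B, B->A, B->C and C->B each have confidence 3/4=75%, while A->C and C->A reach only 2/4=50% and are weak.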
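To double-check the rule list, the sketch below enumerates every way of splitting a frequent itemset into an antecedent and a consequent and keeps the rules that reach min_conf. The function `rules_from` and the other names are assumptions made for this illustration.

```python
from itertools import combinations

# Transactions T1..T6 from the question, in order.
transactions = [
    {"A", "B", "D", "E"},
    {"A", "B", "C"},
    {"C", "E"},
    {"B", "C"},
    {"A"},
    {"A", "B", "C"},
]
MIN_CONF = 0.60

def supcount(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def rules_from(itemset):
    """Yield (antecedent, consequent, confidence) for every split of itemset."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent, supcount(items) / supcount(antecedent)

# Strong rules generated from the frequent 3-itemset (A,B,C).
strong = [(a, c) for a, c, conf in rules_from({"A", "B", "C"}) if conf >= MIN_CONF]

for antecedent, consequent in strong:
    print("^".join(antecedent), "->", "^".join(consequent))
```

Calling `rules_from` on a frequent pair such as `{"A", "B"}` evaluates the 2-item rules in the same way.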