In: Computer Science
Question: 1- A database has four transactions. Let min sup = 60% and min conf = 80%.
CID TID Items Bought
01 T100 {King’s-Crab, Sunset-Milk, Dairyland-Cheese, Best-Bread}
02 T200 {Best-Cheese, Dairyland-Milk, Goldenfarm-Apple, Tasty-Pie, Wonder-Bread}
01 T300 {Westcoast-Apple, Dairyland-Milk, Wonder-Bread, Tasty-Pie}
03 T400 {Wonder-Bread, Sunset-Milk, Dairyland-Cheese}
(a) At the granularity of item category (e.g., item_i could be “Milk”), for the following rule template,
∀X ∈ transaction, buys(X, item1) ∧ buys(X, item2) ⇒ buys(X, item3) [s, c]
list the frequent k-itemset for the largest k, and all of the strong association rules (with their support s and confidence c) containing the frequent k-itemset for the largest k.
(b) At the granularity of brand-item category (e.g., item_i could be “Sunset-Milk”), for the following rule template,
∀X ∈ customer, buys(X, item1) ∧ buys(X, item2) ⇒ buys(X, item3)
list the frequent k-itemset for the largest k (but do not print any rules).
Given:
min. support = 60%
min. confidence = 80%
Transactions:
01 T100 {King’s-Crab, Sunset-Milk, Dairyland-Cheese, Best-Bread }
02 T200 {Best-Cheese, Dairyland-Milk, Goldenfarm-Apple, Tasty-Pie, Wonder-Bread}
01 T300 {Westcoast-Apple, Dairyland-Milk, Wonder-Bread, Tasty-Pie}
03 T400 {Wonder-Bread, Sunset-Milk, Dairyland-Cheese}
a) At the granularity of item category, dropping the brand names reduces the transactions to:
01 T100 {Crab, Milk, Cheese, Bread }
02 T200 {Cheese, Milk, Apple, Pie, Bread}
01 T300 {Apple, Milk, Bread, Pie}
03 T400 {Bread, Milk, Cheese}
From these transactions, the following itemsets and their supports can be derived:
1-itemsets:
{crab}[s=25%], {milk}[s=100%], {cheese}[s=75%],{bread}[s=100%], {apple}[s=50%], {pie}[s=50%]
2-itemsets:
{milk, crab}[s=25%], {cheese, crab}[s=25%], {bread, crab}[s=25%], {milk, cheese}[s=75%], {milk, bread}[s=100%], {bread, cheese}[s=75%], {cheese, apple}[s=25%], {cheese, pie}[s=25%], {milk, apple}[s=50%], {milk, pie}[s=50%], {apple, pie}[s=50%], {bread, apple}[s=50%], {bread, pie}[s=50%]
Only support is listed for itemsets, because confidence is defined for a rule (antecedent ⇒ consequent), not for an itemset. The frequent 2-itemsets (s ≥ 60%) are {milk, cheese}, {milk, bread} and {bread, cheese}.
The given rule template involves three items, so the itemsets of interest are 3-itemsets. The 3-itemsets that occur in at least one transaction, with their supports, are:
{milk, cheese, crab}[s=25%]
{milk, bread, crab}[s=25%]
{cheese, bread, crab}[s=25%]
{milk, bread, cheese}[s=75%]
{cheese, milk, apple}[s=25%]
{cheese, milk, pie}[s=25%]
{cheese, apple, pie}[s=25%]
{cheese, bread, apple}[s=25%]
{cheese, bread, pie}[s=25%]
{milk, bread, apple}[s=50%]
{milk, bread, pie}[s=50%]
{milk, apple, pie}[s=50%]
{apple, bread, pie}[s=50%]
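The support counts above can be checked mechanically. The following is a minimal brute-force sketch, not a full Apriori implementation; the helper names `support` and `frequent_itemsets` are illustrative and do not come from the question.

```python
from itertools import combinations

# Category-level transactions from part (a).
transactions = [
    {"crab", "milk", "cheese", "bread"},          # T100
    {"cheese", "milk", "apple", "pie", "bread"},  # T200
    {"apple", "milk", "bread", "pie"},            # T300
    {"bread", "milk", "cheese"},                  # T400
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def frequent_itemsets(baskets, k, min_sup):
    """Test every k-subset of the item universe (fine for six items)."""
    items = sorted(set().union(*baskets))
    result = {}
    for combo in combinations(items, k):
        s = support(combo, baskets)
        if s >= min_sup:
            result[combo] = s
    return result

for k in (1, 2, 3, 4):
    print(k, frequent_itemsets(transactions, k, 0.60))
```

With min. support = 60%, the run for k = 4 comes back empty, which confirms that k = 3 is the largest size with a frequent itemset.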
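Among the 3-itemsets, the only one that meets the minimum support of 60% is:
{milk, bread, cheese}[s=75%]
Since this is the only frequent 3-itemset, no 4-itemset can be frequent, so k = 3 is the largest k. The template gives three candidate rules from this itemset:
milk ∧ cheese ⇒ bread [s=75%, c=100%], because {milk, cheese} occurs in 3 transactions and all 3 contain bread.
bread ∧ cheese ⇒ milk [s=75%, c=100%], because {bread, cheese} occurs in 3 transactions and all 3 contain milk.
milk ∧ bread ⇒ cheese [s=75%, c=75%], because {milk, bread} occurs in all 4 transactions and only 3 contain cheese.
Hence the strong association rules (c ≥ 80%) are milk ∧ cheese ⇒ bread and bread ∧ cheese ⇒ milk. The rule milk ∧ bread ⇒ cheese is not strong, because its confidence of 75% is below the 80% threshold.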
Note that support is the percentage (or count) of all transactions in which a combination of items occurs together; it measures the usefulness of that combination. Confidence is the percentage of the transactions containing the antecedent items that also contain the consequent item; it measures the certainty of the pattern.
For a given association rule A ⇒ B, confidence = support count of (A ∪ B) / support count of A.
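As a check on the confidence formula, the sketch below enumerates the three rules that the template allows for the itemset {milk, bread, cheese} and keeps those that reach the confidence threshold. The function names `count` and `strong_rules` are hypothetical, chosen only for this illustration.

```python
from itertools import combinations

# Category-level transactions from part (a).
transactions = [
    {"crab", "milk", "cheese", "bread"},          # T100
    {"cheese", "milk", "apple", "pie", "bread"},  # T200
    {"apple", "milk", "bread", "pie"},            # T300
    {"bread", "milk", "cheese"},                  # T400
]

def count(itemset):
    """Support count: number of transactions containing `itemset`."""
    return sum(set(itemset) <= t for t in transactions)

def strong_rules(itemset, min_conf):
    """Rules item1 ∧ item2 ⇒ item3 from a 3-itemset with c >= min_conf."""
    itemset = frozenset(itemset)
    s = count(itemset) / len(transactions)
    rules = []
    for antecedent in combinations(sorted(itemset), 2):
        (consequent,) = itemset - set(antecedent)
        # confidence = support count of (A ∪ B) / support count of A
        c = count(itemset) / count(antecedent)
        if c >= min_conf:
            rules.append((antecedent, consequent, s, c))
    return rules

for antecedent, consequent, s, c in strong_rules({"milk", "bread", "cheese"}, 0.80):
    print(f"{antecedent[0]} ∧ {antecedent[1]} ⇒ {consequent} [s={s:.0%}, c={c:.0%}]")
```

The rule with antecedent {bread, milk} is filtered out, since that pair occurs in all four transactions but cheese appears in only three of them.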
b) At the granularity of brand-item, the rule template quantifies over customers (∀X ∈ customer), so support is counted over customers rather than over transactions. Merging the transactions of each customer gives three baskets:
Customer 01 (T100, T300): {King's-Crab, Sunset-Milk, Dairyland-Cheese, Best-Bread, Westcoast-Apple, Dairyland-Milk, Wonder-Bread, Tasty-Pie}
Customer 02 (T200): {Best-Cheese, Dairyland-Milk, Goldenfarm-Apple, Tasty-Pie, Wonder-Bread}
Customer 03 (T400): {Wonder-Bread, Sunset-Milk, Dairyland-Cheese}
With three customers, a minimum support of 60% means an itemset must be bought by at least 2 of the 3 customers (s=67%).
Frequent 1-itemsets:
{Wonder-Bread}[s=100%], {Dairyland-Milk}[s=67%], {Tasty-Pie}[s=67%], {Sunset-Milk}[s=67%], {Dairyland-Cheese}[s=67%]
Any itemset bought by two customers must lie in the intersection of their baskets:
Customers 01 and 02: {Dairyland-Milk, Tasty-Pie, Wonder-Bread}
Customers 01 and 03: {Wonder-Bread, Sunset-Milk, Dairyland-Cheese}
Customers 02 and 03: {Wonder-Bread}
No intersection contains more than three items, so the largest k is 3, and the frequent 3-itemsets are:
{Wonder-Bread, Dairyland-Milk, Tasty-Pie}[s=67%]
{Wonder-Bread, Sunset-Milk, Dairyland-Cheese}[s=67%]
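For part (b), the template "∀X ∈ customer" suggests counting support over customers rather than over single transactions. The sketch below follows that reading: it merges each customer's transactions into one basket and then brute-forces the frequent k-itemsets. The names `rows`, `baskets` and `frequent` are illustrative only.

```python
from itertools import combinations

# (CID, brand-item transaction) rows from the question.
rows = [
    ("01", {"King's-Crab", "Sunset-Milk", "Dairyland-Cheese", "Best-Bread"}),
    ("02", {"Best-Cheese", "Dairyland-Milk", "Goldenfarm-Apple", "Tasty-Pie", "Wonder-Bread"}),
    ("01", {"Westcoast-Apple", "Dairyland-Milk", "Wonder-Bread", "Tasty-Pie"}),
    ("03", {"Wonder-Bread", "Sunset-Milk", "Dairyland-Cheese"}),
]

# One basket per customer: the union of that customer's transactions.
baskets = {}
for cid, items in rows:
    baskets.setdefault(cid, set()).update(items)

def frequent(k, min_sup=0.60):
    """k-itemsets bought by at least min_sup of the customers."""
    universe = sorted(set().union(*baskets.values()))
    found = []
    for combo in combinations(universe, k):
        buyers = sum(set(combo) <= basket for basket in baskets.values())
        if buyers / len(baskets) >= min_sup:
            found.append(combo)
    return found

print(frequent(3))
print(frequent(4))
```

Under this reading an itemset needs 2 of the 3 customers to be frequent, and the run for k = 4 returns an empty list, so k = 3 is the largest size.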