In: Statistics and Probability
{1,2,3} {2,3,4} {3,4,5} {4,5,6} {1,3,5} {2,4,6} {1,3,4} {2,4,5} {3,5,6} {1,2,4} {2,3,5} {3,4,6}
Exercise 6.4.2: Apply Toivonen’s Algorithm to the data, with a
support threshold of 4. Take as the sample the first row of baskets:
{1,2,3}, {2,3,4}, {3,4,5}, and {4,5,6}, i.e., one-third of the file.
Our scaleddown support theshold will be 1.
(a) What are the itemsets frequent in the sample?
(b) What is the negative border?
(c) What is the outcome of the pass through the full dataset? Are
any of the itemsets in the negative border frequent in the
whole?
Answer:
By using ,given data
Here, 12 baskets are given.
Below are the steps or passes for PCY algorithm as follows.
Pass 1:
(i) Determine total number of occurences of
allitems called as count.
(ii) For every bucket,consist of items fil; : :
:;,hash all pairs to a bucket of hash table,Increment bucket count
by 1.
(iii) Determine ,L1 and the items with
the count of atleasts at the end of the pass.
(iv) Determine,buckets with count atleast s at the
ende buckets with counts at least s.
Pass 2:
(i) All the frequent items, i.e. L1. holds in
main memory.
(ii) Main memory also holds the bitmap
summarizing the results of the hashing from pass 1.
(iii) Main memory also holds a table with all the
candidate pairs and their counts.
A pair (x; y) can be a candidate in C2 only if al l of the following are true:
(a). x is in L1.
(b). y is in L1.
(c). (x; y) hashes to a frequent bucket.
(iv) We consider each basket, and each pair of its items and making the test as above.
If all three conditions meets by pair, add to its count in
memory, or make an entry for it if one
not yet exist.