Question

1. A database has four transactions. Let min sup = 60% and min conf = 80%.

CID   TID    Items Bought
01    T100   {King’s-Crab, Sunset-Milk, Dairyland-Cheese, Best-Bread}
02    T200   {Best-Cheese, Dairyland-Milk, Goldenfarm-Apple, Tasty-Pie, Wonder-Bread}
01    T300   {Westcoast-Apple, Dairyland-Milk, Wonder-Bread, Tasty-Pie}
03    T400   {Wonder-Bread, Sunset-Milk, Dairyland-Cheese}

(a) At the granularity of item category (e.g., item_i could be “Milk”), for the following rule template,

∀X ∈ transaction, buys(X, item1) ∧ buys(X, item2) ⇒ buys(X, item3) [s, c]

list the frequent k-itemset for the largest k, and all of the strong association rules (with their support s and confidence c) containing the frequent k-itemset for the largest k.

(b) At the granularity of brand-item category (e.g., item_i could be “Sunset-Milk”), for the following rule template,

∀X ∈ customer, buys(X, item1) ∧ buys(X, item2) ⇒ buys(X, item3)

list the frequent k-itemset for the largest k (but do not print any rules).

Solutions

Expert Solution


Given :

min. support=60%

min. confidence=80%

Transactions :

01 T100 {King’s-Crab, Sunset-Milk, Dairyland-Cheese, Best-Bread }

02 T200 {Best-Cheese, Dairyland-Milk, Goldenfarm-Apple, Tasty-Pie, Wonder-Bread}

01 T300 {Westcoast-Apple, Dairyland-Milk, Wonder-Bread, Tasty-Pie}

03 T400 {Wonder-Bread, Sunset-Milk, Dairyland-Cheese}

a) At the granularity of item category, the brand names are dropped and the transactions reduce to:

01 T100 {Crab, Milk, Cheese, Bread }

02 T200 {Cheese, Milk, Apple, Pie, Bread}

01 T300 {Apple, Milk, Bread, Pie}

03 T400 {Bread, Milk, Cheese}
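
The generalization from brand-item to item category can be reproduced mechanically. The following is a minimal sketch, assuming every item name has the form "Brand-Category"; the helper name to_category is hypothetical and only used for illustration.

```python
# Transactions as given in the question (brand-item level).
transactions = {
    "T100": ["King's-Crab", "Sunset-Milk", "Dairyland-Cheese", "Best-Bread"],
    "T200": ["Best-Cheese", "Dairyland-Milk", "Goldenfarm-Apple", "Tasty-Pie", "Wonder-Bread"],
    "T300": ["Westcoast-Apple", "Dairyland-Milk", "Wonder-Bread", "Tasty-Pie"],
    "T400": ["Wonder-Bread", "Sunset-Milk", "Dairyland-Cheese"],
}

def to_category(brand_item):
    # Assumption: the category is whatever follows the last hyphen.
    return brand_item.rsplit("-", 1)[1]

# Map every transaction to the set of item categories it contains.
by_category = {
    tid: {to_category(item) for item in items}
    for tid, items in transactions.items()
}

for tid, items in by_category.items():
    print(tid, sorted(items))
```

Using sets means a category that appears under two brands in the same transaction would be counted once, which is what support counting at the category level needs.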

From these transactions, the following itemsets and supports can be derived:

1-itemsets:

{crab}[s=25%], {milk}[s=100%], {cheese}[s=75%], {bread}[s=100%], {apple}[s=50%], {pie}[s=50%]

2-itemsets:

{milk, crab}[s=25%], {cheese, crab}[s=25%], {bread, crab}[s=25%], {milk, cheese}[s=75%], {milk, bread}[s=100%], {bread, cheese}[s=75%], {cheese, apple}[s=25%], {cheese, pie}[s=25%], {milk, apple}[s=50%], {milk, pie}[s=50%], {apple, pie}[s=50%], {bread, apple}[s=50%], {bread, pie}[s=50%]
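
The supports can be recomputed with a short script. This is a brute-force sketch rather than a full Apriori implementation: it enumerates every k-combination of items and keeps those whose support reaches the threshold. The function names support and frequent_itemsets are illustrative, not taken from any library.

```python
from itertools import combinations

# Item-category transactions from part (a).
transactions = [
    {"Crab", "Milk", "Cheese", "Bread"},           # T100
    {"Cheese", "Milk", "Apple", "Pie", "Bread"},   # T200
    {"Apple", "Milk", "Bread", "Pie"},             # T300
    {"Bread", "Milk", "Cheese"},                   # T400
]
MIN_SUP = 0.60

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(k):
    """All k-itemsets whose support reaches MIN_SUP (brute-force enumeration)."""
    items = sorted(set().union(*transactions))
    result = {}
    for combo in combinations(items, k):
        s = support(set(combo))
        if s >= MIN_SUP:
            result[combo] = s
    return result

for k in (1, 2, 3, 4):
    print(k, frequent_itemsets(k))
```

With these four transactions the only frequent 3-itemset is {Bread, Cheese, Milk} at 75%, and no 4-itemset is frequent.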

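The given rule template has two items on the left-hand side and one on the right, so the itemsets of interest are 3-itemsets. The following 3-itemsets occur in at least one transaction:

{milk, cheese, crab}[s=25%]

{milk, bread, crab}[s=25%]

{cheese, bread, crab}[s=25%]

{milk, bread, cheese}[s=75%]

{cheese, milk, apple}[s=25%]

{cheese, milk, pie}[s=25%]

{cheese, apple, pie}[s=25%]

{cheese, bread, apple}[s=25%]

{cheese, bread, pie}[s=25%]

{milk, apple, pie}[s=50%]

{milk, bread, apple}[s=50%]

{milk, bread, pie}[s=50%]

{apple, bread, pie}[s=50%]

Only one of these meets the 60% minimum support:

{milk, bread, cheese}[s=75%]

No 4-itemset is frequent (adding crab, apple, or pie to this itemset gives a support of 25%), so k = 3 is the largest k and this is the frequent 3-itemset. It yields three candidate rules under the template:

{milk, cheese} ⇒ {bread} [s=75%, c=100%]

{bread, cheese} ⇒ {milk} [s=75%, c=100%]

{milk, bread} ⇒ {cheese} [s=75%, c=75%]

The first two are strong association rules. The third is not, because its confidence of 75% (3 of the 4 transactions containing milk and bread also contain cheese) is below the 80% minimum confidence.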

Note that the support of an itemset is the percentage (or count) of all transactions that contain every item in it; it measures how useful the combination is. The confidence of a rule is the percentage of transactions containing the left-hand side that also contain the right-hand side; it measures how certain the pattern is.

For a given association rule A ⇒ B, confidence(A ⇒ B) = support count of (A ∪ B) / support count of A.
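
The confidence formula can be applied to every rule of the form "two items ⇒ one item" that the frequent 3-itemset generates. The sketch below assumes the item-category transactions of part (a); the names count and rules_from are hypothetical helpers.

```python
# Item-category transactions from part (a).
transactions = [
    {"Crab", "Milk", "Cheese", "Bread"},           # T100
    {"Cheese", "Milk", "Apple", "Pie", "Bread"},   # T200
    {"Apple", "Milk", "Bread", "Pie"},             # T300
    {"Bread", "Milk", "Cheese"},                   # T400
]
MIN_CONF = 0.80

def count(itemset):
    """Support count: number of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions)

def rules_from(itemset):
    """Yield (antecedent, consequent, support, confidence) for each 2-item => 1-item rule."""
    for consequent in sorted(itemset):
        antecedent = itemset - {consequent}
        s = count(itemset) / len(transactions)
        c = count(itemset) / count(antecedent)   # count(A U B) / count(A)
        yield sorted(antecedent), consequent, s, c

for a, b, s, c in rules_from({"Milk", "Bread", "Cheese"}):
    tag = "strong" if c >= MIN_CONF else "not strong"
    print(f"{a[0]} ^ {a[1]} => {b}  [s={s:.0%}, c={c:.0%}]  {tag}")
```

Run as written, this reports a confidence of 75% for Bread ∧ Milk ⇒ Cheese (3 of the 4 transactions containing milk and bread also contain cheese) and 100% for the other two rules.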

b) At the granularity of brand-item, the rule template ranges over customers rather than transactions, so all transactions of the same customer are merged into one basket:

Customer 01 (T100 and T300): {King's-crab, Sunset-milk, Dairyland-cheese, Best-bread, Westcoast-apple, Dairyland-milk, Wonder-bread, Tasty-pie}

Customer 02 (T200): {Best-cheese, Dairyland-milk, Goldenfarm-apple, Tasty-pie, Wonder-bread}

Customer 03 (T400): {Wonder-bread, Sunset-milk, Dairyland-cheese}

There are three customers, so an itemset must be bought by at least two of them (2/3 ≈ 67%) to meet the 60% minimum support; an itemset bought by a single customer has a support of only 33%.

The 3-itemsets bought by at least two customers are:

{Wonder-bread, Dairyland-milk, Tasty-pie}[s=67%] (customers 01 and 02)

{Wonder-bread, Sunset-milk, Dairyland-cheese}[s=67%] (customers 01 and 03)

Customers 01 and 02 have only the three items of the first itemset in common, customers 01 and 03 only the three items of the second, and customers 02 and 03 only Wonder-bread, so no 4-itemset is frequent. Hence k = 3 is the largest k, and the two itemsets listed above are the frequent 3-itemsets.
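
One way to read part (b) is to treat each customer, not each transaction, as a single basket, since the template quantifies over customers. The sketch below follows that reading: it merges the transactions of each customer and then counts support over the three customers. The names baskets, support, and frequent are illustrative assumptions, and the enumeration is brute force rather than Apriori.

```python
from itertools import combinations

# (customer id, items) pairs taken from the table in the question.
purchases = [
    ("01", {"King's-Crab", "Sunset-Milk", "Dairyland-Cheese", "Best-Bread"}),
    ("02", {"Best-Cheese", "Dairyland-Milk", "Goldenfarm-Apple", "Tasty-Pie", "Wonder-Bread"}),
    ("01", {"Westcoast-Apple", "Dairyland-Milk", "Wonder-Bread", "Tasty-Pie"}),
    ("03", {"Wonder-Bread", "Sunset-Milk", "Dairyland-Cheese"}),
]
MIN_SUP = 0.60

# Merge all transactions of the same customer into one basket.
baskets = {}
for cid, items in purchases:
    baskets.setdefault(cid, set()).update(items)

def support(itemset):
    """Fraction of customers whose merged basket contains every item in itemset."""
    return sum(itemset <= b for b in baskets.values()) / len(baskets)

def frequent(k):
    """All k-itemsets bought by at least MIN_SUP of the customers."""
    items = sorted(set().union(*baskets.values()))
    return [c for c in combinations(items, k) if support(set(c)) >= MIN_SUP]

for k in (3, 4):
    print(k, frequent(k))
```

Under this reading, two 3-itemsets are each bought by two of the three customers (about 67%), and no 4-itemset reaches the 60% threshold.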

