Question

Apply the classification algorithm to the following set of data records. Draw a decision tree. The class attribute is Repeat Customer.

RID | Age    | City | Gender | Education   | Repeat Customer
----|--------|------|--------|-------------|----------------
101 | 20..30 | NY   | F      | College     | YES
102 | 20..30 | SF   | M      | Graduate    | YES
103 | 31..40 | NY   | F      | College     | YES
104 | 51..60 | NY   | F      | College     | NO
105 | 31..40 | LA   | M      | High school | NO
106 | 41..50 | NY   | F      | College     | YES
107 | 41..50 | NY   | F      | Graduate    | YES
108 | 20..30 | LA   | M      | College     | YES
109 | 20..30 | NY   | F      | High school | NO
110 | 20..30 | NY   | F      | College     | YES

Expert Solution

We start by computing the entropy for the entire set (all logarithms here are base 2). We have 7 positive samples and 3 negative samples.

       The entropy, I(7,3), is -(7/10 * log(7/10) + 3/10 * log(3/10)) = 0.88
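As a quick check on this arithmetic, a small Python helper (an illustrative sketch, not part of the original solution; the function name `entropy` is my own) reproduces the base-2 entropy values used throughout:

```python
from math import log2

def entropy(counts):
    """Base-2 entropy of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# I(7,3) for the full set of 10 records (7 YES, 3 NO)
print(round(entropy([7, 3]), 2))  # 0.88
```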

       We consider the first attribute AGE. There are 4 values for age. 20..30 appears 5 times

           I(s11, s21) = -(4/5 * log(4/5) + 1/5 * log(1/5)) = 0.72

         31..40 appears 2 times

           I(s12, s22) = -(1/2 * log(1/2) + 1/2 * log(1/2)) = 1

         41..50 appears 2 times

           I(s13, s23) = -(2/2 * log(2/2)) = 0

         51..60 appears 1 time

           I(s14, s24) = -(1/1 * log(1/1)) = 0

         E(AGE) = 5/10 * 0.72 + 2/10 * 1 + 2/10 * 0 + 1/10 * 0 = 0.56

         GAIN(AGE) = 0.88 - 0.56 = 0.32  
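The weighted average for AGE can be verified the same way; the (YES, NO) counts per age range below are read directly off the table, and the `entropy` helper is a hypothetical sketch, not part of the original solution:

```python
from math import log2

def entropy(counts):
    """Base-2 entropy of a class distribution given as (YES, NO) counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# (YES, NO) counts for each AGE value, read off the table
age_splits = {"20..30": (4, 1), "31..40": (1, 1), "41..50": (2, 0), "51..60": (0, 1)}
e_age = sum(sum(c) / 10 * entropy(c) for c in age_splits.values())
gain_age = entropy((7, 3)) - e_age
print(round(e_age, 2), round(gain_age, 2))  # 0.56 0.32
```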

       We consider the second attribute CITY. There are 3 values for city. LA occurs 2 times

           I(s11, s21) = -(1/2 * log(1/2) + 1/2 * log(1/2)) = 1    

         NY occurs 7 times     

           I(s12, s22) = -(2/7 * log(2/7) + 5/7 * log(5/7)) = 0.86

          SF occurs 1 time

            I(s13, s23) = -(1/1 * log(1/1)) = 0

         E(CITY) = 2/10 * 1 + 7/10 * 0.86 + 1/10 * 0 = 0.80  

         GAIN(CITY) = 0.88 - 0.80 = 0.08

       We consider the third attribute GENDER. There are 2 values. F occurs 7 times

           I(s11, s21) = -(2/7 * log(2/7) + 5/7 * log(5/7)) = 0.86

         M occurs 3 times     

           I(s12, s22) = -(1/3 * log(1/3) + 2/3 * log(2/3)) = 0.92

         E(GENDER) = 7/10 * 0.86 + 3/10 * 0.92 = 0.88

         GAIN(GENDER) = 0.88 - 0.88 = 0

       We consider the fourth attribute, EDUCATION. There are 3 values. HS (High school) occurs 2 times

            I(s11, s21) = -(2/2 * log(2/2)) = 0

         COLLEGE occurs 6 times     

           I(s12, s22) = -(1/6 * log(1/6) + 5/6 * log(5/6)) = 0.65

         GRAD occurs 2 times

            I(s13, s23) = -(2/2 * log(2/2)) = 0

         E(EDUCATION) = 2/10 * 0 + 6/10 * 0.65 + 2/10 * 0 = 0.39

         GAIN(EDUCATION) = 0.88 - 0.39 = 0.49
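To cross-check all four gains at once, here is a short, self-contained Python sketch over the ten records (the helper names `entropy_of` and `gain` are my own, not from the original solution):

```python
from math import log2

def entropy_of(rows):
    """Base-2 entropy of the class labels (last field) in a list of rows."""
    n = len(rows)
    yes = sum(1 for r in rows if r[-1] == "YES")
    return sum(-c / n * log2(c / n) for c in (yes, n - yes) if c > 0)

def gain(rows, i):
    """Information gain from splitting rows on attribute index i."""
    expected = 0.0
    for value in {r[i] for r in rows}:
        branch = [r for r in rows if r[i] == value]
        expected += len(branch) / len(rows) * entropy_of(branch)
    return entropy_of(rows) - expected

# The ten records from the question: (Age, City, Gender, Education, Repeat Customer)
records = [
    ("20..30", "NY", "F", "College",     "YES"),  # 101
    ("20..30", "SF", "M", "Graduate",    "YES"),  # 102
    ("31..40", "NY", "F", "College",     "YES"),  # 103
    ("51..60", "NY", "F", "College",     "NO"),   # 104
    ("31..40", "LA", "M", "High school", "NO"),   # 105
    ("41..50", "NY", "F", "College",     "YES"),  # 106
    ("41..50", "NY", "F", "Graduate",    "YES"),  # 107
    ("20..30", "LA", "M", "College",     "YES"),  # 108
    ("20..30", "NY", "F", "High school", "NO"),   # 109
    ("20..30", "NY", "F", "College",     "YES"),  # 110
]
attrs = ["AGE", "CITY", "GENDER", "EDUCATION"]
gains = {a: round(gain(records, i), 2) for i, a in enumerate(attrs)}
print(gains)  # {'AGE': 0.32, 'CITY': 0.08, 'GENDER': 0.0, 'EDUCATION': 0.49}
```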

The greatest gain is for the EDUCATION attribute.
The tree at this point would look like the following:

                        EDUCATION
                     /      |      \
                  HS     COLLEGE    GRAD
                  /         |         \
   RIDs: {105,109}   {101,103,104,     {102,107}
   same class: NO     106,108,110}     same class: YES
                     (mixed classes)

Only the middle node is not a LEAF node, so continue with
those records and consider only the remaining attributes.
The entropy, I(5,1), is -(5/6 * log(5/6) + 1/6 * log(1/6)) = 0.65

We consider the first attribute AGE. There are 4 values for age. 20..30 appears 3 times
I(s11, s21) = -(3/3 * log(3/3)) = 0
31..40 appears 1 time
I(s12, s22) = -(1/1 * log(1/1)) = 0
41..50 appears 1 time
I(s13, s23) = -(1/1 * log(1/1)) = 0
51..60 appears 1 time
I(s14, s24) = -(1/1 * log(1/1)) = 0

E(AGE) = 0
GAIN(AGE) = 0.65 - 0 = 0.65

We consider the second attribute CITY. There are 2 values for city in this subset. NY occurs 5 times
I(s11, s21) = -(1/5 * log(1/5) + 4/5 * log(4/5)) = 0.72
LA occurs 1 time
I(s12, s22) = -(1/1 * log(1/1)) = 0

E(CITY) = 5/6 * 0.72 = 0.60
GAIN(CITY) = 0.65 - 0.60 = 0.05

We consider the third attribute GENDER. There are 2 values. F occurs 5 times
I(s11, s21) = -(1/5 * log(1/5) + 4/5 * log(4/5)) = 0.72
M occurs 1 time
I(s12, s22) = -(1/1 * log(1/1)) = 0

E(GENDER) = 5/6 * 0.72 = 0.60
GAIN(GENDER) = 0.65 - 0.60 = 0.05
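The same kind of sketch, restricted to the six COLLEGE records and the remaining attributes, confirms that AGE gives the largest gain at this node (again, the helper names are hypothetical, not from the original solution):

```python
from math import log2

def entropy_of(rows):
    """Base-2 entropy of the class labels (last field) in a list of rows."""
    n = len(rows)
    yes = sum(1 for r in rows if r[-1] == "YES")
    return sum(-c / n * log2(c / n) for c in (yes, n - yes) if c > 0)

def gain(rows, i):
    """Information gain from splitting rows on attribute index i."""
    expected = 0.0
    for value in {r[i] for r in rows}:
        branch = [r for r in rows if r[i] == value]
        expected += len(branch) / len(rows) * entropy_of(branch)
    return entropy_of(rows) - expected

# COLLEGE branch: RIDs 101, 103, 104, 106, 108, 110 -- (Age, City, Gender, class)
subset = [
    ("20..30", "NY", "F", "YES"),  # 101
    ("31..40", "NY", "F", "YES"),  # 103
    ("51..60", "NY", "F", "NO"),   # 104
    ("41..50", "NY", "F", "YES"),  # 106
    ("20..30", "LA", "M", "YES"),  # 108
    ("20..30", "NY", "F", "YES"),  # 110
]
gains = {a: round(gain(subset, i), 2) for i, a in enumerate(["AGE", "CITY", "GENDER"])}
print(gains)  # {'AGE': 0.65, 'CITY': 0.05, 'GENDER': 0.05}
```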

The greatest gain is for the AGE attribute.
The tree at this point would look like the following and we are finished.

                        EDUCATION
                     /      |      \
                  HS     COLLEGE    GRAD
                  /         |         \
   RIDs: {105,109}         AGE         {102,107}
   same class: NO    /   /    \   \    same class: YES
                    /   /      \   \
             20..30  31..40  41..50  51..60
      {101,108,110}  {103}   {106}   {104}
            YES       YES     YES     NO
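The finished tree can be encoded as a nested dict and used to classify a record by walking from the root. This is only an illustrative sketch of how the diagram above would be applied; the `classify` helper is hypothetical:

```python
# The finished tree as a nested dict: {attribute: {value: subtree-or-label}}
tree = {"EDUCATION": {
    "High school": "NO",
    "Graduate": "YES",
    "College": {"AGE": {
        "20..30": "YES", "31..40": "YES", "41..50": "YES", "51..60": "NO",
    }},
}}

def classify(node, record):
    # Walk from the root, following the branch matching the record's attribute value
    while isinstance(node, dict):
        attribute = next(iter(node))  # the attribute tested at this node
        node = node[attribute][record[attribute]]
    return node

# RID 104: College-educated, age 51..60
print(classify(tree, {"EDUCATION": "College", "AGE": "51..60"}))  # NO
```

Walking EDUCATION first and then AGE reproduces the class label of every record in the table.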

