Apply the classification algorithm to the following set of data records. Draw a decision tree. The class attribute is Repeat Customer.
RID | Age    | City | Gender | Education   | Repeat Customer
101 | 20..30 | NY   | F      | College     | YES
102 | 20..30 | SF   | M      | Graduate    | YES
103 | 31..40 | NY   | F      | College     | YES
104 | 51..60 | NY   | F      | College     | NO
105 | 31..40 | LA   | M      | High school | NO
106 | 41..50 | NY   | F      | College     | YES
107 | 41..50 | NY   | F      | Graduate    | YES
108 | 20..30 | LA   | M      | College     | YES
109 | 20..30 | NY   | F      | High school | NO
110 | 20..30 | NY   | F      | College     | YES
We start by computing the entropy of the entire set. There are 7 positive samples and 3 negative samples. (All logarithms below are base 2.)
The entropy is I(7,3) = -(7/10 * log(7/10) + 3/10 * log(3/10)) = 0.88
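As a sanity check, the entropy formula can be written as a small Python helper (a sketch; the function name `entropy` and the counts-list interface are our own):

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

print(round(entropy([7, 3]), 2))   # entropy of 7 YES / 3 NO -> 0.88
```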
We consider the first attribute, AGE. It has 4 values.
20..30 appears 5 times (4 YES, 1 NO):
I(s11, s21) = -(4/5 * log(4/5) + 1/5 * log(1/5)) = 0.72
31..40 appears 2 times (1 YES, 1 NO):
I(s12, s22) = -(1/2 * log(1/2) + 1/2 * log(1/2)) = 1
41..50 appears 2 times (2 YES):
I(s13, s23) = -(2/2 * log(2/2)) = 0
51..60 appears 1 time (1 NO):
I(s14, s24) = -(1/1 * log(1/1)) = 0
E(AGE) = 5/10 * 0.72 + 2/10 * 1 + 2/10 * 0 + 1/10 * 0 = 0.56
GAIN(AGE) = 0.88 - 0.56 = 0.32
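The AGE arithmetic can be checked in a few lines of Python (a sketch; the (YES, NO) pairs are read off the table above):

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# (YES, NO) counts for each AGE value, read off the table above
age = {"20..30": (4, 1), "31..40": (1, 1), "41..50": (2, 0), "51..60": (0, 1)}
n = sum(y + no for y, no in age.values())                        # 10 records
e_age = sum((y + no) / n * entropy([y, no]) for y, no in age.values())
gain_age = entropy([7, 3]) - e_age
print(round(e_age, 2), round(gain_age, 2))   # 0.56 0.32
```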
We consider the second attribute, CITY. It has 3 values.
LA occurs 2 times (1 YES, 1 NO):
I(s11, s21) = -(1/2 * log(1/2) + 1/2 * log(1/2)) = 1
NY occurs 7 times (5 YES, 2 NO):
I(s12, s22) = -(2/7 * log(2/7) + 5/7 * log(5/7)) = 0.86
SF occurs 1 time (1 YES):
I(s13, s23) = -(1/1 * log(1/1)) = 0
E(CITY) = 2/10 * 1 + 7/10 * 0.86 + 1/10 * 0 = 0.80
GAIN(CITY) = 0.88 - 0.80 = 0.08
We consider the third attribute, GENDER. It has 2 values.
F occurs 7 times (5 YES, 2 NO):
I(s11, s21) = -(2/7 * log(2/7) + 5/7 * log(5/7)) = 0.86
M occurs 3 times (2 YES, 1 NO):
I(s12, s22) = -(1/3 * log(1/3) + 2/3 * log(2/3)) = 0.92
E(GENDER) = 7/10 * 0.86 + 3/10 * 0.92 = 0.88
GAIN(GENDER) = 0.88 - 0.88 = 0
We consider the fourth attribute, EDUCATION. It has 3 values.
HS occurs 2 times (2 NO):
I(s11, s21) = -(2/2 * log(2/2)) = 0
COLLEGE occurs 6 times (5 YES, 1 NO):
I(s12, s22) = -(1/6 * log(1/6) + 5/6 * log(5/6)) = 0.65
GRAD occurs 2 times (2 YES):
I(s13, s23) = -(2/2 * log(2/2)) = 0
E(EDUCATION) = 2/10 * 0 + 6/10 * 0.65 + 2/10 * 0 = 0.39
GAIN(EDUCATION) = 0.88 - 0.39 = 0.49
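All four first-level gains can be verified together with a short script (a sketch; the `records` tuples transcribe the table above, with RID 110's lowercase "college" normalized to "College"):

```python
from math import log2
from collections import Counter

# The ten records from the table: (RID, Age, City, Gender, Education, class)
records = [
    (101, "20..30", "NY", "F", "College", "YES"),
    (102, "20..30", "SF", "M", "Graduate", "YES"),
    (103, "31..40", "NY", "F", "College", "YES"),
    (104, "51..60", "NY", "F", "College", "NO"),
    (105, "31..40", "LA", "M", "High school", "NO"),
    (106, "41..50", "NY", "F", "College", "YES"),
    (107, "41..50", "NY", "F", "Graduate", "YES"),
    (108, "20..30", "LA", "M", "College", "YES"),
    (109, "20..30", "NY", "F", "High school", "NO"),
    (110, "20..30", "NY", "F", "College", "YES"),
]

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    """Information gain from splitting `rows` on the attribute at index `col`."""
    expected = 0.0
    for v in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == v]
        expected += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - expected

for i, name in enumerate(["AGE", "CITY", "GENDER", "EDUCATION"], start=1):
    print(name, round(gain(records, i), 2))   # 0.32, 0.08, 0.0, 0.49
```

EDUCATION comes out on top, matching the hand calculation (GENDER's tiny gain rounds to 0).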
The greatest gain is for the EDUCATION attribute.
The tree at this point would look like the following:
-------------------
| EDUCATION |
-------------------
/ | \
HS / COLLEGE | \ GRAD
/ | \
RIDS: {105,109} {101,103,104, {102,107}
same class: NO 106,108,110} same class: YES
Only the middle (COLLEGE) node is not a LEAF node, so we continue with those 6 records and consider only the remaining attributes.
The entropy is I(5,1) = -(5/6 * log(5/6) + 1/6 * log(1/6)) = 0.65
We consider the first attribute, AGE. It has 4 values among these records.
20..30 appears 3 times (3 YES):
I(s11, s21) = -(3/3 * log(3/3)) = 0
31..40 appears 1 time (1 YES):
I(s12, s22) = -(1/1 * log(1/1)) = 0
41..50 appears 1 time (1 YES):
I(s13, s23) = -(1/1 * log(1/1)) = 0
51..60 appears 1 time (1 NO):
I(s14, s24) = -(1/1 * log(1/1)) = 0
E(AGE) = 0
GAIN(AGE) = 0.65 - 0 = 0.65
We consider the second attribute, CITY. It has 2 values among these records (none of them is in SF).
LA occurs 1 time (1 YES):
I(s11, s21) = -(1/1 * log(1/1)) = 0
NY occurs 5 times (4 YES, 1 NO):
I(s12, s22) = -(1/5 * log(1/5) + 4/5 * log(4/5)) = 0.72
E(CITY) = 1/6 * 0 + 5/6 * 0.72 = 0.60
GAIN(CITY) = 0.65 - 0.60 = 0.05
We consider the third attribute, GENDER. It has 2 values among these records.
F occurs 5 times (4 YES, 1 NO):
I(s11, s21) = -(1/5 * log(1/5) + 4/5 * log(4/5)) = 0.72
M occurs 1 time (1 YES):
I(s12, s22) = -(1/1 * log(1/1)) = 0
E(GENDER) = 5/6 * 0.72 + 1/6 * 0 = 0.60
GAIN(GENDER) = 0.65 - 0.60 = 0.05
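The second-level numbers can be checked the same way by restricting the computation to the six COLLEGE records (a sketch reusing the same hypothetical helpers; RID 110's "college" is normalized to "College"):

```python
from math import log2
from collections import Counter

# The ten records from the table: (RID, Age, City, Gender, Education, class)
records = [
    (101, "20..30", "NY", "F", "College", "YES"),
    (102, "20..30", "SF", "M", "Graduate", "YES"),
    (103, "31..40", "NY", "F", "College", "YES"),
    (104, "51..60", "NY", "F", "College", "NO"),
    (105, "31..40", "LA", "M", "High school", "NO"),
    (106, "41..50", "NY", "F", "College", "YES"),
    (107, "41..50", "NY", "F", "Graduate", "YES"),
    (108, "20..30", "LA", "M", "College", "YES"),
    (109, "20..30", "NY", "F", "High school", "NO"),
    (110, "20..30", "NY", "F", "College", "YES"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    expected = 0.0
    for v in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == v]
        expected += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - expected

# Keep only the COLLEGE node's records: RIDs 101,103,104,106,108,110
college = [r for r in records if r[4] == "College"]
for i, name in [(1, "AGE"), (2, "CITY"), (3, "GENDER")]:
    print(name, round(gain(college, i), 2))   # AGE 0.65, CITY 0.05, GENDER 0.05
```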
The greatest gain is for the AGE attribute.
The tree at this point would look like the following, and we are finished:
----------------------
| EDUCATION |
----------------------
/ | \
HS / COLLEGE | \ GRAD
/ | \
----------------
RIDS: {105,109} | AGE | {102,107}
same class: NO ---------------- same class: YES
/ / | \
/ / | \
20..30 / /31..40 |41..50 \ 51..60
{101,108,110} {103} {106} {104}
same class: YES YES YES NO
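The whole construction above can be automated with a short recursive ID3 sketch (assumptions of ours, not part of the exercise: ties are broken arbitrarily by `max`, a majority vote is used if attributes run out, and the tuple-and-dict tree encoding is our own):

```python
from math import log2
from collections import Counter

# The ten records from the table: (RID, Age, City, Gender, Education, class)
records = [
    (101, "20..30", "NY", "F", "College", "YES"),
    (102, "20..30", "SF", "M", "Graduate", "YES"),
    (103, "31..40", "NY", "F", "College", "YES"),
    (104, "51..60", "NY", "F", "College", "NO"),
    (105, "31..40", "LA", "M", "High school", "NO"),
    (106, "41..50", "NY", "F", "College", "YES"),
    (107, "41..50", "NY", "F", "Graduate", "YES"),
    (108, "20..30", "LA", "M", "College", "YES"),
    (109, "20..30", "NY", "F", "High school", "NO"),
    (110, "20..30", "NY", "F", "College", "YES"),
]
ATTR_NAMES = {1: "AGE", 2: "CITY", 3: "GENDER", 4: "EDUCATION"}

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    expected = 0.0
    for v in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == v]
        expected += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - expected

def id3(rows, attrs):
    """Return a leaf label, or (attribute_name, {value: subtree})."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:             # pure node -> leaf
        return labels[0]
    if not attrs:                         # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda c: gain(rows, c))
    rest = [c for c in attrs if c != best]
    return (ATTR_NAMES[best], {v: id3([r for r in rows if r[best] == v], rest)
                               for v in sorted({r[best] for r in rows})})

tree = id3(records, [1, 2, 3, 4])
print(tree)   # EDUCATION at the root, AGE under the College branch
```

Running this reproduces the hand-built tree: EDUCATION at the root, HS and GRAD branches as pure leaves, and AGE splitting the COLLEGE branch.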