Consider the dataset shown below where the decision attribute is restaurant
Shown below is a partially developed decision tree. Finish creating the tree using the ID3 method. YOU WILL NOT RECEIVE ANY CREDIT UNLESS YOU SHOW ALL OF YOUR WORK IN TERMS OF ENTROPY AND INFORMATION GAIN CALCULATIONS!!!
Firstly, let us understand the dataset given:
Here the target/decision attribute is restaurant, and the attributes used to build the decision tree nodes are "ageGroup", "gender" and "married". The values of each attribute are self-explanatory: M and F for male and female, ageGroup divided into the three classes young, middle and senior, and restaurant names as given.
Secondly, coming to ID3 [Iterative Dichotomiser 3], the decision-tree-building algorithm: it divides (dichotomizes) the data into two or more groups at each level of the decision tree. To select the attribute at each level it uses a top-down greedy approach (selecting the best attribute at each level of decision making, without backtracking).
The parameter for choosing the best attribute is the INFORMATION GAIN of that attribute/feature. Information gain is in turn computed from ENTROPY. Entropy is simply a measure of disorder. It is given by,

E(S) = - Σ_{i=1..n} p_i · log2(p_i)

where n is the total number of classes in the attribute, p_i is the probability of class i in that attribute, and S is the dataset.
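For concreteness, here is a minimal Python sketch of this formula. The function name entropy and the assumption that the target column is passed in as a plain list are mine, not part of the original problem:

from collections import Counter
from math import log2

def entropy(labels):
    # labels: the target ("restaurant") values of the rows in S
    total = len(labels)
    # p_i = count of class i / |S|; sum -p_i * log2(p_i) over all classes
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())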
Now, the information gain for attribute A is given by,

IG(S, A) = E(S) - Σ_k ( |S_k| / |S| ) · E(S_k)

where |S_k| is the number of rows of S in which attribute A has the value k, E(S_k) is the entropy of that subset, and |S| is the number of rows in the dataset S.
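Building on the entropy sketch above, the gain can be computed like this. Again an illustrative sketch only: rows is assumed to be a list of dicts keyed by the column names of the dataset, e.g. {"ageGroup": "young", "gender": "M", "married": "no", "restaurant": "wendys"}:

def info_gain(rows, attribute, target="restaurant"):
    total = len(rows)
    # E(S): entropy of the whole (sub)dataset with respect to the target
    base = entropy([r[target] for r in rows])
    # Partition S into subsets S_k, one per value k of the chosen attribute
    subsets = {}
    for r in rows:
        subsets.setdefault(r[attribute], []).append(r[target])
    # IG(S, A) = E(S) - sum_k (|S_k| / |S|) * E(S_k)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return base - weighted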
Finally, applying ID3 to our dataset to obtain the finished decision tree.
Step1: Calculate the entropy of the dataset with respect to the target attribute "restaurant".
The entropy of the target attribute is: E(Restaurant) = 1.021
Step2: Calculate the IG of each feature/attribute of the dataset (apart from the target).
The IG of ageGroup is: IG(Restaurant, ageGroup) = 0.26
The IG of gender is: IG(Restaurant, gender) = -0.254
The IG of married is: IG(Restaurant, married) = 0.536
The feature with the highest IG is used to create the root node. Here the "married" column is the root node, as shown in the incomplete decision tree.
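In code, this greedy root choice is a single max over the candidate attributes, assuming the dataset has been loaded into the hypothetical rows list of dicts used above:

candidates = ["ageGroup", "gender", "married"]
root = max(candidates, key=lambda a: info_gain(rows, a))
# With the gains from Step 2 this selects "married" (IG = 0.536)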
Step3: Now we iteratively calculate the IG for each branch from the root node, until each subset is pure or all the features are exhausted.
married = yes:
The IG of ageGroup given married = yes is: IG(Restaurant_{yes}, ageGroup) = 0
The IG of gender given married = yes is: IG(Restaurant_{yes}, gender) = 0
[Here, since IG(Restaurant_{yes}, ageGroup) = IG(Restaurant_{yes}, gender) = 0, whatever the values of gender and ageGroup, when married = yes the restaurant is mcdonalds.]
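The zero gains fall out of the fact that the married = yes subset is pure: its entropy is already 0, so no attribute can add information and the branch closes with a leaf. A sketch of that check, with the same assumed rows representation as above:

yes_rows = [r for r in rows if r["married"] == "yes"]
labels = [r["restaurant"] for r in yes_rows]
if entropy(labels) == 0:   # pure subset: no further split is useful
    leaf = labels[0]       # "mcdonalds", per the reasoning above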
And
married = no:
The IG of ageGroup given married = no is: IG(Restaurant_{no}, ageGroup) = 0.971
The IG of gender given married = no is: IG(Restaurant_{no}, gender) = 0.737
[Here, since IG(Restaurant_{no}, ageGroup) > IG(Restaurant_{no}, gender), the next node is taken as ageGroup. Then we have three edges from the ageGroup node -> young, middle, senior.]
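These conditional gains are ordinary gains computed on the filtered subset, e.g. (same hypothetical rows and info_gain as above):

no_rows = [r for r in rows if r["married"] == "no"]
next_attr = max(["ageGroup", "gender"],
                key=lambda a: info_gain(no_rows, a))
# 0.971 > 0.737, so next_attr is "ageGroup"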
Step4: The next iteration is for the edges at the next level (just below ageGroup).
When married = no, for each value of ageGroup:
The IG of gender given married = no and ageGroup = young is: IG(Restaurant_{no,young}, gender) = 0
The IG of gender given married = no and ageGroup = middle is: IG(Restaurant_{no,middle}, gender) = 0
The IG of gender given married = no and ageGroup = senior is: IG(Restaurant_{no,senior}, gender) = 0
[Here, since IG(Restaurant_{no,young}, gender) = IG(Restaurant_{no,middle}, gender) = IG(Restaurant_{no,senior}, gender) = 0, whatever the value of gender the restaurant chosen is always burgerking for the young and middle ageGroups and wendys for the senior ageGroup.]
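Putting the steps together, a compact recursive sketch of the whole procedure, under the same assumptions about rows and info_gain as above; it stops when a subset is pure or the attributes are exhausted:

from collections import Counter

def id3(rows, attributes, target="restaurant"):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # pure subset -> leaf
        return labels[0]
    if not attributes:                 # attributes exhausted -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(rows, a, target))
    subtree = {}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        subtree[value] = id3(subset,
                             [a for a in attributes if a != best],
                             target)
    return {best: subtree}

Calling id3(rows, ["ageGroup", "gender", "married"]) on the given dataset would reproduce the tree developed above: married at the root, ageGroup under married = no, and leaves everywhere else.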
-> Hence the gender attribute is never needed at any level, and it can be pruned from the decision tree for this dataset.
-> As you may have guessed, this kind of overfitting is one of the main disadvantages of decision trees.
So the final tree would be like this:
For the calculations please check below: