In: Computer Science
Describe Hunter's Algorithm for building decision trees. Build a decision out of the following ("training") dataset. The goal is to determine if a person is a defaulted borrower given values for the first four attributes. How do you deal with the attribute Annual Income with real values? For a person with values for the first four attributes 11, No, Single, 180K, is this person a defaulted borrower or not according to your newly built decision tree?
ID Home Owner Marital Status Annual Income Defaulted Borrower
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes
Hunter's algorithm grows a decision tree in a recursive fashion by partitioning the trainig records into successively purer subsets. Let Dt be the set of training records that reach a node t. The general recursive procedure is defined as below:
It recursively applies the procedure to each subset until all the records in the subset belong to the same class. The Hunt's algirithm assumes that each combination of attribute sets has a unique class label during the procedure. If all the records associated with Dt have identical attribute values except for the class label, then it is not possible to split these records any future. In this case, the node is decalred a leaf node with the same class label as the majority class of training records associated with this node.
Top down tree construction algorithm
BuildTree(Node t,Training database D,Split Selection Method S)
step 1: Apply S to D to find splitting criterion
step 2: if(t is not a leaf node)
step 3 : create children nodes of t
step 4: partition D into children partitions
step 5 : recurse on each partition
step 6: endif
Fig:Training Data
ID |
Home owner |
Marital Status |
Annual Income |
Defaulted Borrower |
1 |
Yes |
Single |
125K |
No |
2 |
No |
Married |
100K |
No |
3 |
No |
Single |
70K |
No |
4 |
Yes |
Married |
120K |
No |
5 |
No |
Divorced |
95K |
Yes |
6 |
No |
Married |
60K |
No |
7 |
Yes |
Divorced |
220K |
No |
8 |
No |
Single |
85K |
Yes |
9 |
No |
Married |
75K |
No |
10 |
No |
Single |
90K |
Yes |
Decision Tree
Qn:For a person with values for the first four attributes 11, No, Single, 180K, is this person a defaulted borrower or not according to your newly built decision tree?
Once the decision tree has been constructed, classifying a test record is straightforward. Starting from the root node, we apply the test condition to the record and follow the appropriate branch based on the outcome of the test. It then lead us either to another internal node, for which a new test condition is applied, or to a leaf node. When we reach the leaf node, the class lable associated with the leaf node is then assigned to the record, As shown in the follwoing figure it traces the path in the decision tree to predict the class label of the test record, and the path terminates at a leaf node labeled NO.
Decision Tree for Test data