Question

In: Computer Science

Describe Hunter's Algorithm for building decision trees. Build a decision out of the following ("training") dataset....

Describe Hunter's Algorithm for building decision trees. Build a decision out of the following ("training") dataset. The goal is to determine if a person is a defaulted borrower given values for the first four attributes. How do you deal with the attribute Annual Income with real values? For a person with values for the first four attributes 11, No, Single, 180K, is this person a defaulted borrower or not according to your newly built decision tree?

ID Home Owner               Marital Status     Annual Income    Defaulted Borrower

1             Yes                        Single                   125K                     No

2             No                         Married                100K                     No

3             No                         Single                   70K                       No

4             Yes                        Married                120K                     No

5             No                         Divorced              95K                       Yes

6             No                         Married                60K                       No

7             Yes                        Divorced              220K                     No

8             No                         Single                   85K                       Yes

9             No                         Married                75K                       No

10           No                         Single                   90K                       Yes

Solutions

Expert Solution

Hunter's algorithm grows a decision tree in a recursive fashion by partitioning the trainig records into successively purer subsets. Let Dt be the set of training records that reach a node t. The general recursive procedure is defined as below:

  1. If Dt contains records that belong the same class yt, then t is a leaf node labeled as yt
  2. If Dt is an empty set, then t is a leaf node labeled by the default class, yd
  3. If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets.

It recursively applies the procedure to each subset until all the records in the subset belong to the same class. The Hunt's algirithm assumes that each combination of attribute sets has a unique class label during the procedure. If all the records associated with Dt have identical attribute values except for the class label, then it is not possible to split these records any future. In this case, the node is decalred a leaf node with the same class label as the majority class of training records associated with this node.

Top down tree construction algorithm

BuildTree(Node t,Training database D,Split Selection Method S)

step 1: Apply S to D to find splitting criterion

step 2: if(t is not a leaf node)

step 3 : create children nodes of t

step 4: partition D into children partitions

step 5 : recurse on each partition

step 6: endif

Fig:Training Data

ID

Home owner

Marital Status

Annual Income

Defaulted Borrower

1

Yes

Single

125K

No

2

No

Married

100K

No

3

No

Single

70K

No

4

Yes

Married

120K

No

5

No

Divorced

95K

Yes

6

No

Married

60K

No

7

Yes

Divorced

220K

No

8

No

Single

85K

Yes

9

No

Married

75K

No

10

No

Single

90K

Yes

Decision Tree

Qn:For a person with values for the first four attributes 11, No, Single, 180K, is this person a defaulted borrower or not according to your newly built decision tree?

Once the decision tree has been constructed, classifying a test record is straightforward. Starting from the root node, we apply the test condition to the record and follow the appropriate branch based on the outcome of the test. It then lead us either to another internal node, for which a new test condition is applied, or to a leaf node. When we reach the leaf node, the class lable associated with the leaf node is then assigned to the record, As shown in the follwoing figure it traces the path in the decision tree to predict the class label of the test record, and the path terminates at a leaf node labeled NO.

Decision Tree for Test data


Related Solutions

The following training dataset is “reading email dataset”. This dataset has four features as follows: author,...
The following training dataset is “reading email dataset”. This dataset has four features as follows: author, thread, length, and where to read the mail. According to the features the algorithm has to predict the user’s action whether to read or skip the mail. Use Naïve Bayes classifier to predict the user’s action (skips or reads) when the author of the mail is known, the thread of the mail is follow up, the length of the mail is short, and where...
The following training dataset is “reading email dataset”. This dataset has four features as follows: author,...
The following training dataset is “reading email dataset”. This dataset has four features as follows: author, thread, length, and where to read the mail. According to the features the algorithm has to predict the user’s action whether to read or skip the mail. Use Naïve Bayes classifier to predict the user’s action (skips or reads) when the author of the mail is known, the thread of the mail is follow up, the length of the mail is short, and where...
For this problem, use the e1-p1.csv dataset. Using the decision tree algorithm that we discussed in...
For this problem, use the e1-p1.csv dataset. Using the decision tree algorithm that we discussed in the class, determine which attribute is the best attribute at the root level. You should not use Weka, JMP Pro, or any other data mining/machine learning software. You must show all intermediate results and calculations. For this problem, use the e1-p1.csv dataset. Using the decision tree algorithm that we discussed in the class, determine which attribute is the best attribute at the root level....
Consider the role of simulation analysis and decision trees in capital budgeting risk analysis. Describe the...
Consider the role of simulation analysis and decision trees in capital budgeting risk analysis. Describe the advantages offered by each technique. Describe a scenario that the technique would be appropriate to apply to and explain what you would expect to learn from application of that tool. Write your response as a one-page memo. Post your memo in the discussion forum and solicit feedback from your classmates.
Which of the following methods can achieve zero training error on any linearly separable dataset? (A)...
Which of the following methods can achieve zero training error on any linearly separable dataset? (A) Support vector machines (B) 3-Nearest Neighbor (C) Linear perceptron (D) Logistic regression Please answer with an explanation for each option on why it may or may not achieve zero training error on any linearly separable dataset.
R has many build-in dataset. The data mtcars is one of them. The following R code...
R has many build-in dataset. The data mtcars is one of them. The following R code read-in data and save the data to input.                   input <- mtcars[,c("am","cyl","hp","wt")]              Write a few line of R code to conduct a regression analysis with am as the response variable, and              cyl, hp, wt as explanation variables.
Build a Dynamic Programming algorithm in Python for the following problem. Given two sequences of vowels,...
Build a Dynamic Programming algorithm in Python for the following problem. Given two sequences of vowels, find the length of longest subsequence present in both of them. NOTE: A subsequence is a sequence that appears in the same order, but not necessarily contiguous. For example, “aei”, “aeo”, “eio”, “aiu”, ... are subsequences of “aeiou”. Sample Input 1: aeiou aiu Sample Output 1: 3 Sample Input 2: aeaiueoiuaeeoeooaauoi aeuioauuuoaeieeaueuiouaiieuiuuuaoueueauaeiauuo Sample Output 2: 16 Sample Input 3: iioioieiaiauaoeoiioiiue iuuueauiaieooaoaaouaaaae Sample Output 3:...
Apply the classification algorithm to the following set of data records. Draw a decision tree. The...
Apply the classification algorithm to the following set of data records. Draw a decision tree. The class attribute is Repeat Customer. RID Age City Gender Education Repeat Customer 101 20..30 NY F College YES 102 20..30 SF M Graduate YES 103 31..40 NY F College YES 104 51..60 NY F College NO 105 31..40 LA M High school NO 106 41..50 NY F College YES 107 41..50 NY F Graduate YES 108 20..30 LA M College YES 109 20..30 NY...
Build a two dimensional array out of the following three lists. The array will represent a...
Build a two dimensional array out of the following three lists. The array will represent a deck of cards. The values in dCardValues correspond to the card names in dCardNames. Note that when you make an array all data types must be the same. Apply dSuits to dCardValues and dCardNames by assigning a suit to each set of 13 elements. dCardNames = ['2','3','4','5','6','7','8','9','10','J','Q','K','A'] dCardValues = ['2','3','4','5','6','7','8','9','10','11','12','13','14'] dSuits = ["Clubs","Spades","Diamonds","Hearts"] Once assigned your two dimensional array should resemble this : 2...
Consider the following variants of the Towers of Hanoi. For each of variant, describe an algorithm...
Consider the following variants of the Towers of Hanoi. For each of variant, describe an algorithm to solve it in as few moves as possible. Prove that your algorithm is correct. Initially, all the n disks are on peg 1, and you need to move the disks to peg 2. In all the following variants, you are not allowed to put a bigger disk on top of a smaller disk. Consider the disappearing Tower of Hanoi puzzle where the largest...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT