Question

In: Computer Science

Describe Hunter's Algorithm for building decision trees. Build a decision out of the following ("training") dataset....

Describe Hunter's Algorithm for building decision trees. Build a decision out of the following ("training") dataset. The goal is to determine if a person is a defaulted borrower given values for the first four attributes. How do you deal with the attribute Annual Income with real values? For a person with values for the first four attributes 11, No, Single, 180K, is this person a defaulted borrower or not according to your newly built decision tree?

ID Home Owner Marital Status Annual Income Defaulted Borrower

1 Yes Single 125K No

2 No Married 100K No

3 No Single 70K No

4 Yes Married 120K No

5 No Divorced 95K Yes

6 No Married 60K No

7 Yes Divorced 220K No

8 No Single 85K Yes

9 No Married 75K No

10 No Single 90K Yes

Expert Solution

Hunter's algorithm grows a decision tree in a recursive fashion by partitioning the trainig records into successively purer subsets. Let Dt be the set of training records that reach a node t. The general recursive procedure is defined as below:

If Dt contains records that belong the same class yt, then t is a leaf node labeled as yt
If Dt is an empty set, then t is a leaf node labeled by the default class, yd
If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets.

It recursively applies the procedure to each subset until all the records in the subset belong to the same class. The Hunt's algirithm assumes that each combination of attribute sets has a unique class label during the procedure. If all the records associated with Dt have identical attribute values except for the class label, then it is not possible to split these records any future. In this case, the node is decalred a leaf node with the same class label as the majority class of training records associated with this node.

Top down tree construction algorithm

BuildTree(Node t,Training database D,Split Selection Method S)

step 1: Apply S to D to find splitting criterion

step 2: if(t is not a leaf node)

step 3 : create children nodes of t

step 4: partition D into children partitions

step 5 : recurse on each partition

step 6: endif

Fig:Training Data

ID	Home owner	Marital Status	Annual Income	Defaulted Borrower
1	Yes	Single	125K	No
2	No	Married	100K	No
3	No	Single	70K	No
4	Yes	Married	120K	No
5	No	Divorced	95K	Yes
6	No	Married	60K	No
7	Yes	Divorced	220K	No
8	No	Single	85K	Yes
9	No	Married	75K	No
10	No	Single	90K	Yes

Decision Tree

Qn:For a person with values for the first four attributes 11, No, Single, 180K, is this person a defaulted borrower or not according to your newly built decision tree?

Once the decision tree has been constructed, classifying a test record is straightforward. Starting from the root node, we apply the test condition to the record and follow the appropriate branch based on the outcome of the test. It then lead us either to another internal node, for which a new test condition is applied, or to a leaf node. When we reach the leaf node, the class lable associated with the leaf node is then assigned to the record, As shown in the follwoing figure it traces the path in the decision tree to predict the class label of the test record, and the path terminates at a leaf node labeled NO.

Decision Tree for Test data

venereology answered 7 months ago

The following training dataset is “reading email dataset”. This dataset has four features as follows: author,...

The following training dataset is “reading email dataset”. This dataset has four features as follows: author, thread, length, and where to read the mail. According to the features the algorithm has to predict the user’s action whether to read or skip the mail. Use Naïve Bayes classifier to predict the user’s action (skips or reads) when the author of the mail is known, the thread of the mail is follow up, the length of the mail is short, and where...

The following training dataset is “reading email dataset”. This dataset has four features as follows: author,...

The following training dataset is “reading email dataset”. This dataset has four features as follows: author, thread, length, and where to read the mail. According to the features the algorithm has to predict the user’s action whether to read or skip the mail. Use Naïve Bayes classifier to predict the user’s action (skips or reads) when the author of the mail is known, the thread of the mail is follow up, the length of the mail is short, and where...

For this problem, use the e1-p1.csv dataset. Using the decision tree algorithm that we discussed in...

For this problem, use the e1-p1.csv dataset. Using the decision tree algorithm that we discussed in the class, determine which attribute is the best attribute at the root level. You should not use Weka, JMP Pro, or any other data mining/machine learning software. You must show all intermediate results and calculations. For this problem, use the e1-p1.csv dataset. Using the decision tree algorithm that we discussed in the class, determine which attribute is the best attribute at the root level....

Consider the role of simulation analysis and decision trees in capital budgeting risk analysis. Describe the...

Consider the role of simulation analysis and decision trees in capital budgeting risk analysis. Describe the advantages offered by each technique. Describe a scenario that the technique would be appropriate to apply to and explain what you would expect to learn from application of that tool. Write your response as a one-page memo. Post your memo in the discussion forum and solicit feedback from your classmates.

Which of the following methods can achieve zero training error on any linearly separable dataset? (A)...

Which of the following methods can achieve zero training error on any linearly separable dataset? (A) Support vector machines (B) 3-Nearest Neighbor (C) Linear perceptron (D) Logistic regression Please answer with an explanation for each option on why it may or may not achieve zero training error on any linearly separable dataset.

R has many build-in dataset. The data mtcars is one of them. The following R code...

R has many build-in dataset. The data mtcars is one of them. The following R code read-in data and save the data to input. input <- mtcars[,c("am","cyl","hp","wt")] Write a few line of R code to conduct a regression analysis with am as the response variable, and cyl, hp, wt as explanation variables.

Build a Dynamic Programming algorithm in Python for the following problem. Given two sequences of vowels,...

Build a Dynamic Programming algorithm in Python for the following problem. Given two sequences of vowels, find the length of longest subsequence present in both of them. NOTE: A subsequence is a sequence that appears in the same order, but not necessarily contiguous. For example, “aei”, “aeo”, “eio”, “aiu”, ... are subsequences of “aeiou”. Sample Input 1: aeiou aiu Sample Output 1: 3 Sample Input 2: aeaiueoiuaeeoeooaauoi aeuioauuuoaeieeaueuiouaiieuiuuuaoueueauaeiauuo Sample Output 2: 16 Sample Input 3: iioioieiaiauaoeoiioiiue iuuueauiaieooaoaaouaaaae Sample Output 3:...

Apply the classification algorithm to the following set of data records. Draw a decision tree. The...

Apply the classification algorithm to the following set of data records. Draw a decision tree. The class attribute is Repeat Customer. RID Age City Gender Education Repeat Customer 101 20..30 NY F College YES 102 20..30 SF M Graduate YES 103 31..40 NY F College YES 104 51..60 NY F College NO 105 31..40 LA M High school NO 106 41..50 NY F College YES 107 41..50 NY F Graduate YES 108 20..30 LA M College YES 109 20..30 NY...

Build a two dimensional array out of the following three lists. The array will represent a...

Build a two dimensional array out of the following three lists. The array will represent a deck of cards. The values in dCardValues correspond to the card names in dCardNames. Note that when you make an array all data types must be the same. Apply dSuits to dCardValues and dCardNames by assigning a suit to each set of 13 elements. dCardNames = ['2','3','4','5','6','7','8','9','10','J','Q','K','A'] dCardValues = ['2','3','4','5','6','7','8','9','10','11','12','13','14'] dSuits = ["Clubs","Spades","Diamonds","Hearts"] Once assigned your two dimensional array should resemble this : 2...

Consider the following variants of the Towers of Hanoi. For each of variant, describe an algorithm...

Consider the following variants of the Towers of Hanoi. For each of variant, describe an algorithm to solve it in as few moves as possible. Prove that your algorithm is correct. Initially, all the n disks are on peg 1, and you need to move the disks to peg 2. In all the following variants, you are not allowed to put a bigger disk on top of a smaller disk. Consider the disappearing Tower of Hanoi puzzle where the largest...

Question

Describe Hunter's Algorithm for building decision trees. Build a decision out of the following ("training") dataset....

Solutions

Expert Solution

Related Solutions

The following training dataset is “reading email dataset”. This dataset has four features as follows: author,...

The following training dataset is “reading email dataset”. This dataset has four features as follows: author,...

For this problem, use the e1-p1.csv dataset. Using the decision tree algorithm that we discussed in...

Consider the role of simulation analysis and decision trees in capital budgeting risk analysis. Describe the...

Which of the following methods can achieve zero training error on any linearly separable dataset? (A)...

R has many build-in dataset. The data mtcars is one of them. The following R code...

Build a Dynamic Programming algorithm in Python for the following problem. Given two sequences of vowels,...

Apply the classification algorithm to the following set of data records. Draw a decision tree. The...

Build a two dimensional array out of the following three lists. The array will represent a...

Consider the following variants of the Towers of Hanoi. For each of variant, describe an algorithm...