Question

Consider the dataset shown below where the decision attribute is restaurant


[Image: dataset]

Shown below is a partially developed decision tree. Finish creating the tree using the ID3 method. YOU WILL NOT RECEIVE ANY CREDIT UNLESS YOU SHOW ALL OF YOUR WORK IN TERMS OF ENTROPY AND INFORMATION GAIN CALCULATIONS!!!

[Image: partially developed decision tree]

Solutions

Expert Solution

Firstly, let us understand the dataset given:

Here the target/decision attribute is restaurant, and the attributes used to build the decision tree nodes are "ageGroup", "gender" and "married". The attribute values are self-explanatory: gender is M or F (male or female), ageGroup is divided into three classes (young, middle and senior), and the restaurant names are as given.

Secondly, ID3 (Iterative Dichotomiser 3) is a decision tree building algorithm that splits the data into two or more groups at each level of the tree, one group per value of the chosen attribute. To select the attribute at each level it uses a top-down greedy approach: it picks the best attribute at each decision point.

The criterion for deciding which attribute is best is the INFORMATION GAIN of that attribute/feature. Information gain is based on ENTROPY, which is simply a measure of disorder. It is given by,

$$E(S) = -\sum_{i=1}^{n} p_i \log_2 p_i$$

where S is the dataset, n is the total number of classes in the attribute, and p_i is the probability (proportion of rows) of class i in that attribute.
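The entropy described here can be sketched in a few lines of Python. This is a minimal sketch assuming the standard base-2 Shannon entropy; the function name `entropy` and the sample labels are illustrative and are not taken from the question's dataset.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Base-2 Shannon entropy of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum -p_i * log2(p_i) over the classes that actually occur.
    return sum(-(c / total) * log2(c / total) for c in counts.values())

# Hypothetical labels, NOT the rows of the dataset in the question.
print(entropy(["a", "a", "b", "b"]))  # evenly split two classes: 1.0
print(entropy(["a", "a", "a", "a"]))  # a pure set has zero entropy
```

A pure set (one class only) has entropy 0, and an even two-class split has entropy 1 bit.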

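Now, the information gain for attribute A is given by,

$$IG(S, A) = E(S) - \sum_{k \in \mathrm{values}(A)} \frac{|S_k|}{|S|} \, E(S_k)$$

where S_k is the subset of rows in S for which attribute A has the value k, |S_k| is the number of rows in that subset, and |S| is the number of rows in dataset S.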
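The information gain of an attribute can be sketched the same way: group the rows by the attribute's value, take the weighted average of the group entropies, and subtract it from the entropy of the whole set. This is a minimal sketch assuming rows are Python dicts; the names `information_gain` and `toy` are illustrative, and the toy rows are made up rather than copied from the question.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """IG(S, A) = E(S) - sum over k of |S_k|/|S| * E(S_k)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    # Weighted entropy that remains after splitting on the attribute.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

# Hypothetical toy rows, NOT the dataset from the question.
toy = [
    {"married": "yes", "gender": "M", "restaurant": "mcdonalds"},
    {"married": "yes", "gender": "F", "restaurant": "mcdonalds"},
    {"married": "no", "gender": "M", "restaurant": "wendys"},
    {"married": "no", "gender": "F", "restaurant": "wendys"},
]
print(information_gain(toy, "married", "restaurant"))  # 1.0
print(information_gain(toy, "gender", "restaurant"))   # 0.0
```

In the toy rows, "married" separates the classes perfectly (gain equal to the full entropy), while "gender" tells us nothing (gain 0).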

Finally, we apply ID3 to our dataset to obtain the finished decision tree.

Step1: Calculate the entropy of the dataset with respect to the target attribute "restaurant".

The entropy of target attribute is: E(Restaurant) = 1.021

Step2: Calculate the IGs of each feature / attribute of dataset (apart from target).

The IG of ageGroup is: IG(Restaurant, ageGroup) = 0.26

The IG of gender is: IG(Restaurant, gender) = -0.254 (as reported; information gain can never be negative, so this value reflects an arithmetic slip)

The IG of married is: IG(Restaurant, married) = 0.536

The feature with the highest IG is used to create the root node. Here "married" has the highest IG, so it is the root node, as shown in the incomplete decision tree.
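The selection rule is simply an argmax over the gains. A minimal sketch, using the IG values exactly as reported in Step 2 (the variable names `gains` and `root` are illustrative):

```python
# Information gain values as reported in Step 2 of this solution.
gains = {"ageGroup": 0.26, "gender": -0.254, "married": 0.536}

# The root is the attribute with the largest information gain.
root = max(gains, key=gains.get)
print(root)  # married
```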

Step3: Now, we iteratively calculate the IG for each case from the root node until all the features are exhausted.

married = yes:

The IG of ageGroup given married = yes is: IG(Restaurant_yes, ageGroup) = 0

The IG of gender given married = yes is: IG(Restaurant_yes, gender) = 0

[Here IG(Restaurant_yes, ageGroup) = IG(Restaurant_yes, gender) = 0 because the subset is already pure: whatever the values of gender and ageGroup, when married = yes the restaurant is mcdonalds. This branch therefore ends in a leaf.]

And

married = no:

The IG of ageGroup given married = no is: IG(Restaurant_no, ageGroup) = 0.971

The IG of gender given married = no is: IG(Restaurant_no, gender) = 0.737

[Here, since IG(Restaurant_no, ageGroup) > IG(Restaurant_no, gender), the next node is ageGroup. We then have three edges from the ageGroup node: young, middle and senior.]
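As a sanity check on the value 0.971: the dataset image is not reproduced here, so the class counts below are an assumption, but 0.971 is what the base-2 entropy formula gives for a two-class subset split 3 to 2. If every ageGroup value then leads to a single restaurant, the weighted entropy after the split is 0 and the gain equals the entropy of the subset.

```latex
E(S_{no}) = -\tfrac{3}{5}\log_2\tfrac{3}{5} - \tfrac{2}{5}\log_2\tfrac{2}{5}
          \approx 0.442 + 0.529 = 0.971

IG(S_{no}, \text{ageGroup}) = E(S_{no}) - 0 = 0.971
```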

Step4: The next iteration is for the edges at the next level (just below ageGroup).

When married = no, for each value of ageGroup:

The IG of gender given married = no and ageGroup = young is: IG(Restaurant_no_young, gender) = 0

The IG of gender given married = no and ageGroup = middle is: IG(Restaurant_no_middle, gender) = 0

The IG of gender given married = no and ageGroup = senior is: IG(Restaurant_no_senior, gender) = 0

[Here, since IG(Restaurant_no_young, gender) = IG(Restaurant_no_middle, gender) = IG(Restaurant_no_senior, gender) = 0, whatever the value of gender, the restaurant chosen is always burgerking for the young and middle ageGroups and wendys for the senior ageGroup.]
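The whole procedure in Steps 1 to 4 can be sketched as a short recursive function. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names (`entropy`, `gain`, `id3`) are illustrative, and the rows are hypothetical, chosen only to be consistent with the conclusions above, since the original dataset image is not available.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())

def gain(rows, attr, target):
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - remainder

def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    # Leaf: the subset is pure, or no attributes remain (use majority label).
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: split on the attribute with the highest information gain.
    best = max(attrs, key=lambda a: gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    branches = defaultdict(list)
    for r in rows:
        branches[r[best]].append(r)
    return {best: {v: id3(sub, rest, target) for v, sub in branches.items()}}

# Hypothetical rows (NOT the question's dataset): ageGroup, gender, married, restaurant.
raw = [
    ("young", "M", "yes", "mcdonalds"),
    ("senior", "F", "yes", "mcdonalds"),
    ("middle", "M", "yes", "mcdonalds"),
    ("young", "F", "no", "burgerking"),
    ("middle", "M", "no", "burgerking"),
    ("senior", "M", "no", "wendys"),
    ("senior", "F", "no", "wendys"),
]
keys = ("ageGroup", "gender", "married", "restaurant")
rows = [dict(zip(keys, r)) for r in raw]

tree = id3(rows, ["ageGroup", "gender", "married"], "restaurant")
print(tree)
```

On these toy rows the sketch picks "married" at the root, stops immediately on the pure married = yes branch, and splits the married = no branch on "ageGroup", never using "gender".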

-> Hence gender is not needed at any level, and it can be pruned from the decision tree for this dataset.

-> As you may have guessed, this kind of overfitting is one of the main disadvantages of decision trees.

So the final tree would be like this:

married
  yes -> mcdonalds
  no -> ageGroup
    young -> burgerking
    middle -> burgerking
    senior -> wendys
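The tree derived in the steps above (married at the root, ageGroup below married = no, gender unused) can also be written as a nested structure and used to classify a record. This is a minimal sketch; the dict layout and the helper name `classify` are illustrative choices, not part of the original solution.

```python
# Final tree from the steps above, written as a nested dict.
tree = {
    "married": {
        "yes": "mcdonalds",
        "no": {
            "ageGroup": {
                "young": "burgerking",
                "middle": "burgerking",
                "senior": "wendys",
            }
        },
    }
}

def classify(node, record):
    """Follow the branches matching the record until a leaf (a string) is reached."""
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        node = branches[record[attribute]]
    return node

print(classify(tree, {"married": "no", "ageGroup": "senior", "gender": "F"}))  # wendys
print(classify(tree, {"married": "yes", "ageGroup": "young", "gender": "M"}))  # mcdonalds
```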

For the calculations please check below:

