Question

Explain the concept of information gain in classification algorithms. How is it used to build a decision tree?

Solutions

Expert Solution

Before describing information gain, it helps to understand entropy. Entropy measures how heterogeneous (mixed) the class labels in a dataset are. The lower the entropy, the more homogeneous the data, and the more homogeneous the data within a group, the better the classification. In the extreme case, completely homogeneous data belongs to a single class, so its entropy is zero.

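The entropy of a data distribution D is calculated by the following equation:

$$E(D) = -\sum_{i=1}^{C} p_i \log_2 p_i$$

where C is the number of classes and p_i is the probability that a data point belongs to class i. The logarithm is conventionally taken to base 2, which gives the entropy in bits.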
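As a rough illustration of the entropy calculation, here is a minimal Python sketch. The function name `entropy` and the toy label lists are assumptions made for this example, and the probabilities p_i are estimated as class frequencies in the list.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Base-2 entropy of a list of class labels.

    p_i is estimated as (count of class i) / (total number of labels).
    """
    total = len(labels)
    counts = Counter(labels)
    # Sum of -p_i * log2(p_i) over the classes that actually occur.
    return sum(-(n / total) * log2(n / total) for n in counts.values())

# Hypothetical toy data: one pure set and one evenly mixed set.
pure = ["yes"] * 6
mixed = ["yes"] * 3 + ["no"] * 3

print("pure set entropy: ", entropy(pure))
print("mixed set entropy:", entropy(mixed))
```

The pure set has zero entropy, while the evenly mixed two-class set reaches the two-class maximum of 1 bit.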

Information gain is then defined as the decrease in entropy after a dataset is split on a particular attribute.

So, if D is the entire dataset and A is a particular attribute of this dataset, then the information gain obtained when D is split on attribute A is

$$\mathrm{Gain}(D, A) = E(D) - E_A(D)$$

where

$$E_A(D) = \sum_{c \in \mathrm{values}(A)} P(c)\, E(c)$$

Here P(c) is the probability that attribute A takes the value c, and E(c) is the entropy of the subset of data for which attribute A has the value c.
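The information gain definition can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `entropy` and `information_gain`, the dictionary-based row format, and the tiny weather-style dataset are all made up for the example.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the whole set minus the weighted entropy after splitting."""
    # Group the target labels by the value c of the chosen attribute.
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    # Weighted sum of P(c) * E(c) over the attribute values.
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - weighted

# Hypothetical dataset: "outlook" separates the classes perfectly, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

print("gain(outlook):", information_gain(rows, "outlook", "play"))
print("gain(windy):  ", information_gain(rows, "windy", "play"))
```

Splitting on "outlook" leaves two pure subsets, so it removes all of the original entropy, whereas splitting on "windy" leaves each subset as mixed as the original data and gains nothing.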

So, information gain measures the decrease in entropy, and hence the increase in homogeneity, produced by a split. A decision tree is constructed using this measure: at each node, the information gain is computed for every attribute of dataset D on which the data could be partitioned, and the branch is created on the attribute A that gives the highest information gain. The same procedure is then repeated on each resulting subset.
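The tree-building procedure can be sketched as a short recursive function. This is a simplified illustration rather than a full implementation: the names `split`, `information_gain`, and `build_tree`, the nested-dictionary tree format, the majority-class stopping rule, and the toy dataset are all assumptions made for this example, and there is no pruning or handling of numeric attributes.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def split(rows, attribute):
    """Partition rows by the value they take for the given attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)
    return groups

def information_gain(rows, attribute, target):
    """Entropy before the split minus the weighted entropy after it."""
    before = entropy([r[target] for r in rows])
    after = sum(len(g) / len(rows) * entropy([r[target] for r in g])
                for g in split(rows, attribute).values())
    return before - after

def build_tree(rows, attributes, target):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    labels = [r[target] for r in rows]
    # Stop when the node is pure or no attributes remain; predict the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Branch on the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {best: {value: build_tree(subset, remaining, target)
                   for value, subset in split(rows, best).items()}}

# Hypothetical dataset for illustration only.
rows = [
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

tree = build_tree(rows, ["outlook", "windy"], "play")
print(tree)
```

On this toy data, "windy" has the higher information gain at the root, so the tree branches on it first and only uses "outlook" inside the subset that is still mixed.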

Please comment if you need any clarification.

