Explain the concept of information gain in classification algorithms. How is it used to build a decision tree?
Before describing information gain, it helps to understand the concept of entropy. Entropy measures how homogeneous or heterogeneous a data distribution is: the lower the entropy, the more homogeneous the distribution, and the more homogeneous the data, the easier the classification. In fact, completely homogeneous data means all the data belongs to a single class, in which case the entropy is zero.
The entropy of a data distribution D is calculated by the following equation:

E(D) = -\sum_{i=1}^{C} p_i \log_2 p_i

where C is the number of classes and p_i is the probability that a data point belongs to class i.
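As a concrete illustration, here is a minimal Python sketch of that calculation. The helper name `entropy` and the representation of the data as a plain list of class labels are my own choices for the example, not part of the original answer:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Completely homogeneous data -> entropy is 0
print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0
# Evenly split data -> maximum entropy for two classes
print(entropy(["yes", "yes", "no", "no"]))    # 1.0
```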
Information gain is then defined as the decrease in entropy after the dataset is split on a particular attribute.
So if D is the entire dataset and A is a particular attribute of that dataset, the information gain from splitting D on attribute A is defined as

Gain(D, A) = E(D) - E(D, A)

where

E(D, A) = \sum_{c \in values(A)} P(c) \, E(c)

Here P(c) is the probability that attribute A takes the value c, and E(c) is the entropy of the subset of the data for which attribute A has value c.
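Continuing the sketch above, a hypothetical `information_gain` helper can implement Gain(D, A) directly; it assumes each example is a row (list) of attribute values and reuses the `entropy` function from the previous snippet:

```python
def information_gain(rows, labels, attribute_index):
    """Gain(D, A) = E(D) - sum over values c of P(c) * E(c),
    where A is the attribute at the given column index."""
    total = len(rows)
    # Group the class labels of each subset by the attribute's value c
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute_index], []).append(label)
    # Weighted entropy of the split: sum_c P(c) * E(c)
    weighted = sum(len(subset) / total * entropy(subset)
                   for subset in subsets.values())
    return entropy(labels) - weighted

# Tiny example: attribute 0 separates the classes perfectly
rows   = [["sunny"], ["sunny"], ["rain"], ["rain"]]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # 1.0
```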
So information gain measures the decrease in entropy, and hence the increase in homogeneity, produced by a split. It is used to construct a decision tree as follows: at each node, a branch is created on attribute A if splitting on A gives the highest information gain among all the attributes of dataset D on which a partition can be made.
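To make that selection step concrete, here is a small sketch (again assuming the helpers above) of how an ID3-style tree would pick the attribute to branch on at a node:

```python
def best_attribute(rows, labels):
    """Index of the attribute whose split yields the highest information
    gain, i.e. the attribute the tree branches on at this node."""
    n_attributes = len(rows[0])
    return max(range(n_attributes),
               key=lambda i: information_gain(rows, labels, i))

rows   = [["sunny", "hot"], ["sunny", "cool"], ["rain", "hot"], ["rain", "cool"]]
labels = ["no", "no", "yes", "yes"]
print(best_attribute(rows, labels))  # 0 -- attribute 0 gives the highest gain
```

A full tree builder would recurse: split the data on the chosen attribute, then repeat the selection on each subset until a subset is homogeneous (entropy zero) or no attributes remain.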
Please comment if you need any clarification.