Question

In: Computer Science

What kind of problems attributes make in entropy-based decision trees. How can we solve them?

What kind of problems attributes make in entropy-based decision trees. How can we solve them?

Solutions

Expert Solution

Decision Tree Classification:

•A machine researcher named J. Ross Quinlan in 1980 developed a decision tree algorithm known as ID3 (Iterative Dichotomiser).

•Later, he presented C4.5, which was the successor of ID3.

•ID3 and C4.5 adopt a greedy approach.

•In this algorithm, there is no backtracking; the trees are constructed in a top-down recursive divide-and-conquer manner.

Decision Tree Classification – Approach:

•There are couple of algorithms there to build a decision tree:

•CART (Classification and Regression Trees) → uses Gini Index(Classification) as metric.

•ID3 (Iterative Dichotomiser 3) → uses Entropy function and Information gain as metrics.

•We can use either of them to build the classification tree.

Attribute selection :

•An attribute selection measure is a heuristic for selecting the splitting criterion that “best” separates a given data sample, S, of class-labeled training tuples into individual classes.

•If we were to split S into smaller partitions according to the outcomes of the splitting criterion, ideally each partition would be pure (i.e., all the tuples that fall into a given partition would belong to the same class).

•Conceptually, the “best” splitting criterion is the one that most closely results in such a scenario.

•Attribute selection measures are also known as splitting rules because they determine how the tuples at a given node are to be split.

•We shall be using Information Gain, Gain Ratio as the attribute selection measures.

Entropy:

•In machine learning, entropy is a measure of the randomness in the information being processed.

•Also, it is a measure of the uncertainty associated with a random variable.

•The higher the entropy, the harder it is to draw any conclusions from that information.

•Where,

S, the data sample, be a training set of class-labeled tuples.

•The class label attribute has m distinct values defining m distinct classes, Ci (for i = 1, …… , m).

•P(Ci) is the probability that a tuple in S belongs to class Ci

Ci,S be the set of tuples of class Ci in S.

Information Gain:

•Information gain can be defined as the amount of information gained about a random variable or signal from observing another random variable.

•It can be considered as the difference between the entropy of parent node and weighted average entropy of child nodes.

Information Gain = Entropy(Parent) – [Weighted Average] * Entropy(Children)

where |Sj| is Weight of the jth partition

Attribute A can be used to split S into v partitions or subsets, {S1, S2, …. , Sv} where Sj contains those tuples in S that have outcome aj of A.

•The attribute A with the highest information gain, Gain(A), is chosen as the splitting attribute at node N().

Gain Ratio:

•Information gain is biased for the attribute with many outcomes.

•It means it prefers the attribute with a large number of distinct values.

•For instance, consider an attribute with a unique identifier such as customer_ID has zero info(S) because of pure partition.

•This maximizes the information gain and creates useless partitioning.

•The attribute with the highest gain ratio is chosen as the splitting attribute.


Related Solutions

In biotechnology, where do you use differential equations? What kind of problems can we solve with...
In biotechnology, where do you use differential equations? What kind of problems can we solve with them?
How can we use algebraic skills and properties to conduct business and solve business problems? What...
How can we use algebraic skills and properties to conduct business and solve business problems? What is an example of a way we can use algebraic skills to make decisions?
what kind of things can be learned when looking at phylogenetic trees?
what kind of things can be learned when looking at phylogenetic trees?
can you explain optimization problems and the best way to solve them. also if you can...
can you explain optimization problems and the best way to solve them. also if you can explain related rates and the best atrategy. thank you!
Please answer them correctly. Here are 2 short problems. Please solve all 2 problems correctly. Make...
Please answer them correctly. Here are 2 short problems. Please solve all 2 problems correctly. Make sure the answers are correct. I would really appreciate your effort. Thanks. 1). Sunland, Inc., is considering investing in a new production line for eye drops. Other than investing in the equipment, the company needs to increase its cash and cash equivalents by $10,000, increase the level of inventory by $27,000, increase accounts receivable by $25,000, and increase accounts payable by $5,000 at the...
What is science, and how do we use it to solve real-world problems? What is science,...
What is science, and how do we use it to solve real-world problems? What is science, and what is biology in particular?
In your words, what kind of relationship problems can lead to sexual difficulties? Explain how these...
In your words, what kind of relationship problems can lead to sexual difficulties? Explain how these might occur and offer some solutions.
what are some behavioral problems associated with setting budgets and how can we reduce such problems
what are some behavioral problems associated with setting budgets and how can we reduce such problems
For what kind of models can we use OLS?
For what kind of models can we use OLS?
projects with unequal lives cause problems when making decisions decision making? how can we overcome this...
projects with unequal lives cause problems when making decisions decision making? how can we overcome this problem? “A project’s engineering life does not always equal its economic life.” Explain this statement with an example
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT