Regression Trees
Explain how classification trees work.
Given a classification tree, state the classification rule for a particular leaf.
List the two measures of impurity that were covered in class.
Why do we prune trees?
What are the advantages of single classification trees? The weaknesses?
1) The classification tree method is also called the DECISION TREE method. It is applied when the given data are complex: they contain multiple classes, give different outputs for different inputs, and call for in-depth data mining. The ultimate goal is to generate rules that are easy to understand, explain, and translate into practice. A tree contains:
Root: the foundation of the tree, representing the whole data set. For example, when training on a particular data set, the best attribute of that data set is placed at the ROOT.
Root node: the topmost decision node in the tree.
Subsets / internal nodes / decision nodes: each one divides the data set again, giving different outputs within a confined boundary.
Leaf nodes: the end points of the tree, each holding the final predicted value.
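The parts listed above (root node, internal decision nodes, leaf nodes) can be sketched in a few lines of Python. This is only an illustrative sketch: the class names `Node` and `Leaf`, the feature names, the thresholds, and the class labels are all made up for the example and do not come from any particular data set.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # final predicted class stored at this leaf


@dataclass
class Node:
    feature: str      # attribute tested at this decision node
    threshold: float  # go left if value <= threshold, otherwise go right
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def classify(tree, sample):
    """Start at the root node and follow the splits until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# Hypothetical hand-built tree: the root tests "temperature",
# one internal node tests "heart_rate", and three leaves hold the classes.
tree = Node(
    "temperature", 37.5,
    Leaf("healthy"),
    Node("heart_rate", 100.0, Leaf("monitor"), Leaf("urgent")),
)

print(classify(tree, {"temperature": 38.2, "heart_rate": 110.0}))  # urgent
```

In practice the tree is learned from training data rather than written by hand; the hand-built version only shows how the root, internal nodes, and leaves fit together.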
The classification tree is a design usually used in the field of cybernetics, and also in public health research, where some of the rules are altered to suit the study type (trohoc/prospective study).
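2) Every branch of a classification tree ends in a leaf, and the leaf represents the final decision.
The path to a leaf starts at the root node and, at each step, follows the branch selected by the condition that splits that stem into different outputs. The classification rule for a particular leaf is therefore the combination (AND) of all the conditions along the path from the root to that leaf, and its output is the class stored at the leaf.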
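As a rough illustration of reading the rule off a leaf, the sketch below walks a small tree and joins the split conditions on each root-to-leaf path. The nested-dictionary layout, the function name `leaf_rules`, and the features and labels are assumptions made for the example.

```python
def leaf_rules(node, conditions=()):
    """Yield (rule, label) for every leaf.

    The rule is the AND of the split conditions on the path from the root.
    """
    if "label" in node:  # reached a leaf
        rule = " AND ".join(conditions) if conditions else "always"
        yield rule, node["label"]
        return
    went_left = f"{node['feature']} <= {node['threshold']}"
    went_right = f"{node['feature']} > {node['threshold']}"
    yield from leaf_rules(node["left"], conditions + (went_left,))
    yield from leaf_rules(node["right"], conditions + (went_right,))


# Hypothetical tree, written as nested dictionaries.
tree = {
    "feature": "temperature", "threshold": 37.5,
    "left": {"label": "healthy"},
    "right": {
        "feature": "heart_rate", "threshold": 100,
        "left": {"label": "monitor"},
        "right": {"label": "urgent"},
    },
}

rules = list(leaf_rules(tree))
for rule, label in rules:
    print(f"IF {rule} THEN class = {label}")
# IF temperature <= 37.5 THEN class = healthy
# IF temperature > 37.5 AND heart_rate <= 100 THEN class = monitor
# IF temperature > 37.5 AND heart_rate > 100 THEN class = urgent
```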
3) Gini impurity measures how frequently a randomly chosen element from the given set would be incorrectly labelled if it were labelled at random according to the distribution of labels in the subset.
For a set of items with n classes, let x = 1, 2, 3, ..., n and let p_x be the fraction of items labelled with class x in the data set. Gini impurity is measured by adding, over all classes, the probability p_x of picking an item with label x multiplied by the probability (1 - p_x) of mislabelling it: $G = \sum_{x=1}^{n} p_x (1 - p_x) = 1 - \sum_{x=1}^{n} p_x^2$.
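A minimal sketch of the Gini calculation, assuming the labels arrive as a plain Python list (the function name `gini_impurity` is chosen for the example). It uses the form 1 minus the sum of squared class fractions, which is equal to the sum of p_x(1 - p_x) over all classes.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_x ** 2), where p_x is the fraction of items in class x."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention for this sketch: an empty set counts as pure
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


print(gini_impurity(["a", "a", "a", "a"]))  # 0.0 (pure node)
print(gini_impurity(["a", "a", "b", "b"]))  # 0.5 (evenly mixed, two classes)
```

A value of 0 means every item in the node has the same label; the value grows as the labels become more mixed.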
4) We prune trees to reduce the size and complexity of a decision tree.
Pruning removes sections of the tree that contribute little to classifying instances.
It reduces the tangled, complex structure and turns the tree into a simpler form.
It improves predictive accuracy by lightening the tree, which reduces overfitting.
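One way to picture pruning is the sketch below. It uses reduced-error pruning, which is only one of several pruning strategies and is picked here for illustration, not because the answer above names it. It assumes each internal node stores a `majority` field (the most common training label that reached it) and that a separate validation set is available; the tree and data are invented.

```python
def predict(node, sample):
    """Follow the splits from the given node down to a leaf."""
    while "label" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]


def errors(node, rows):
    """Count validation rows (sample, true_label) that the subtree misclassifies."""
    return sum(predict(node, sample) != truth for sample, truth in rows)


def prune(node, rows):
    """Bottom-up: replace a subtree by a leaf if that does not add validation errors."""
    if "label" in node:
        return node
    f, t = node["feature"], node["threshold"]
    left_rows = [(s, y) for s, y in rows if s[f] <= t]
    right_rows = [(s, y) for s, y in rows if s[f] > t]
    pruned = dict(node,
                  left=prune(node["left"], left_rows),
                  right=prune(node["right"], right_rows))
    leaf = {"label": node["majority"]}
    return leaf if errors(leaf, rows) <= errors(pruned, rows) else pruned


def count_leaves(node):
    if "label" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])


# Hypothetical tree whose split at x <= 8 does not hold up on validation data.
tree = {
    "feature": "x", "threshold": 5, "majority": "A",
    "left": {"label": "A"},
    "right": {
        "feature": "x", "threshold": 8, "majority": "B",
        "left": {"label": "B"},
        "right": {"label": "A"},
    },
}
validation = [({"x": 2}, "A"), ({"x": 6}, "B"), ({"x": 7}, "B"), ({"x": 9}, "B")]

pruned_tree = prune(tree, validation)
print(count_leaves(tree), "->", count_leaves(pruned_tree))  # 3 -> 2
```

Here the lower split is collapsed into a single leaf because doing so removes a validation error, while the root split is kept because collapsing it would add errors.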
5) Advantages: easy to understand.
Requires little data preparation.
Easily modified if needed.
Easy to interpret.
In a small tree, a small change affects fewer leaves than it would in a complex classification tree.
Overfitting is limited when the tree is kept small or pruned; an unpruned tree can still overfit.
Disadvantages: does not work well with large data sets.
Best suited to small projects.
Struggles to handle large amounts of numerical and categorical data at once.