What are the major differences among the three methods for increasing the accuracy of a classifier:
(a) bagging,
(b) boosting, and
(c) ensemble?
Bagging, short for bootstrap aggregating, is used to reduce variance, which helps avoid overfitting. The idea is to draw many bootstrap samples (sampling the training set with replacement) and train one model on each sample. The resulting ensemble of models then votes with equal weight for classification, or has its predictions averaged for regression. A minimal sketch follows.
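Here is a minimal bagging sketch using scikit-learn; the synthetic dataset and all parameter values are illustrative assumptions, not part of the original answer:

```python
# Bagging sketch (illustrative; dataset and hyperparameters are assumptions).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample (drawn with replacement);
# predictions are combined by an equal-weight majority vote.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        bootstrap=True, random_state=0)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))
```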
Boosting refers to any ensemble method that combines several weak learners into a strong learner; it is used mainly to reduce bias, though it can also reduce variance. The learners are trained sequentially, with each new learner focusing on the examples its predecessors misclassified, and the final prediction is a weighted majority vote (classification) or a weighted sum (regression). AdaBoost and gradient boosting are two popular methods, as in the sketch below.
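A comparable sketch with AdaBoost; again, the dataset and hyperparameters are illustrative assumptions:

```python
# AdaBoost sketch (illustrative; dataset and hyperparameters are assumptions).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Weak learners (decision stumps by default) are trained sequentially;
# misclassified samples are re-weighted upward, and the final prediction
# is a weighted majority vote over all the learners.
boost = AdaBoostClassifier(n_estimators=100, random_state=0)
boost.fit(X_train, y_train)
print(boost.score(X_test, y_test))
```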
Ensemble learning is the umbrella idea behind both of the above: we can lower variance by using the wisdom of the crowd, for example averaging the predictions of 10 or 20 models instead of relying on just one. This has been shown to improve accuracy, so it's recommended that you explore ensemble techniques. One of the most popular is the random forest, which combines a specified number of decision trees to make predictions (see the sketch below).
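A minimal random forest sketch to illustrate; the dataset and parameter values are assumptions for demonstration only:

```python
# Random forest sketch (illustrative; dataset and parameters are assumptions).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

# 100 trees, each trained on a bootstrap sample and considering a random
# subset of features at each split; the forest predicts by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())
```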