What are the major differences among the three methods for the evaluation of the accuracy of a classifier:
(a) hold-out method,
(b) cross-validation, and
(c) bootstrap?
Hold-out
Hold-out is when you split your dataset into a ‘train’ set and a ‘test’ set. The model is trained on the training set, and the test set is used to measure how well the model performs on unseen data. A common split for the hold-out method is 80% of the data for training and the remaining 20% for testing.
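A minimal sketch of the hold-out method, assuming scikit-learn; the iris dataset and the decision tree are illustrative choices, not part of the question:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# 80/20 split: 80% of the data for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Accuracy on the held-out test set estimates performance on unseen data.
print("hold-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))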
Cross-validation
Cross-validation, or ‘k-fold cross-validation’, is when the dataset is randomly split into k groups (folds). One fold is used as the test set and the rest are used as the training set. The model is trained on the training set and scored on the test set, and the process is repeated until each fold has been used as the test set; the k scores are then averaged into a single estimate.
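A sketch of k-fold cross-validation with scikit-learn; k=5, the dataset, and the classifier are illustrative assumptions:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=42)

# Each of the 5 folds serves once as the test set while the other
# 4 folds are used for training; the 5 scores are then averaged.
scores = cross_val_score(clf, X, y, cv=5)
print("per-fold accuracy:", scores)
print("mean CV accuracy:", np.mean(scores))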
Bootstrap
Bootstrapping draws a training set of the same size as the original dataset by sampling with replacement, so some cases appear more than once and others not at all. The bootstrap analogue to cross-validation estimates of generalization error is called the out-of-bootstrap estimate, because the test cases are those that were left out of the bootstrap resampled training set. Repeating this over many resamples and averaging the scores gives the accuracy estimate.
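A sketch of the out-of-bootstrap estimate; the dataset, classifier, and 100 repetitions are illustrative assumptions:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(42)
n = len(y)

scores = []
for _ in range(100):  # number of bootstrap repetitions (illustrative)
    # Sample n indices with replacement; on average ~63.2% of the
    # cases appear at least once in the resample.
    boot = rng.integers(0, n, size=n)
    oob = np.setdiff1d(np.arange(n), boot)  # cases left out of the resample
    if len(oob) == 0:
        continue
    clf = DecisionTreeClassifier().fit(X[boot], y[boot])
    # Score only on the out-of-bootstrap (left-out) cases.
    scores.append(accuracy_score(y[oob], clf.predict(X[oob])))

print("out-of-bootstrap accuracy:", np.mean(scores))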