What is the purpose of cross-validation? Specifically, why wouldn’t we test a hypothesis trained on all of the available examples?
Cross-validation is used to assess the predictive performance of a model: to judge how it performs outside the sample, on new data (also known as test data).
The motivation for using cross-validation techniques is that when we fit a model, we fit it to a training dataset. Without cross-validation we only have information on how our model performs on our in-sample data. Ideally we would like to see how the model performs on new data, in terms of the accuracy of its predictions. In science, theories are judged by their predictive performance.
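The idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the "model" is a deliberately trivial majority-class predictor (a stand-in for any fit/predict pair), and the data and the choice of k are made up for the example.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) pairs for k folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    indices = list(range(n))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def majority_class(labels):
    """A trivial 'model': predict the most common training label."""
    return max(set(labels), key=labels.count)

def cross_validate(labels, k=5):
    """Average accuracy of the majority-class predictor over k folds.

    Each fold is held out once as test data; the model is 'trained'
    (here: the majority label is computed) on the remaining folds only.
    """
    accuracies = []
    for train_idx, test_idx in k_fold_splits(len(labels), k):
        prediction = majority_class([labels[i] for i in train_idx])
        correct = sum(labels[i] == prediction for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)

labels = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
print(cross_validate(labels, k=5))  # prints 0.7
```

The key point is that every prediction is scored on observations the model never saw during training, which is exactly the out-of-sample performance the plain train-on-everything approach cannot measure.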
Training data:
The observations in the training set form the experience that the algorithm uses to learn. In supervised learning problems, each observation consists of an observed output variable and one or more observed input variables.
The training set contains a known output, and the model learns on this data in order to generalize to other data later on. Once a model has been built using the training set, you test it by making predictions against the test set.
Because the data in the test set already contains known values for the attribute you want to predict, it is easy to determine whether the model's predictions are correct.
The test set is a set of observations used to evaluate the performance of the model using some performance metric.
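The train/test workflow described above can be sketched as follows. The model here is a hypothetical 1-nearest-neighbour rule on toy one-dimensional data; the inputs, labels, and metric (plain accuracy) are illustrative assumptions, not part of the original answer.

```python
def nearest_neighbour_predict(train_x, train_y, x):
    """Predict the label of the closest training point."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]

# Toy dataset: inputs below 5 are class 0, at or above 5 are class 1.
# The model only ever sees the training portion.
train_x, train_y = [1, 2, 3, 6, 7, 8], [0, 0, 0, 1, 1, 1]
test_x,  test_y  = [0, 4, 5, 9],       [0, 0, 1, 1]

# Because the test labels are known, accuracy is easy to compute.
correct = sum(nearest_neighbour_predict(train_x, train_y, x) == y
              for x, y in zip(test_x, test_y))
print(f"test accuracy: {correct / len(test_x):.2f}")
```

Any performance metric could stand in for accuracy here; the essential part is that the score is computed only on observations held out from training.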