Describe how and why you should partition your data when using classification techniques like k-nearest neighbors and logistic regression.
Partitioning is nowadays generally referred to as the train/test split of the data. For logistic regression or k-nearest neighbors (k-NN), we can split the data as follows.
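A brief description of the process is given below: the data set is divided into a training set, which is used to fit the model, and a test set, which is used to evaluate the fitted model.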
You could imagine slicing the single data set as follows:
[Figure: Slicing a single data set into a training set and a test set.]
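The slicing described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the helper name `split_train_test`, the 80/20 ratio, and the toy `data` list are assumptions made for the example, not part of the original answer.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows, then slice it into (train, test).

    Hypothetical helper for illustration; the 20% default is an assumption.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)      # shuffle so the slice is not order-dependent
    n_test = max(1, round(len(shuffled) * test_fraction))
    split_at = len(shuffled) - n_test
    return shuffled[:split_at], shuffled[split_at:]

# Toy data set of (feature, label) pairs, made up for this sketch.
data = [(x, int(x >= 50)) for x in range(100)]
train, test = split_train_test(data, test_fraction=0.2, seed=42)
print(len(train), len(test))
```

Shuffling before slicing matters when the rows are stored in some order (for example, sorted by label), since a plain slice of ordered data would give a test set that does not resemble the training set.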
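Make sure that your test set meets the following two conditions: it is large enough to yield statistically meaningful results, and it is representative of the data set as a whole (in other words, it does not have different characteristics than the training set).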
Assuming that your test set meets the preceding two conditions, your goal is to create a model that generalizes well to new data. The test set serves as a proxy for new data. For example, suppose the model learned from the training data is very simple. This model doesn't do a perfect job, and a few predictions are wrong. However, if the model does about as well on the test data as it does on the training data, then this simple model does not overfit the training data. That is the reason for partitioning: performance measured only on the data the model was trained on cannot tell you whether the model has overfit.
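To make the train-versus-test comparison concrete, here is a rough sketch using a small hand-written k-nearest-neighbors classifier on one-dimensional data. Everything in it is an assumption for illustration: the noisy synthetic data, the helper names `knn_predict` and `accuracy`, and the choices k = 1 and k = 15. It is not a library API.

```python
import random

def knn_predict(train, x, k=3):
    """Majority vote (labels are 0 or 1) among the k training points nearest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)

def accuracy(train, rows, k=3):
    """Fraction of rows whose label the k-NN model built on train predicts correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in rows)
    return correct / len(rows)

rng = random.Random(0)
# Hypothetical data: label is 1 when x > 0.5, with roughly 10% of labels flipped.
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

rng.shuffle(data)
train, test = data[:160], data[160:]           # 80% train, 20% test

for k in (1, 15):
    print(f"k={k}: train accuracy={accuracy(train, train, k):.2f}, "
          f"test accuracy={accuracy(train, test, k):.2f}")
```

With k = 1, every training point is its own nearest neighbor, so the training accuracy is perfect even though some labels are noise. Only the score on the held-out test set shows how the model is likely to behave on new data, which is why the two sets must be kept separate.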