In: Computer Science
Machine Learning - multivariate methods
Suppose that in two dimensions we have two classes with exactly the same mean. What types of decision boundaries can be defined? Show a picture of the options.
Is least square error a suitable choice for classification?
No, it is not. One of the issues with the least-squares solution is that it lacks robustness to outliers. Consider the figure below: on the left-hand side we have two classes, denoted by red and blue, separated by a decision boundary. The green boundary corresponds to the solution obtained by logistic regression, and the magenta boundary corresponds to the least-squares solution. On the right-hand side, after more data points are added to the training set, the magenta boundary shifts and misclassifies a few data points, while the green boundary is unmoved. This happens because the least-squares solution penalizes even predictions that are "too correct", i.e., that lie far on the correct side of the decision boundary.
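The effect in the figure can be reproduced with a small synthetic sketch (the data here are assumed for illustration, not the figure's actual data): fit both models on two well-separated classes, then add extra points deep inside one class's correct region and measure how far each decision boundary moves. The logistic fit below is a plain gradient-descent implementation, not a library call.

```python
import numpy as np

def add_bias(X):
    # Prepend a constant-1 column so the bias is part of the weight vector.
    return np.hstack([np.ones((len(X), 1)), X])

def fit_least_squares(X, t):
    # Minimize ||Xb w - t||^2 with class targets in {-1, +1}.
    w, *_ = np.linalg.lstsq(add_bias(X), t, rcond=None)
    return w

def fit_logistic(X, t, lr=0.1, steps=2000):
    # Plain gradient descent on the logistic (cross-entropy) loss.
    Xb = add_bias(X)
    y01 = (t + 1) / 2  # map {-1, +1} labels to {0, 1}
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y01) / len(t)
    return w

def boundary_offset(w):
    # Where the boundary w0 + w1*c + w2*c = 0 crosses the diagonal x = y.
    return -w[0] / (w[1] + w[2])

rng = np.random.default_rng(0)
# Two well-separated 2-D classes centred at (2, 2) and (-2, -2).
X = np.vstack([rng.normal(2.0, 0.3, size=(20, 2)),
               rng.normal(-2.0, 0.3, size=(20, 2))])
t = np.array([1.0] * 20 + [-1.0] * 20)

# Extra points deep inside the positive region: already "too correct".
X2 = np.vstack([X, rng.normal(8.0, 0.3, size=(10, 2))])
t2 = np.concatenate([t, np.ones(10)])

shift_lsq = abs(boundary_offset(fit_least_squares(X2, t2))
                - boundary_offset(fit_least_squares(X, t)))
shift_log = abs(boundary_offset(fit_logistic(X2, t2))
                - boundary_offset(fit_logistic(X, t)))
print(f"boundary shift: least squares {shift_lsq:.2f}, logistic {shift_log:.2f}")
```

Because the squared error keeps penalizing the large (positive) outputs at the new points, the least-squares boundary is dragged noticeably, while the logistic boundary barely moves once those points are confidently classified.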
The least-squares solution fails not only when the data contain outliers but also in several other cases. This is because the least-squares solution is equivalent to the maximum-likelihood solution under the assumption of a Gaussian conditional distribution (as we will see in later sections). For datasets that do not follow such a distribution, least squares is therefore not a good choice.
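The equivalence claimed here can be made explicit with a short standard derivation (the symbols $t_n$, $\mathbf{x}_n$, and $\mathbf{w}$ below are assumed for target, input, and weights): if each target is a linear function of the input plus Gaussian noise, $t_n = \mathbf{w}^\top \mathbf{x}_n + \epsilon_n$ with $\epsilon_n \sim \mathcal{N}(0, \sigma^2)$, then

```latex
\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w})
  = \sum_{n=1}^{N} \ln \mathcal{N}\!\left(t_n \mid \mathbf{w}^\top \mathbf{x}_n, \sigma^2\right)
  = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} \left(t_n - \mathbf{w}^\top \mathbf{x}_n\right)^2
    - \frac{N}{2} \ln\!\left(2\pi\sigma^2\right).
```

Maximizing this log-likelihood over $\mathbf{w}$ is the same as minimizing the sum-of-squares error. Binary class labels, however, are not Gaussian-distributed around a linear function of the input, so the assumption behind least squares is violated in classification.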
There are several alternative error functions for
classification.