In a couple sentences, explain the effect of each of
the following concepts on overfitting and underfitting.
1. Hypothesis class complexity
2. Regularization
3. C in SVMs
1.
A statistical hypothesis is a proposed explanation of the relationship between data populations that is interpreted probabilistically. A machine learning hypothesis is a candidate model that approximates a target function mapping inputs to outputs.
The choice of algorithm (e.g. a neural network) and the configuration of that algorithm (e.g. network topology and hyperparameters) define the space of possible hypotheses that the model may represent.
Learning, for a machine learning algorithm, means navigating the chosen hypothesis space toward a hypothesis that approximates the target function well, ideally the best one available.
A common notation is used where lowercase-h (h) represents a given
specific hypothesis and uppercase-H (H) represents the hypothesis
space that is being searched.
h (hypothesis): A single hypothesis, e.g. an instance or specific
candidate model that maps inputs to outputs and can be evaluated
and used to make predictions.
H (hypothesis set): A space of possible hypotheses for mapping
inputs to outputs that can be searched, often constrained by the
choice of the framing of the problem, the choice of model and the
choice of model configuration.
The choice of algorithm and algorithm configuration amounts to choosing a hypothesis space that is believed to contain a good, or the best, approximation of the target function. This is very hard to know in advance, so it is often more efficient to spot-check a range of different hypothesis spaces.
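For illustration, a minimal sketch of spot-checking a few hypothesis spaces with cross-validation might look like the following; the synthetic dataset, the particular candidate models, and the use of scikit-learn are assumptions made only for this sketch.

    # Spot-check several hypothesis spaces (model families) with 5-fold
    # cross-validation on a synthetic dataset (illustrative assumption).
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    candidate_spaces = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "decision tree": DecisionTreeClassifier(random_state=0),
        "RBF-kernel SVM": SVC(kernel="rbf"),
    }

    for name, model in candidate_spaces.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: mean CV accuracy = {scores.mean():.3f}")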
Reducing your hypothesis space typically reduces overfitting, but this is not true in general. The most reliable way to reduce overfitting, short of changing the data itself, is to reduce the capacity of your hypothesis space to overfit. If shrinking the hypothesis space happens to decrease that capacity, overfitting will go down; if not, nothing has changed despite the hypothesis space being smaller. Conversely, a hypothesis class that is too simple to represent the target function will underfit, showing high error on both the training data and new data.
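To make the capacity point concrete, here is a small sketch, assuming scikit-learn and a synthetic 1-D regression problem chosen only for illustration: a low-degree polynomial hypothesis space typically underfits (high training and test error), while a very high degree typically overfits (low training error, higher test error).

    # Vary hypothesis class complexity via polynomial degree and compare
    # training vs. test error on noisy 1-D data (illustrative assumption).
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.RandomState(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for degree in (1, 4, 15):
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X_train, y_train)
        train_mse = mean_squared_error(y_train, model.predict(X_train))
        test_mse = mean_squared_error(y_test, model.predict(X_test))
        print(f"degree {degree}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")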
2. The main cause of overfitting is making a model more complex than necessary. If we find a way to reduce that complexity, the overfitting problem is largely solved. Regularization penalizes complex models: it adds a penalty for higher-order or large terms in the model and thus controls model complexity. When a regularization term is added, the model tries to minimize both the loss and its own complexity.
Without regularization: minimize Loss(model)
With regularization: minimize Loss(model) + Complexity(model)
Regularization reduces variance without causing a remarkable increase in bias. Two common methods are L1 and L2 regularization. Underfitting, by contrast, occurs when the model or algorithm shows low variance but high bias; it is often the result of an excessively simple model, and an overly strong regularization penalty can push a model into exactly this regime. Both overfitting and underfitting lead to poor predictions on new data.
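As a rough sketch of this trade-off, assuming scikit-learn (where alpha plays the role of the regularization strength) and a synthetic regression dataset chosen only for illustration, the penalty's effect can be read directly off the learned coefficients.

    # Compare L2 (Ridge) and L1 (Lasso) penalties at different strengths:
    # stronger penalties shrink coefficients (L2) or zero them out (L1).
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, Ridge

    X, y = make_regression(n_samples=100, n_features=20, noise=10.0, random_state=0)

    for alpha in (0.01, 1.0, 100.0):
        ridge = Ridge(alpha=alpha).fit(X, y)
        lasso = Lasso(alpha=alpha, max_iter=50_000).fit(X, y)
        print(f"alpha={alpha}: "
              f"L2 coefficient norm = {np.linalg.norm(ridge.coef_):.1f}, "
              f"L1 nonzero coefficients = {int(np.sum(lasso.coef_ != 0))}")

Larger alpha forces a simpler model: the L2 coefficient norm shrinks, and the L1 penalty drives more coefficients exactly to zero.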
3.
SVMs, or Support Vector Machines, are supervised learning models used for classification and regression. When using an SVM, our primary objective is to arrive at a hyperplane that helps us
• Maximize the margin
• Classify the training points accurately
One of the key regularization parameters that lets us control or influence the outcome of the model is C, also known as the penalty or cost parameter for misclassification.
Below are a few regimes of C and their effects on the trained model; a short code sketch illustrating them follows the list.
a. A low value of C means a lower penalty for misclassification and allows the model some freedom: the trained model tolerates a few misclassified points and therefore produces a larger-margin hyperplane, also known as a soft margin. Lower values of C lead toward underfitting, with high bias and low variance.
b. A high value of C means a higher penalty and focuses on avoiding errors or misclassifications on the training data, so the model opts for a hyperplane with better training accuracy and a smaller margin, also known as a hard margin. Higher values of C lead toward overfitting, with low bias and high variance.
c. As the value of C approaches positive infinity, there is no room for error because the penalty for misclassification becomes enormous; the model pushes for the best possible training accuracy and overfits heavily.
d. As the value of C approaches zero, the penalty effectively disappears, the model underfits badly, and where the classes overlap it may be unable to find an appropriate separating hyperplane at all.
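A minimal sketch of these regimes, assuming scikit-learn and a synthetic, slightly overlapping two-class problem chosen purely for illustration:

    # Train an RBF-kernel SVM with very small, moderate, and very large C
    # and compare training/test accuracy and the number of support vectors.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                               n_clusters_per_class=1, class_sep=1.0,
                               random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for C in (0.001, 1.0, 1000.0):
        clf = SVC(kernel="rbf", C=C).fit(X_train, y_train)
        print(f"C={C}: train acc = {clf.score(X_train, y_train):.3f}, "
              f"test acc = {clf.score(X_test, y_test):.3f}, "
              f"support vectors = {clf.n_support_.sum()}")

A very small C typically gives lower training accuracy and many support vectors (a soft margin leaning toward underfitting), while a very large C pushes training accuracy up at the cost of a larger gap between training and test accuracy (overfitting).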