If a classifier performs well on training data but poorly in production, what's the most likely problem?
1. High variance
2. High bias
3. High entropy
4. High measurement noise
To my knowledge, the most likely reason a classifier performs well on training data but fails in production is high variance (option 1), which causes over-fitting.
In the context of ML, variance is the type of error that arises from a model's sensitivity to small fluctuations in the training data.
When a model has high variance, a slight change (fluctuation) in the data reduces its accuracy; i.e., when the model is applied to a data set other than the one it was trained on, its accuracy drops drastically and the model is no longer useful to us.
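To make this concrete, here is a rough sketch (not taken from any particular library) of a high-variance model. It assumes a made-up 1-D data set with 30% label noise and uses a 1-nearest-neighbour classifier, which simply memorizes its training points. The names `make_data`, `predict_1nn`, and `accuracy` are hypothetical helpers written for this illustration.

```python
import random

def make_data(n, rng, noise=0.3):
    """Synthetic data (an assumption for this demo): label is 1 when x > 0.5,
    then flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def predict_1nn(train, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(train, data):
    """Fraction of points in `data` that the 1-NN model built on `train` gets right."""
    return sum(predict_1nn(train, x) == y for x, y in data) / len(data)

rng = random.Random(0)
train = make_data(200, rng)   # data the model memorizes
new = make_data(2000, rng)    # stands in for "production" data

train_acc = accuracy(train, train)
new_acc = accuracy(train, new)
print(f"training accuracy: {train_acc:.2f}")
print(f"new-data accuracy: {new_acc:.2f}")
```

Because every training point is its own nearest neighbour, the training accuracy is perfect, noise included. On fresh data the memorized noise no longer helps, so the accuracy falls well below that.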
There are many ways to deal with over-fitting. One of the most popular is k-fold cross-validation, a powerful preventative measure against over-fitting. In this method, we split our data into k folds. Then we iteratively train the algorithm on k-1 folds while using the remaining fold as the test set, so that each fold serves as the test set exactly once.
The quality of the model is assessed by training it on the training folds only and then comparing its score on those folds with its score on the held-out fold, averaged over all k rounds. A large gap between the two is a sign of over-fitting. In this manner, we can identify which model or settings give better predictions on unseen data.
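The procedure can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions: `k_fold_indices` and `cross_validate` are hypothetical helper names, and the "model" is a toy majority-label predictor chosen just to keep the example short. In practice you would plug in your own `fit` and `score` functions.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, repeat k times."""
    folds = k_fold_indices(len(data), k)
    scores = []
    for i, held_out in enumerate(folds):
        # Training set: every fold except the i-th one.
        train = [data[j] for f, fold in enumerate(folds) if f != i for j in fold]
        test = [data[j] for j in held_out]
        model = fit(train)
        scores.append(score(model, test))
    return scores

# Toy model (an assumption for this demo): always predict the majority training label.
def fit_majority(train):
    ones = sum(y for _, y in train)
    return int(ones * 2 >= len(train))

def score_accuracy(model, test):
    return sum(model == y for _, y in test) / len(test)

# Made-up data set: 20 points, label 1 when x >= 6.
data = [(x, int(x >= 6)) for x in range(20)]
scores = cross_validate(data, k=5, fit=fit_majority, score=score_accuracy)
mean_score = sum(scores) / len(scores)
print("per-fold accuracy:", scores)
print("mean accuracy:", mean_score)
```

The mean of the per-fold scores is the cross-validated estimate of how the model would do on data it has not seen. Comparing it with the accuracy on the training folds shows how much the model is over-fitting.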