A classifier is trained on a cancer dataset, and achieves 96% accuracy on new observations. Why might this not be considered a good classifier? How could it be improved?
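96% accuracy does not by itself make this a good classifier. Cancer datasets are usually imbalanced: if only about 4% of the patients actually have cancer, a classifier that always predicts "no cancer" also reaches 96% accuracy while missing every real case. The remaining 4% of errors also matters a lot here, because a false negative (a missed cancer) is far more costly than a false alarm. So before judging the classifier, look at the confusion matrix and at metrics such as recall (sensitivity) and precision, not just at accuracy. Even an accuracy close to 99% would need the same check.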
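As a rough illustration of why an accuracy figure on its own can be misleading for a dataset like this, here is a minimal sketch. The numbers are made up (1000 patients, 40 with cancer) and the helper names `accuracy` and `recall` are just for this example; nothing here comes from the dataset in the question.

```python
# Hypothetical data: 1000 patients, 40 of whom have cancer (label 1).
y_true = [1] * 40 + [0] * 960

# A "classifier" that ignores its input and always predicts "no cancer" (label 0).
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the classifier found."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


print(accuracy(y_true, y_pred))  # 0.96
print(recall(y_true, y_pred))    # 0.0
```

The always-negative baseline scores 96% accuracy but finds none of the 40 cancer cases, which is why recall is worth checking alongside accuracy.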
Some general ways to improve classification accuracy are:
1 - Cross validation: Split your training dataset into groups (folds). In each run, hold one group out for prediction and train on the rest, changing the held-out group each time. This gives you a more reliable estimate of how the model performs on unseen data, so you can compare models and settings fairly.
2 - Cross dataset: The same idea as cross validation, but using different datasets.
3 - Tuning your model: This basically means changing the parameters you use to train your classification model (I don't know which classification algorithm you're using, so it's hard to be more specific).
4 - Improve, or start using (if you're not already), a normalization step: Find out which preprocessing techniques (changing the geometry, colors, etc.) give you cleaner, more consistent data to train on.
5 - Understand the problem you're working on better, and try other methods to solve it. There is almost always more than one way to solve the same problem, and you may not be using the best approach.
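Points 1 and 3 can be combined: use cross validation to score each candidate parameter setting and keep the best one. The sketch below is only an illustration under assumptions. The question does not say which algorithm is used, so a tiny 1-D k-nearest-neighbours classifier stands in for it, the data is synthetic, and all function names (`k_fold_indices`, `knn_predict`, `cross_val_accuracy`) are hypothetical.

```python
import random


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle the indices 0..n-1 and split them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    votes = sum(train_y[i] for i in order[:n_neighbors])
    return 1 if 2 * votes > n_neighbors else 0


def cross_val_accuracy(xs, ys, n_neighbors, n_folds=5):
    """Mean accuracy over the folds, each fold held out once for prediction."""
    scores = []
    for held_out in k_fold_indices(len(xs), n_folds):
        held = set(held_out)
        train_x = [xs[i] for i in range(len(xs)) if i not in held]
        train_y = [ys[i] for i in range(len(xs)) if i not in held]
        correct = sum(
            knn_predict(train_x, train_y, xs[i], n_neighbors) == ys[i]
            for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Synthetic, balanced 1-D data: class 0 centred at 0.0, class 1 centred at 2.0.
rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(2.0, 1.0) for _ in range(100)]
ys = [0] * 100 + [1] * 100

# Tuning: score each candidate value of n_neighbors with cross validation.
results = {k: cross_val_accuracy(xs, ys, k) for k in (1, 5, 15)}
best_k = max(results, key=results.get)
print(results, best_k)
```

Accuracy is used as the score only because this toy data is balanced. On an imbalanced problem such as cancer detection, the same loop could score recall or another metric that reflects the cost of missed cases.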