In: Computer Science
To evaluate the predictive performance of a model constructed from a two-class data set, k-fold cross-validations are frequently applied. Describe the concept of cross-validation, and two performance measures, sensitivity and specificity, respectively.
Cross Validation:
Cross-validation is a technique in which we train our model on a subset of the data set and then evaluate it on the complementary subset.
We use cross-validation when fitting the model on the training data alone cannot tell us whether it will work accurately on real, unseen data. It helps us verify that the model has learned the correct patterns from the data and is not picking up too much noise.
The steps involved in cross-validation are:
1. Reserve some portion of the sample data set.
2. Train the model using the rest of the data set.
3. Test the model using the reserved portion of the data set.
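The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: `train` and `evaluate` are hypothetical placeholders for a real model-fitting routine and scoring function.

```python
# Minimal k-fold cross-validation sketch using only the standard library.
# `train` and `evaluate` are hypothetical callables supplied by the user.

def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(data, labels, k, train, evaluate):
    """Average the evaluation score over k train/test splits."""
    folds = k_fold_indices(len(data), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the reserved one (step 2).
        train_idx = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        model = train([data[j] for j in train_idx],
                      [labels[j] for j in train_idx])
        # Test on the reserved fold (step 3).
        scores.append(evaluate(model,
                               [data[j] for j in test_idx],
                               [labels[j] for j in test_idx]))
    return sum(scores) / k
```

In practice a library routine such as scikit-learn's `cross_val_score` would be used instead, but the loop above shows the reserve/train/test cycle explicitly.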
Two performance measures:
Two other commonly used performance measures are:
1. F1 Score
2. Classification Accuracy
1. F1 Score:
It is the harmonic mean of precision and recall, and its range is [0, 1]. This metric tells us how precise and robust our classifier is, and it is commonly used to measure a test's accuracy.
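The F1 score can be computed directly from the counts of true positives (tp), false positives (fp), and false negatives (fn); the small sketch below assumes those raw counts are already available.

```python
# F1 score as the harmonic mean of precision and recall,
# computed from confusion-matrix counts.

def f1_score(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```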
2. Classification Accuracy:
It is the ratio of number of correct predictions to the total number of input samples.
It works well only if there are equal number of samples belonging to each class.
Classification accuracy is useful, but it can give a false sense of achieving high performance: on an imbalanced data set, the probability of misclassifying minority-class samples is very high, yet the overall accuracy can still look good.
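This pitfall is easy to demonstrate with a toy imbalanced data set (the 95/5 split below is an illustrative assumption): a classifier that always predicts the majority class reaches 95% accuracy while missing every positive case.

```python
# Accuracy on an imbalanced two-class data set can be misleading.

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0] * 95 + [1] * 5   # 95 negative samples, 5 positive samples
y_pred = [0] * 100            # always predict the majority (negative) class

print(accuracy(y_true, y_pred))  # 0.95, yet every positive case is missed
```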
Sensitivity:
Sensitivity is the proportion of truly positive cases that were classified as positive; thus, it is a measure of how well your classifier identifies positive cases.
Specificity:
Specificity is the proportion of truly negative cases that were classified as negative; thus, it is a measure of how well your classifier identifies negative cases.
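Both definitions translate directly into ratios over confusion-matrix counts; the example counts below (40/10/45/5) are illustrative, not taken from any particular data set.

```python
# Sensitivity (true positive rate) and specificity (true negative rate)
# computed from confusion-matrix counts.

def sensitivity(tp, fn):
    """Proportion of truly positive cases classified as positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of truly negative cases classified as negative."""
    return tn / (tn + fp)

# Example: 40 true positives and 10 false negatives -> sensitivity 0.8;
#          45 true negatives and 5 false positives  -> specificity 0.9.
print(sensitivity(40, 10), specificity(45, 5))
```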