Mention the model fit indices for determining the predictive accuracy of classification methods.
While designing a model, it will inevitably encounter uncertain observations, and some common questions come to mind: Is the classification model accurate? Does it really work well? All such questions can be answered by using the following model fit indices for evaluating the predictive accuracy of classification methods:
1. Jaccard Index: In this method, we use two groups of values: the predicted values and the actual values. Both groups are used to calculate the Jaccard Index as:
J(x, x^) = |x ∩ x^| / |x ∪ x^|
         = |x ∩ x^| / (|x| + |x^| - |x ∩ x^|)
Here x^ is the group of predicted values from the classification model and x is the group of actual values.
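As a rough illustration, here is a minimal Python sketch of this calculation, treating both groups as equal-length label lists and counting the positions where they agree as the intersection (the function and variable names are illustrative assumptions, not from the original):

def jaccard_index(y_true, y_pred):
    # |x ∩ x^|: positions where the actual and predicted labels agree
    matches = sum(1 for a, p in zip(y_true, y_pred) if a == p)
    # |x| + |x^| - |x ∩ x^|
    return matches / (len(y_true) + len(y_pred) - matches)

print(jaccard_index([1, 1, 0, 1], [1, 0, 0, 1]))  # 3 / (4 + 4 - 3) = 0.6

A value of 1 means the predictions match the actual labels exactly, and lower values mean poorer agreement.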
2. Log Loss: This metric is used when the model outputs a class probability rather than a class label alone. It measures how well that class probability, which lies between 0 and 1, matches the actual class. We can use the log loss equation for each row in the given data to measure the performance of the classification model as:
L = -(x * log(x^) + (1 - x) * log(1 - x^))
In this equation, the actual and predicted values are compared, so we can measure the difference between the two. The smaller the log loss value, the better the model.
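A minimal Python sketch, assuming the overall score is the average of the per-row losses (a common convention, not stated in the original) and clipping probabilities to avoid log(0); the names are illustrative:

import math

def log_loss(y_true, y_prob, eps=1e-15):
    total = 0.0
    for x, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += x * math.log(p) + (1 - x) * math.log(1 - p)
    return -total / len(y_true)  # average negative log-likelihood

print(log_loss([1, 0, 1], [0.9, 0.1, 0.8]))  # ≈ 0.1446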
3. Confusion Matrix: The confusion matrix compares the predicted values against the actual values of the test data in order to measure the performance of the classification model. It is used to store four kinds of test outcomes in a matrix:
(i) True Positive: the model correctly identifies a positive case as positive, as when a medical illness is diagnosed and actually present.
(ii) True Negative: the model correctly identifies a negative case as negative, as when a medical report shows a disease absent and it truly is absent.
(iii) False Positive: the model incorrectly identifies a negative case as positive, such as a disease reported as present when it truly is not.
(iv) False Negative: the model incorrectly identifies a positive case as negative, such as a disease reported as absent when it actually exists.
A confusion matrix for performance measurement is given below:
              Actual 1 | Actual 0
Predicted 1 |    TP    |    FP
Predicted 0 |    FN    |    TN
Here the actual values occupy the two columns and the predicted values the two rows, with the positive class labeled 1 and the negative class labeled 0.
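A minimal Python sketch that tallies these four counts from label lists (illustrative names, assuming 1 marks the positive class and 0 the negative class):

def confusion_counts(y_true, y_pred):
    # tally each of the four outcomes over the test data
    tp = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(y_true, y_pred) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(y_true, y_pred) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 0)
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion_counts([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])
print(tp, fp, fn, tn)  # 2 1 1 1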
4. F1 Score: In this metric, we use the confusion matrix above to find the precision and recall scores. The precision score measures how accurate the positive predictions are; it should be as high as possible for high accuracy. The recall score, often known as sensitivity, measures the true positive rate for each class. These are calculated as:
Precision = True Positive / (True Positive + False Positive)
Recall Score = True Positive / (True Positive + False Negative)
F1 Score = 2 * (Precision * Recall Score) / (Precision + Recall Score)
The best value for the F1 score is 1 and the worst case is when it equals 0.
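Continuing the counts from the confusion matrix sketch above, a minimal Python version of these three formulas (names are illustrative):

def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * (precision * recall) / (precision + recall)

print(f1_score(tp=2, fp=1, fn=1))  # precision = recall = 2/3, so F1 ≈ 0.667

F1 balances precision and recall through their harmonic mean, so it only approaches 1 when both scores are high.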