The goal is to evaluate three classifiers that predict gender (male/female) from height and weight. The evaluation is to be based on the following dataset:
Gender | Height (cm) | Weight (kg) |
Male | 148 | 60 |
Male | 149 | 66 |
Female | 150 | 60 |
Male | 151 | 62 |
Male | 161 | 72 |
The three classifiers to be evaluated are:
C1: Anyone with height over 150 cm is male; all others are female
C2: Everyone is male
C3: Classify using a 1-nearest neighbor classifier trained on the following dataset:
Gender | Height (cm) | Weight (kg) |
Male | 149 | 61 |
Female | 149 | 61 |
Male | 153 | 70 |
Calculate the following metrics for the classifiers:
-Accuracy
-Error rate
-Precision of identifying males
-Recall of identifying males
-F1-score of identifying males
Complete the following table with your answers.
Classifier | Accuracy | Error Rate | Precision | Recall | F1-Score |
C1 | | | | | |
C2 | | | | | |
C3 | | | | | |
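For classifier C1 (anyone taller than 150 cm is male; all others are female), the predictions are shown in the table below. Male is treated as the positive class.
Actual | Height (cm) | Weight (kg) | Predicted | Outcome |
Male | 148 | 60 | Female | False negative |
Male | 149 | 66 | Female | False negative |
Female | 150 | 60 | Female | True negative |
Male | 151 | 62 | Male | True positive |
Male | 161 | 72 | Male | True positive |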
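We will use the confusion matrix and formulas below to calculate the metrics of each classifier, with male as the positive class (TP = true positives, FP = false positives, FN = false negatives, TN = true negatives).
Actual \ Predicted | Male | Female |
Male | TP | FN |
Female | FP | TN |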
Accuracy = (TP+TN)/(TP+FP+FN+TN)
Error Rate = (FP+FN)/(TP+FP+FN+TN)
Precision = TP/(TP+FP)
Recall = TP/(TP+FN)
F1 Score = 2*(Recall * Precision) / (Recall + Precision)
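The five formulas above can be collected into one small helper. This is a minimal sketch, not part of the original solution: the function name `classification_metrics` is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention that the problem itself never needs.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the five metrics from confusion-matrix counts (male = positive class)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    error_rate = (fp + fn) / total
    # Assumed convention: report 0.0 instead of dividing by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = recall + precision
    f1 = 2 * (recall * precision) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": error_rate,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Example counts: TP = 2, FP = 0, FN = 2, TN = 1
    for name, value in classification_metrics(2, 0, 2, 1).items():
        print(f"{name}: {value:.3f}")
```

Passing the four counts of any classifier in this problem returns its row of the results table.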
For classifier C1, TP = 2, FP = 0, FN = 2 and TN = 1. Substituting these values into the formulas, we get:
Accuracy = (2+1)/(2+0+2+1) = 0.6
Error Rate = (0+2)/(2+0+2+1) = 0.4
Precision = 2/(2+0) = 1
Recall = 2/(2+2) = 0.5
F1 Score = 2*(0.5 * 1) / (0.5 + 1) = 0.667
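The C1 counts can be checked by applying the rule to each row of the evaluation dataset and tallying the outcomes. The sketch below is illustrative only; the names `DATASET`, `c1_predict` and `confusion_counts` are hypothetical, and male is assumed to be the positive class, as in the rest of the solution.

```python
# Evaluation set from the problem: (actual gender, height in cm, weight in kg)
DATASET = [
    ("Male", 148, 60),
    ("Male", 149, 66),
    ("Female", 150, 60),
    ("Male", 151, 62),
    ("Male", 161, 72),
]


def c1_predict(height, weight):
    """C1: height strictly over 150 cm is male; all others are female."""
    return "Male" if height > 150 else "Female"


def confusion_counts(dataset, predict, positive="Male"):
    """Return (TP, FP, FN, TN) for a prediction function over the dataset."""
    tp = fp = fn = tn = 0
    for actual, height, weight in dataset:
        predicted = predict(height, weight)
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


if __name__ == "__main__":
    print(confusion_counts(DATASET, c1_predict))  # (2, 0, 2, 1)
```

The same `confusion_counts` helper works for any rule with the same signature; for example, passing `lambda height, weight: "Male"` evaluates the "everyone is male" classifier.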
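For classifier C2 (everyone is male), the predictions are shown in the table below.
Actual | Height (cm) | Weight (kg) | Predicted | Outcome |
Male | 148 | 60 | Male | True positive |
Male | 149 | 66 | Male | True positive |
Female | 150 | 60 | Male | False positive |
Male | 151 | 62 | Male | True positive |
Male | 161 | 72 | Male | True positive |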
For this case TP = 4, FP = 1, FN = 0 and TN = 0. Substituting these values into the formulas, we get the metrics below.
Accuracy = (4+0)/(4+1+0+0) = 0.8
Error Rate = (1+0)/(4+1+0+0) = 0.2
Precision = 4/(4+1) = 0.8
Recall = 4/(4+0) = 1
F1 Score = 2*(1 * 0.8) / (1 + 0.8) = 0.889
For classifier C3 (1-nearest neighbor), we need the Euclidean distance from each test point to every training point in order to find the nearest neighbor. The Euclidean distance D between points (x, y) and (a, b) is given by the following formula:
D = sqrt((x - a)^2 + (y - b)^2)
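The predicted gender is the gender of the nearest neighbor. The training set contains a male and a female at the same point (149, 61), so whenever that point is the nearest one, the two classes tie. We break ties by a priori precedence, that is, we assign the class that is most frequent in the training dataset (Male, with 2 of 3 examples). Using this rule, we get the predictions in the table below.
Actual | Height (cm) | Weight (kg) | D to (149, 61) | D to (153, 70) | Nearest neighbor | Predicted |
Male | 148 | 60 | 1.41 | 11.18 | (149, 61), tie | Male |
Male | 149 | 66 | 5.00 | 5.66 | (149, 61), tie | Male |
Female | 150 | 60 | 1.41 | 10.44 | (149, 61), tie | Male |
Male | 151 | 62 | 2.24 | 8.25 | (149, 61), tie | Male |
Male | 161 | 72 | 16.28 | 8.25 | (153, 70) | Male |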
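C3 predicts Male for every row, so its counts match those of C2: TP = 4, FP = 1, FN = 0 and TN = 0. Substituting these values into the formulas, we get the metrics below.
Accuracy = (4+0)/(4+1+0+0) = 0.8
Error Rate = (1+0)/(4+1+0+0) = 0.2
Precision = 4/(4+1) = 0.8
Recall = 4/(4+0) = 1
F1 Score = 2*(1 * 0.8) / (1 + 0.8) = 0.889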
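The nearest-neighbor rule with the tie-break described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than a reference implementation: the names `TRAIN`, `TEST` and `nn_predict` are hypothetical, and ties are detected by exact equality of distances, which is safe here only because the tied training points have identical coordinates.

```python
import math
from collections import Counter

# Training set for C3: (gender, height in cm, weight in kg)
TRAIN = [("Male", 149, 61), ("Female", 149, 61), ("Male", 153, 70)]

# Evaluation set from the problem
TEST = [
    ("Male", 148, 60),
    ("Male", 149, 66),
    ("Female", 150, 60),
    ("Male", 151, 62),
    ("Male", 161, 72),
]


def nn_predict(height, weight, train=TRAIN):
    """1-nearest-neighbor prediction with ties broken by training-set class frequency."""
    # Euclidean distance from the query point to every training point
    distances = [(math.hypot(height - h, weight - w), label) for label, h, w in train]
    nearest = min(d for d, _ in distances)
    # Labels of all training points at the minimum distance
    tied = {label for d, label in distances if d == nearest}
    if len(tied) == 1:
        return tied.pop()
    # Tie: prefer the class that occurs most often in the training set
    frequency = Counter(label for label, _, _ in train)
    return max(tied, key=lambda label: frequency[label])


if __name__ == "__main__":
    for actual, height, weight in TEST:
        print(actual, height, weight, "->", nn_predict(height, weight))
```

With this training set every test point is predicted as Male, which is why C3 ends up with the same confusion counts as C2.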
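Hence the final metrics table is as follows:
Classifier | Accuracy | Error Rate | Precision | Recall | F1-Score |
C1 | 0.6 | 0.4 | 1 | 0.5 | 0.667 |
C2 | 0.8 | 0.2 | 0.8 | 1 | 0.889 |
C3 | 0.8 | 0.2 | 0.8 | 1 | 0.889 |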