Question

The goal is to evaluate three classifiers intended to identify gender (male/female) from height and weight. The evaluation is to be based on the following dataset:

Gender Height (cm) Weight (kg)
Male 148 60
Male 149 66
Female 150 60
Male 151 62
Male 161 72

The three classifiers to be evaluated are:
C1: Anyone with height over 150 cm is male; all others are female
C2: Everyone is male
C3: Classify using a 1-nearest neighbor classifier trained on the following dataset:

Male 149 61
Female 149 61
Male 153 70

Calculate the following metrics for the classifiers:
-Accuracy
-Error rate
-Precision of identifying males
-Recall of identifying males
-F1-score of identifying males

Complete the following table with your answers.

Accuracy Error Rate Precision Recall F1-Score
C1
C2
C3

Solutions

Expert Solution

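For the C1 classifier (anyone taller than 150 cm is male; all others are female), the predictions are shown in the table below. The Outcome column treats male as the positive class.

Actual   Height (cm)   Weight (kg)   Predicted   Outcome
Male     148           60            Female      FN
Male     149           66            Female      FN
Female   150           60            Female      TN
Male     151           62            Male        TP
Male     161           72            Male        TP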

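We will use the confusion matrix below, with male as the positive class, and the following formulas to calculate the metrics of each classifier.

                 Predicted Male   Predicted Female
Actual Male      TP               FN
Actual Female    FP               TN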

Accuracy = (TP+TN)/(TP+FP+FN+TN)

Error Rate = (FP+FN)/(TP+FP+FN+TN)

Precision = TP/(TP+FP)

Recall = TP/(TP+FN)

F1 Score = 2*(Recall * Precision) / (Recall + Precision)

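For classifier C1, TP = 2, FP = 0, FN = 2, and TN = 1. Substituting these values into the formulas gives:

Accuracy = (2 + 1) / (2 + 0 + 2 + 1) = 0.6

Error Rate = (0 + 2) / (2 + 0 + 2 + 1) = 0.4

Precision = 2 / (2 + 0) = 1

Recall = 2 / (2 + 2) = 0.5

F1 Score = 2 * (0.5 * 1) / (0.5 + 1) = 0.667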
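
The C1 figures above can be reproduced with a short Python sketch of the five formulas. The function name `classification_metrics` and the choice to return 0.0 when a denominator is zero are assumptions made for illustration; they are not part of the original solution.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the five metrics from confusion-matrix counts (Male = positive class)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    error_rate = (fp + fn) / total
    # Assumption: fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": error_rate,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts stated in the solution for classifier C1.
c1_metrics = classification_metrics(tp=2, fp=0, fn=2, tn=1)
for name, value in c1_metrics.items():
    print(f"{name}: {value:.3f}")
```

The same function can be called with the counts of the other classifiers; only the four inputs change.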

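For the C2 classifier (everyone is male), the predictions are shown in the table below.

Actual   Height (cm)   Weight (kg)   Predicted   Outcome
Male     148           60            Male        TP
Male     149           66            Male        TP
Female   150           60            Male        FP
Male     151           62            Male        TP
Male     161           72            Male        TP

For this case, TP = 4, FP = 1, FN = 0, and TN = 0. Substituting these values into the formulas gives the metrics below.

Accuracy = (4 + 0) / (4 + 1 + 0 + 0) = 0.8

Error Rate = (1 + 0) / (4 + 1 + 0 + 0) = 0.2

Precision = 4 / (4 + 1) = 0.8

Recall = 4 / (4 + 0) = 1

F1 Score = 2 * (1 * 0.8) / (1 + 0.8) = 0.889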
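
The TP, FP, FN, and TN counts for the two rule-based classifiers can be checked by applying each rule to the evaluation set and tallying the outcomes. The following is a minimal sketch; the names `c1`, `c2`, and `confusion_counts` are hypothetical helpers introduced here, and male is assumed to be the positive class, as in the solution.

```python
# Evaluation set from the question: (actual gender, height in cm, weight in kg)
evaluation_set = [
    ("Male", 148, 60),
    ("Male", 149, 66),
    ("Female", 150, 60),
    ("Male", 151, 62),
    ("Male", 161, 72),
]


def c1(height, weight):
    """C1: anyone with height over 150 cm is male; all others are female."""
    return "Male" if height > 150 else "Female"


def c2(height, weight):
    """C2: everyone is male."""
    return "Male"


def confusion_counts(classifier, data, positive="Male"):
    """Tally (TP, FP, FN, TN) for a classifier over labeled records."""
    tp = fp = fn = tn = 0
    for actual, height, weight in data:
        predicted = classifier(height, weight)
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


c1_counts = confusion_counts(c1, evaluation_set)
c2_counts = confusion_counts(c2, evaluation_set)
print("C1 (TP, FP, FN, TN):", c1_counts)
print("C2 (TP, FP, FN, TN):", c2_counts)
```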

For the C3 classifier (1-nearest-neighbor), we need the Euclidean distance from each evaluation record to every training record in order to find its nearest neighbor. The Euclidean distance D between points (x, y) and (a, b) is given by the following formula:

D = sqrt((x - a)^2 + (y - b)^2)

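The predicted gender is the gender of the nearest neighbor. The training set contains two identical points at (149, 61), one labeled Male and one labeled Female, so whenever (149, 61) is the nearest point the two labels tie. We break ties by a-priori precedence: the tied record is assigned the class that is most frequent in the training dataset (Male, 2 of 3 records). The resulting predictions are shown in the table below, with distances rounded to two decimals.

Actual   Height (cm)   Weight (kg)   D to (149, 61)   D to (153, 70)   Predicted    Outcome
Male     148           60            1.41             11.18            Male (tie)   TP
Male     149           66            5.00             5.66             Male (tie)   TP
Female   150           60            1.41             10.44            Male (tie)   FP
Male     151           62            2.24             8.25             Male (tie)   TP
Male     161           72            16.28            8.25             Male         TP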

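For this case, TP = 4, FP = 1, FN = 0, and TN = 0, the same counts as for C2. Substituting these values into the formulas gives the metrics below.

Accuracy = (4 + 0) / (4 + 1 + 0 + 0) = 0.8

Error Rate = (1 + 0) / (4 + 1 + 0 + 0) = 0.2

Precision = 4 / (4 + 1) = 0.8

Recall = 4 / (4 + 0) = 1

F1 Score = 2 * (1 * 0.8) / (1 + 0.8) = 0.889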
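
The C3 procedure described above (Euclidean distance, nearest neighbor, ties resolved in favor of the most frequent training class) can be sketched in Python as follows. The function name `predict_1nn` and the variable names are hypothetical; the tie rule follows the a-priori precedence stated in the solution, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

# Training set for C3: (gender, height in cm, weight in kg)
training_set = [
    ("Male", 149, 61),
    ("Female", 149, 61),
    ("Male", 153, 70),
]

# Evaluation set from the question
evaluation_set = [
    ("Male", 148, 60),
    ("Male", 149, 66),
    ("Female", 150, 60),
    ("Male", 151, 62),
    ("Male", 161, 72),
]

# Class frequencies in the training data, used only to break ties.
class_counts = Counter(label for label, _, _ in training_set)


def predict_1nn(height, weight):
    """Return the label of the nearest training point (Euclidean distance)."""
    distances = [
        (math.dist((height, weight), (h, w)), label)
        for label, h, w in training_set
    ]
    nearest = min(d for d, _ in distances)
    # Labels of all training points at the minimum distance.
    tied = {label for d, label in distances if math.isclose(d, nearest)}
    # Tie rule from the solution: prefer the most frequent training class.
    return max(tied, key=lambda label: class_counts[label])


predictions = [predict_1nn(h, w) for _, h, w in evaluation_set]

# Confusion-matrix counts with Male as the positive class.
pairs = [(actual, pred) for (actual, _, _), pred in zip(evaluation_set, predictions)]
tp = sum(a == "Male" and p == "Male" for a, p in pairs)
fp = sum(a == "Female" and p == "Male" for a, p in pairs)
fn = sum(a == "Male" and p == "Female" for a, p in pairs)
tn = sum(a == "Female" and p == "Female" for a, p in pairs)
print(predictions)
print("TP, FP, FN, TN:", tp, fp, fn, tn)
```

A different tie rule (for example, picking the first training record encountered) could change individual predictions, so the result depends on the precedence convention chosen.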

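Hence the final metrics table is as follows:

Accuracy Error Rate Precision Recall F1-Score
C1 0.6 0.4 1 0.5 0.667
C2 0.8 0.2 0.8 1 0.889
C3 0.8 0.2 0.8 1 0.889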

