A large data set on Toledo workers was collected and the first three workers are characterized by:
Worker | Age | Hourly Wage | Female | Union | High School
1 | 33 | $20 | 1 | 0 | 0
2 | 30 | $24 | 0 | 1 | 1
3 | 36 | $16 | 0 | 0 | 0
For the entire data set the average age is 31, the standard deviation of the age is 5, the average hourly wage is $15, and the standard deviation of the hourly wage is $4.
Solution:
1) Standardized Euclidean distance for Workers 1 and 3 based on Age and Hourly Wage
standardized value = (X - mean) / standard deviation
standardized value of age for worker 1 = (33 - 31) / 5 = 0.4
standardized value of age for worker 3 = (36 - 31) / 5 = 1
standardized value of hourly wage for worker 1 = (20 - 15) / 4 = 1.25
standardized value of hourly wage for worker 3 = (16 - 15) / 4 = 0.25
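The four standardized values can be checked with a short Python sketch. `standardize` is a hypothetical helper written for this example, and the means and standard deviations are the data-set figures stated in the problem.

```python
def standardize(x: float, mean: float, sd: float) -> float:
    """Return the z-score (x - mean) / sd."""
    return (x - mean) / sd

# Summary statistics for the whole data set, as given in the problem
AGE_MEAN, AGE_SD = 31, 5
WAGE_MEAN, WAGE_SD = 15, 4

# (age, hourly wage) for Workers 1 and 3, taken from the table
workers = {1: (33, 20), 3: (36, 16)}

# Standardize each variable with its own mean and standard deviation
z = {
    w: (standardize(age, AGE_MEAN, AGE_SD), standardize(wage, WAGE_MEAN, WAGE_SD))
    for w, (age, wage) in workers.items()
}
print(z)  # {1: (0.4, 1.25), 3: (1.0, 0.25)}
```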
Euclidean distance = sqrt((0.4 - 1)^2 + (1.25 - 0.25)^2) = sqrt(0.36 + 1) = sqrt(1.36) ≈ 1.166
ans b) 1.166
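A minimal Python sketch of the distance step, taking the standardized (age, wage) pairs for Workers 1 and 3 as inputs. `euclidean_distance` is a hypothetical helper for illustration; it sums squared differences over the variables and takes the square root.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Standardized (age, hourly wage) values for each worker
worker1 = (0.4, 1.25)
worker3 = (1.0, 0.25)

d = euclidean_distance(worker1, worker3)
print(round(d, 3))  # 1.166
```

Standardizing first matters: on the raw scale, a 1-year age gap and a $1 wage gap would count equally even though the two variables have different spreads.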
2) Matching coefficient for Workers 1 and 3 based on Female, Union, and High School
Worker 1 is (Female, Union, High School) = (1, 0, 0) and Worker 3 is (0, 0, 0). They match on Union and High School (both 0) but not on Female, so there are 2 matching variables out of 3.
MATCHING COEFFICIENT = (number of variables on which Workers 1 and 3 match) / (total number of variables)
MATCHING COEFFICIENT = 2/3
ans c) 2/3
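The same count can be sketched in Python, assuming each worker's Female, Union, and High School indicators are coded 0/1 as in the table. `matching_coefficient` is a hypothetical helper; it counts 0-0 and 1-1 agreements alike and returns an exact fraction.

```python
from fractions import Fraction

def matching_coefficient(a, b):
    """Share of binary variables on which a and b agree (0-0 or 1-1 matches)."""
    matches = sum(1 for ai, bi in zip(a, b) if ai == bi)
    return Fraction(matches, len(a))

# (Female, Union, High School) indicators from the table
worker1 = (1, 0, 0)
worker3 = (0, 0, 0)

print(matching_coefficient(worker1, worker3))  # 2/3
```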
3) Jaccard’s coefficient for Workers 1 and 3
Jaccard’s coefficient = (no. of 1-1 matching pairs) / (total variables - no. of 0-0 matching pairs)
no. of 1-1 matching pairs = 0 (Female is 1 only for Worker 1; Union and High School are 0 for both)
no. of 0-0 matching pairs = 2
so, Jaccard’s coefficient for Workers 1 and 3 = 0 / (3 - 2) = 0
ans e) 0
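A minimal Python sketch of Jaccard’s coefficient for 0/1-coded variables. `jaccard_coefficient` is a hypothetical helper; unlike the matching coefficient, it drops 0-0 pairs from the denominator, so shared absences do not count as similarity.

```python
from fractions import Fraction

def jaccard_coefficient(a, b):
    """Jaccard's coefficient: 1-1 matches / (variables - 0-0 matches)."""
    both_one = sum(1 for ai, bi in zip(a, b) if ai == 1 and bi == 1)
    both_zero = sum(1 for ai, bi in zip(a, b) if ai == 0 and bi == 0)
    denom = len(a) - both_zero
    if denom == 0:
        # Both vectors are all zeros, so the coefficient is undefined
        raise ValueError("Jaccard's coefficient is undefined for two all-zero vectors")
    return Fraction(both_one, denom)

# (Female, Union, High School) indicators from the table
worker1 = (1, 0, 0)
worker3 = (0, 0, 0)

print(jaccard_coefficient(worker1, worker3))  # 0
```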