Question

A large data set on Toledo workers was collected and the first three workers are characterized by:

Worker   Age   Hourly Wage   Female   Union   High School
1        33    $20           1        0       0
2        30    $24           0        1       1
3        36    $16           0        0       0

For the entire data set the average age is 31, the standard deviation of the age is 5, the average hourly wage is $15, and the standard deviation of the hourly wage is $4.

  1. What is the standardized Euclidean distance for Workers 1 and 3 based on Age and Hourly Wage?
    a. 0.371
    b. 1.166
    c. 1.923
    d. 0.942
    e. 1.543

  2. What is the matching coefficient for Workers 1 and 3 based on Female, Union, and High School?
    a. 0
    b. 1
    c. 2/3
    d. 1/2
    e. 1/3

  3. Refer to Exhibit 4-1. What is Jaccard’s coefficient for Workers 1 and 3 based on Female, Union, and High School?
    a. 1/3
    b. 1/2
    c. 2/3
    d. 1
    e. 0

Solutions

Expert Solution

solution:

1) standardized Euclidean distance for Workers 1 and 3 based on Age and Hourly Wage

standardised value = (X - mean) / standard deviation, where the mean and standard deviation are those of the variable being standardised (age or hourly wage)

standardised value for worker 1 based on age= (33-31) / 5 = 0.4

standardised value for worker 3 based on age = (36 - 31) / 5 = 1

standardised value for worker 1 based on hourly wage = (20-15) / 4 = 1.25

standardised value for worker 3 based on hourly wage = (16-15)/4 = 0.25

Euclidean distance = sqrt((0.4 - 1)^2 + (1.25 - 0.25)^2) = sqrt(0.36 + 1) = sqrt(1.36) ≈ 1.166

ans b) 1.166
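
The arithmetic above can be checked with a short Python sketch. The function names `standardize` and `standardized_euclidean` are illustrative choices, not part of the original solution; the data values, means, and standard deviations are the ones given in the question.

```python
import math

def standardize(x, mean, sd):
    """Return the z-score of x for a variable with the given mean and sd."""
    return (x - mean) / sd

def standardized_euclidean(a, b, means, sds):
    """Euclidean distance between a and b after standardizing each variable."""
    squared_diffs = (
        (standardize(x, m, s) - standardize(y, m, s)) ** 2
        for x, y, m, s in zip(a, b, means, sds)
    )
    return math.sqrt(sum(squared_diffs))

# (Age, Hourly Wage) for Workers 1 and 3, from the table in the question
worker1 = (33, 20)
worker3 = (36, 16)
means = (31, 15)  # data-set averages for age and hourly wage
sds = (5, 4)      # data-set standard deviations for age and hourly wage

distance = standardized_euclidean(worker1, worker3, means, sds)
print(round(distance, 3))  # 1.166
```

Standardizing first puts age and wage on the same scale, so neither variable dominates the distance because of its units.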

2) matching coefficient for Workers 1 and 3 based on Female, Union, and High School

Worker 1 has (Female, Union, High School) = (1, 0, 0) and Worker 3 has (0, 0, 0). The two workers match on 2 of the 3 variables: Union and High School are both 0-0 matches, while Female differs.

MATCHING COEFFICIENT = (number of variables with matching values for workers 1 and 3) / (total number of variables)

MATCHING COEFFICIENT = 2/3

ans c) 2/3
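
As a quick check, the matching coefficient can be computed with the minimal sketch below. The function name `matching_coefficient` is an illustrative choice, and `Fraction` is used only so the result prints as an exact ratio.

```python
from fractions import Fraction

def matching_coefficient(a, b):
    """Share of binary variables on which a and b have the same value."""
    if len(a) != len(b):
        raise ValueError("both observations need the same number of variables")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return Fraction(matches, len(a))

# (Female, Union, High School) for Workers 1 and 3, from the table in the question
worker1 = (1, 0, 0)
worker3 = (0, 0, 0)

smc = matching_coefficient(worker1, worker3)
print(smc)  # 2/3
```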

3) Jaccard’s coefficient for Workers 1 and 3

= (number of 1-1 matches) / (total number of variables - number of 0-0 matches)

number of 1-1 matches = 0

number of 0-0 matches = 2 (Union and High School)

so, Jaccard’s coefficient for Workers 1 and 3 = 0 / (3 - 2) = 0

ans e) 0
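
The same calculation as a minimal Python sketch. The function name `jaccard_coefficient` is an illustrative choice. The error raised when every variable is a 0-0 match is an assumption of this sketch, since the coefficient is undefined in that case and the question does not cover it.

```python
from fractions import Fraction

def jaccard_coefficient(a, b):
    """Jaccard's coefficient for two binary observations: 1-1 matches
    divided by the number of variables that are not 0-0 matches."""
    both_one = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    both_zero = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    denominator = len(a) - both_zero
    if denominator == 0:
        # Assumption of this sketch: treat the all-0-0 case as an error
        raise ValueError("undefined when every variable is a 0-0 match")
    return Fraction(both_one, denominator)

# (Female, Union, High School) for Workers 1 and 3, from the table in the question
worker1 = (1, 0, 0)
worker3 = (0, 0, 0)

jaccard = jaccard_coefficient(worker1, worker3)
print(jaccard)  # 0
```

Unlike the matching coefficient, Jaccard’s coefficient ignores 0-0 matches, which is why the two workers score 2/3 on one measure and 0 on the other.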

