A story in the Wall Street Journal reveals that “Tall Workers Earn More Money”. Ron, the president of a company, wants to know whether the relationship between salary and height exists in his own company, and obtains the following data from the human resources department.
a. Calculate the correlation between salary and height, and describe the relationship between these two variables.
b. How much of the variability in salary can be accounted for by height?
c. Is there a significant correlation between salary and height?
| Employee | Salary (X) (in $1000s) | Height (Y) (in inches) |
|---|---|---|
| Max | 45 | 72 |
| Jacob | 38 | 70 |
| Jesse | 39 | 74 |
| Jennifer | 33 | 60 |
| Jeremy | 40 | 63 |
| Brian | 36 | 68 |
| Barbara | 42 | 67 |
| Benny | 35 | 64 |
| Rhonda | 47 | 77 |
| Kelly | 45 | 69 |
| Miriam | 37 | 67 |
| Gloria | 34 | 66 |
| Shirley | 42 | 70 |
| Eden | 46 | 73 |
a) Enter all the data in Excel, then go to
Data -> Data Analysis -> Correlation, select both the X and Y columns together as the input range, and click OK.
This gives a correlation coefficient of r = 0.7247.
The correlation is positive, so taller employees tend to earn higher salaries. With r about 0.72, the linear relationship is moderately strong, although it is not perfect (r is not close to 1).
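For readers without Excel, the same coefficient can be checked with a few lines of Python. This is only a sketch using the standard library; the names `salary`, `height` and `pearson_r` are chosen here for illustration and are not part of the original answer.

```python
import math

# Salary (X, in $1000s) and height (Y, in inches), copied from the table above.
salary = [45, 38, 39, 33, 40, 36, 42, 35, 47, 45, 37, 34, 42, 46]
height = [72, 70, 74, 60, 63, 68, 67, 64, 77, 69, 67, 66, 70, 73]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(salary, height)
print(f"r = {r:.4f}")  # r = 0.7247
```

The result agrees with the value reported by Excel's Correlation tool.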
b) Go to Data -> Data Analysis -> Regression, select salary as the dependent range and height as the independent range, and click OK.
The fitted regression equation is
x = -10.6223 + 0.7372y (x, salary, is the dependent variable and y, height, is the independent variable).
The coefficient of determination is R^2 = 0.5252.
That is, about 52.52% of the variability in salary can be explained by height, because the coefficient of determination gives the proportion of the variation in the dependent variable that is explained by the independent variable.
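The regression output can be reproduced by hand with the least-squares formulas. The following is a minimal sketch in plain Python, assuming salary is regressed on height as in the answer; the variable names are illustrative only.

```python
# Salary (X, in $1000s) and height (Y, in inches), copied from the table above.
salary = [45, 38, 39, 33, 40, 36, 42, 35, 47, 45, 37, 34, 42, 46]
height = [72, 70, 74, 60, 63, 68, 67, 64, 77, 69, 67, 66, 70, 73]

n = len(salary)
mean_salary = sum(salary) / n
mean_height = sum(height) / n

# Sums of squares and cross-products about the means.
s_sh = sum((s - mean_salary) * (h - mean_height) for s, h in zip(salary, height))
s_ss = sum((s - mean_salary) ** 2 for s in salary)
s_hh = sum((h - mean_height) ** 2 for h in height)

# Least-squares line for salary (dependent) on height (independent).
slope = s_sh / s_hh
intercept = mean_salary - slope * mean_height

# In simple linear regression, R^2 equals the squared correlation.
r_squared = s_sh ** 2 / (s_ss * s_hh)

print(f"salary = {intercept:.4f} + {slope:.4f} * height")
print(f"R^2 = {r_squared:.4f}")
```

The slope, intercept and R^2 printed here should match the Excel regression output to four decimal places.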
c) To test the significance of the correlation, test
H0: rho = 0
H1: rho ≠ 0
where rho is the population correlation.
The test statistic is t = r√(n-2) / √(1-r^2),
which follows a t distribution with n-2 degrees of freedom under H0.
Substituting r = 0.7247 and n = 14 gives t = 3.6434.
The two-tailed critical t value for n-2 = 12 degrees of freedom at the 5% level is ±2.179.
P-value = 0.003367
Since the p-value is less than alpha = 0.05 (equivalently, |t| = 3.6434 exceeds 2.179), we reject the null hypothesis at the 5% level.
That is, rho is not equal to zero, so there is a significant correlation between salary and height.
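The test statistic can also be checked numerically. This sketch uses only the standard library, so it compares t with the critical value 2.179 quoted above rather than computing the p-value; the names `t_stat`, `t_critical` and `reject_h0` are assumptions made for this example.

```python
import math

# Salary (X, in $1000s) and height (Y, in inches), copied from the table above.
salary = [45, 38, 39, 33, 40, 36, 42, 35, 47, 45, 37, 34, 42, 46]
height = [72, 70, 74, 60, 63, 68, 67, 64, 77, 69, 67, 66, 70, 73]

n = len(salary)
mean_x = sum(salary) / n
mean_y = sum(height) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(salary, height))
sxx = sum((x - mean_x) ** 2 for x in salary)
syy = sum((y - mean_y) ** 2 for y in height)
r = sxy / math.sqrt(sxx * syy)

# t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom.
df = n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Two-tailed 5% critical value for 12 degrees of freedom, taken from the answer text.
t_critical = 2.179
reject_h0 = abs(t_stat) > t_critical

print(f"t = {t_stat:.4f}, df = {df}")  # t = 3.6434, df = 12
print("Reject H0" if reject_h0 else "Fail to reject H0")  # Reject H0
```

If SciPy happens to be available, `scipy.stats.pearsonr(salary, height)` returns both r and the two-sided p-value in one call, which is a convenient way to confirm the p-value reported above.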