In: Statistics and Probability
A sociologist was hired by a large city hospital to investigate the
relationship between the number of unauthorized days that employees
are absent per year and the distance (miles) between home and work
for the employees. A sample of 10 employees was chosen, and the
following data were collected.
(a) Develop a scatterplot with Distance to Work (miles) as the
independent variable. Format the scatterplot to your preferences
with the proper labels on the axes. In a textbox describe the
relationship, if any, between the two variables. Add a linear
trendline to the scatterplot, displaying the equation and R-squared
value on the chart. [The scatterplot may be placed on this
worksheet, a new worksheet or it's own sheet, as long as the chart
is a respectable size for grading purposes.]
(b) On a new worksheet, labeled Calculations, create a table to
calculate the estimated regression equation as the example in the
text (Table 4.2) and the example done in class. Terms to calculate
will be X-bar; Y-bar; (X - X-bar); (Y - Y-bar); (X - X-bar)*(Y -
Y-bar); (X - X-bar)2. Calculate the slope coefficient (b1) and
intercept (b0) in Excel using the table. Write the estimated
regression equation.
(c) On a new worksheet, labeled Regression, provide the regression
analysis using the Data Analysis addin tool. (You may wish to
answer the following questions in a textbox.)
i. Write the estimated regression equation from the regression
output.
ii. Interpret the coefficient for the independent variable, i.e. if
X were to change, what is the result to Y.
iii. Interpret the coefficient of determination (R2) for the
model.
iv. Test for the significance of the relationship at the 0.05 level
of significance.
(d) Using the estimated regression equation, calculate the expected
number of days absent for employees living 5 miles from the
company
Distance to Work Number of Days Absent
1 8
3 5
4 8
6 7
8 6
10 3
12 5
14 2
14 4
18 2
(c) On a new worksheet, labeled Regression, provide the regression analysis using the Data Analysis addin tool.
data>data analysis>regression
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.843121469 | |||||
R Square | 0.710853812 | |||||
Adjusted R Square | 0.674710539 | |||||
Standard Error | 1.289414821 | |||||
Observations | 10 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 1 | 32.69928 | 32.69928 | 19.66767 | 0.002183 | |
Residual | 8 | 13.30072 | 1.662591 | |||
Total | 9 | 46 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 8.097826087 | 0.808822 | 10.01187 | 8.41E-06 | 6.232679 | 9.962973 |
Distance to work | -0.344202899 | 0.077614 | -4.43482 | 0.002183 | -0.52318 | -0.16523 |
Regression eqis
No of days absent=8.0978-0.3442*Distance to work
slope=-0.3442
y intercept=8.0978
ii. Interpret the coefficient for the independent variable, i.e. if X were to change, what is the result to Y.
slope =-0.3442
y/x=-0.3442
if xincreases by 1 unit y decreases by 0.3442 units
iii. Interpret the coefficient of determination (R2) for the model.
Rs q=0.7109
=0.7109*100=71.09%
71.09% variation in No of days absent is explained by model.Good model
iv. Test for the significance of the relationship at the 0.05 level of significance.
H0: there is no linear relationship betweenNo of days absent and Distance to work
that is slope=0
H1:there is a negative relationship between No of days absent and Distance to work
slope <0
test statistic t=-4.43
p=0.0022
p<0.05
Reject Null hypothesis
Accept alternative hypothesis
Conclusion:
There is sufficient statistical evidence at 5% level of significance to conclude that there is a negative relationship between No of days absent and Distance to work