1. A sociologist was hired by a large city hospital to investigate the relationship between the number of unauthorized days that employees are absent per year and the distance (miles) between home and work for the employees. A sample of 10 employees was chosen, and the following data were collected.
Distance to Work (miles) | Days Absent |
1 | 8 |
3 | 5 |
4 | 8 |
6 | 7 |
8 | 6 |
10 | 3 |
24 | 5 |
14 | 2 |
14 | 4 |
18 | 2 |
a. Which variable is the dependent/response variable?
b. Which is the independent/explanatory variable?
Enter the data into your Excel file.
c. Develop a scatterplot for these data. Interpret the scatterplot.
Run a regression analysis of the data in Excel using Data/Data Analysis.
d. What is the correlation between distance to work and days absent? What does it say about the strength of the relationship?
e. What is the coefficient of determination? How would you interpret it?
f. Write the hypotheses for the test of the slope.
g. What do you conclude about the slope of the line? What are you basing your conclusion on?
h. What is the regression formula that represents the relationship between Distance and Days Absent?
i. Use your formula to predict the number of days absent for an employee who lives 15 miles from work.
j. Use your formula to predict the number of days absent for an employee who lives 11 miles from work.
a. Dependent/response variable = days absent per year
b. Independent/explanatory variable = distance between home and work (miles)
Data entered into Excel:
Distance to work in miles (independent, x) | Days absent per year (dependent, y) |
1 | 8 |
3 | 5 |
4 | 8 |
6 | 7 |
8 | 6 |
10 | 3 |
24 | 5 |
14 | 2 |
14 | 4 |
18 | 2 |
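As a cross-check on the Excel work, the same table can be stored in a short Python script. This sketch (the variable names are our own) computes the means and the corrected sums of squares Sxx, Syy and Sxy, which are the building blocks of the correlation and regression results in the later parts.

```python
# Data from the table: x = distance to work (miles), y = days absent per year
distance = [1, 3, 4, 6, 8, 10, 24, 14, 14, 18]
days_absent = [8, 5, 8, 7, 6, 3, 5, 2, 4, 2]

n = len(distance)
mean_x = sum(distance) / n
mean_y = sum(days_absent) / n

# Corrected sums of squares and cross-products (deviations from the means)
s_xx = sum((x - mean_x) ** 2 for x in distance)
s_yy = sum((y - mean_y) ** 2 for y in days_absent)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distance, days_absent))

print(f"n = {n}, mean x = {mean_x:.1f}, mean y = {mean_y:.1f}")
print(f"Sxx = {s_xx:.1f}, Syy = {s_yy:.1f}, Sxy = {s_xy:.1f}")
```

A negative Sxy already signals that days absent tend to fall as distance rises.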
c.
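The scatterplot shows a negative trend: employees who live farther from work tend to have fewer days absent. The pattern is not perfectly linear, because the employee who lives 24 miles away has 5 days absent, more than any of the employees at 14 to 18 miles, so the points bend back up at the right-hand end and suggest a quadratic relationship (although this curvature rests largely on that single employee).
Since R² ≈ 66.20%, the quadratic regression equation y = 9.1255 - 0.7386x + 0.0225x² (where y = days absent and x = distance) explains about 66.20% of the total variation in days absent, so the fit is reasonably good.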
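The quadratic coefficients and R² above come from the Excel output. Assuming the curve is an ordinary least-squares fit, this plain-Python sketch should reproduce them: it solves the centered normal equations for the x and x² coefficients with Cramer's rule. `fit_quadratic` is a hypothetical helper of our own, not an Excel function.

```python
# Data from the table: x = distance to work (miles), y = days absent per year
distance = [1, 3, 4, 6, 8, 10, 24, 14, 14, 18]
days_absent = [8, 5, 8, 7, 6, 3, 5, 2, 4, 2]


def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2; returns (b0, b1, b2, R^2)."""
    n = len(xs)
    us = [x * x for x in xs]  # second regressor: x squared
    mx, mu, my = sum(xs) / n, sum(us) / n, sum(ys) / n
    # Centered sums of squares and cross-products
    s_xx = sum((x - mx) ** 2 for x in xs)
    s_uu = sum((u - mu) ** 2 for u in us)
    s_xu = sum((x - mx) * (u - mu) for x, u in zip(xs, us))
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s_uy = sum((u - mu) * (y - my) for u, y in zip(us, ys))
    s_yy = sum((y - my) ** 2 for y in ys)
    # Solve [[s_xx, s_xu], [s_xu, s_uu]] [b1, b2]^T = [s_xy, s_uy]^T by Cramer's rule
    det = s_xx * s_uu - s_xu ** 2
    b1 = (s_xy * s_uu - s_xu * s_uy) / det
    b2 = (s_xx * s_uy - s_xu * s_xy) / det
    b0 = my - b1 * mx - b2 * mu
    # R^2 = regression sum of squares / total sum of squares
    r_squared = (b1 * s_xy + b2 * s_uy) / s_yy
    return b0, b1, b2, r_squared


b0, b1, b2, r_squared = fit_quadratic(distance, days_absent)
print(f"y = {b0:.4f} + ({b1:.4f})x + ({b2:.4f})x^2, R^2 = {r_squared:.4f}")
```

Centering x and x² before solving keeps the system to two unknowns; the intercept is then recovered from the means.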
d.
The correlation coefficient is r = -0.6409: as distance increases, days absent tend to decrease, and this negative linear relationship is of moderate strength.
Note that r measures the strength of the linear relationship between x and y; a nonzero r does not by itself show that the relationship is linear. The detailed analysis of the linear regression fitted in Excel is given below:
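The correlation in part d follows from the formula r = Sxy / √(Sxx · Syy). A minimal plain-Python sketch (the helper name `pearson_r` is our own):

```python
import math

# Data from the table: x = distance to work (miles), y = days absent per year
distance = [1, 3, 4, 6, 8, 10, 24, 14, 14, 18]
days_absent = [8, 5, 8, 7, 6, 3, 5, 2, 4, 2]


def pearson_r(xs, ys):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s_xx = sum((x - mx) ** 2 for x in xs)
    s_yy = sum((y - my) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(distance, days_absent)
print(f"r = {r:.4f}")
```

On Python 3.10 or later, `statistics.correlation(distance, days_absent)` from the standard library should give the same value.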
e.
Coefficient of determination: r² ≈ 0.4108.
Hence the linear regression of y on x explains about 41.08% of the total variation in days absent; the remaining 58.92% is not explained by distance.
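For a simple linear regression, r² also equals 1 - SSE/SST, the share of the total variation in days absent that the fitted line explains. This sketch refits the line by least squares and computes that ratio directly, which should agree with squaring the correlation from part d.

```python
# Data from the table: x = distance to work (miles), y = days absent per year
distance = [1, 3, 4, 6, 8, 10, 24, 14, 14, 18]
days_absent = [8, 5, 8, 7, 6, 3, 5, 2, 4, 2]

n = len(distance)
mean_x = sum(distance) / n
mean_y = sum(days_absent) / n
s_xx = sum((x - mean_x) ** 2 for x in distance)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distance, days_absent))

# Least-squares line y = b0 + b1 * x
b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x

# SSE: variation left in the residuals; SST: total variation about the mean
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(distance, days_absent))
sst = sum((y - mean_y) ** 2 for y in days_absent)
r_squared = 1 - sse / sst
print(f"R^2 = {r_squared:.4f}")
```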
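f. H0: β1 = 0 (distance has no linear effect on days absent) versus H1: β1 ≠ 0.
g. The p-value for the slope in the Excel output is 0.0458 < 0.05 (t ≈ -2.36 with 8 degrees of freedom), so we reject H0 at the 5% level and conclude that the slope is significantly different from zero. Since the slope is negative, days absent decrease significantly as distance increases.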
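Excel's p-value for the slope comes from the t statistic t = b1 / SE(b1) with n - 2 = 8 degrees of freedom. The sketch below computes that statistic and compares it with the two-sided 5% critical value 2.306 read from a t table; the hard-coded critical value is our assumption, standing in for Excel's exact p-value.

```python
import math

# Data from the table: x = distance to work (miles), y = days absent per year
distance = [1, 3, 4, 6, 8, 10, 24, 14, 14, 18]
days_absent = [8, 5, 8, 7, 6, 3, 5, 2, 4, 2]

n = len(distance)
mean_x = sum(distance) / n
mean_y = sum(days_absent) / n
s_xx = sum((x - mean_x) ** 2 for x in distance)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distance, days_absent))

# Fitted least-squares line
b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x

# Residual variance with n - 2 degrees of freedom, then the standard error of the slope
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(distance, days_absent))
mse = sse / (n - 2)
se_b1 = math.sqrt(mse / s_xx)

# Test statistic for H0: beta1 = 0
t_stat = b1 / se_b1

# Assumption: two-sided 5% critical value of t with 8 df, hard-coded from a t table
T_CRIT_8DF = 2.306
reject_h0 = abs(t_stat) > T_CRIT_8DF

print(f"slope = {b1:.4f}, SE = {se_b1:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

If SciPy is available, `2 * scipy.stats.t.sf(abs(t_stat), n - 2)` would give the two-sided p-value itself rather than just the reject/do-not-reject decision.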
h. Days absent = 7.0289 - 0.1989 × Distance
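The coefficients in part h are the least-squares estimates b1 = Sxy / Sxx and b0 = ȳ - b1 · x̄. A short sketch of that computation (`least_squares_line` is a hypothetical helper name):

```python
# Data from the table: x = distance to work (miles), y = days absent per year
distance = [1, 3, 4, 6, 8, 10, 24, 14, 14, 18]
days_absent = [8, 5, 8, 7, 6, 3, 5, 2, 4, 2]


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the simple linear regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mx) ** 2 for x in xs)
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = s_xy / s_xx          # b1 = Sxy / Sxx
    intercept = my - slope * mx  # b0 = mean(y) - b1 * mean(x)
    return intercept, slope


b0, b1 = least_squares_line(distance, days_absent)
print(f"Days absent = {b0:.4f} + ({b1:.4f}) * Distance")
```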
i. Days absent = 7.0289 - 0.1989 × 15 = 4.0454 ≈ 4 days when distance = 15 miles
j. Days absent = 7.0289 - 0.1989 × 11 = 4.8410 ≈ 5 days when distance = 11 miles
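Parts i and j plug a distance into the rounded equation from part h. A minimal sketch, with the coefficients 7.0289 and -0.1989 hard-coded from the Excel output (`predict_days_absent` is a hypothetical helper):

```python
# Coefficients copied from the Excel regression output (part h), rounded to 4 decimals
B0 = 7.0289   # intercept
B1 = -0.1989  # slope: change in days absent per extra mile


def predict_days_absent(distance_miles):
    """Predicted days absent per year for an employee living distance_miles from work."""
    return B0 + B1 * distance_miles


for miles in (15, 11):
    y_hat = predict_days_absent(miles)
    print(f"{miles} miles: {y_hat:.4f} days, about {round(y_hat)} days")
```

Both 11 and 15 miles lie inside the observed range of 1 to 24 miles, so these predictions are interpolations rather than extrapolations.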