4. We would like to estimate the number of books in a college library without counting them. Data collected from a sample of colleges are shown below:
| Books (in millions) | Students Enrollment | Highest Degree | Area | 
| 4 | 5 | 3 | 20 | 
| 5 | 8 | 3 | 40 | 
| 10 | 40 | 3 | 100 | 
| 1 | 4 | 2 | 50 | 
| 0.5 | 2 | 1 | 300 | 
| 2 | 8 | 1 | 400 | 
| 7 | 30 | 3 | 40 | 
| 4 | 20 | 2 | 200 | 
| 1 | 10 | 2 | 5 | 
| 1 | 12 | 1 | 100 | 
Using stepwise regression, show how each of the three factors affects the number of volumes in a college library.
The first regression output is:
| R² | 0.966 |
| Adjusted R² | 0.950 |
| R | 0.983 |
| Std. Error | 0.698 |
| n | 10 |
| k | 3 |
| Dep. Var. | Books (in millions) |

ANOVA table
| Source | SS | df | MS | F | p-value |
| Regression | 84.3014 | 3 | 28.1005 | 57.67 | .0001 |
| Residual | 2.9236 | 6 | 0.4873 | | |
| Total | 87.2250 | 9 | | | |

Regression output
| Variables | Coefficients | Std. Error | t (df=6) | p-value | 95% CI lower | 95% CI upper |
| Intercept | -4.9508 | | | | | |
| Students Enrollment | 0.1420 | 0.0221 | 6.416 | .0007 | 0.0878 | 0.1962 |
| Highest Degree | 2.6257 | 0.4286 | 6.126 | .0009 | 1.5770 | 3.6744 |
| Area | 0.0081 | 0.0025 | 3.209 | .0184 | 0.0019 | 0.0142 |
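
The output above can be checked directly from the data table. The following is a minimal sketch, assuming NumPy is available; it is not necessarily the software that produced the output, and the variable names (`books`, `enrollment`, `degree`, `area`) are chosen here for illustration. It fits the three-predictor model by ordinary least squares and recomputes the ANOVA quantities.

```python
import numpy as np

# Data copied from the table in the question.
books      = np.array([4, 5, 10, 1, 0.5, 2, 7, 4, 1, 1], dtype=float)
enrollment = np.array([5, 8, 40, 4, 2, 8, 30, 20, 10, 12], dtype=float)
degree     = np.array([3, 3, 3, 2, 1, 1, 3, 2, 2, 1], dtype=float)
area       = np.array([20, 40, 100, 50, 300, 400, 40, 200, 5, 100], dtype=float)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(books), enrollment, degree, area])
n, p = X.shape          # n observations, p parameters (intercept included)
k = p - 1               # number of predictors

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X, books, rcond=None)

# ANOVA decomposition: total = regression + residual.
fitted = X @ beta
ss_resid = float(np.sum((books - fitted) ** 2))
ss_total = float(np.sum((books - books.mean()) ** 2))
ss_reg = ss_total - ss_resid

# Summary statistics.
r2 = ss_reg / ss_total
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
ms_resid = ss_resid / (n - k - 1)
f_stat = (ss_reg / k) / ms_resid
std_error = ms_resid ** 0.5

# Coefficient standard errors and t statistics.
se = np.sqrt(np.diag(ms_resid * np.linalg.inv(X.T @ X)))
t_stats = beta / se

names = ["Intercept", "Students Enrollment", "Highest Degree", "Area"]
for name, b, s, t in zip(names, beta, se, t_stats):
    print(f"{name:20s} coef={b:8.4f}  se={s:7.4f}  t={t:6.3f}")
print(f"R2={r2:.3f}  adj R2={adj_r2:.3f}  F={f_stat:.2f}  std error={std_error:.3f}")
```

Run on the ten rows given, this should agree with the reported coefficients and sums of squares up to rounding. It also prints the standard error and t statistic for the intercept, which the output table leaves blank.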
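All three independent variables have p-values below 0.05 (.0007 for Students Enrollment, .0009 for Highest Degree, and .0184 for Area), so each is statistically significant at the 5% level. We can therefore say that all three factors affect the number of volumes in a college library.
Each coefficient is positive: holding the other two variables fixed, a one-unit increase in Students Enrollment is associated with about 0.142 million more books, a one-step increase in Highest Degree with about 2.63 million more, and a one-unit increase in Area with about 0.008 million more.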
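
The answer does not say which stepwise variant was used. Starting from the full model and stopping because nothing is insignificant matches the first step of backward elimination, so here is a minimal sketch under that assumption. The helper names `fit_t_stats` and `backward_elimination` are hypothetical, NumPy is assumed, and the critical values are the standard two-sided 5% Student's t values for 6, 7, and 8 residual degrees of freedom, which covers only this ten-row data set.

```python
import numpy as np

# Two-sided 5% critical values of Student's t, keyed by residual df.
# Only the df values reachable with n = 10 and 1 to 3 predictors are listed.
T_CRIT_05 = {6: 2.447, 7: 2.365, 8: 2.306}

def fit_t_stats(X, y):
    """Return the OLS t statistics (one per column of X) and the residual df."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = X.shape[0] - X.shape[1]
    mse = float(resid @ resid) / df
    se = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))
    return beta / se, df

def backward_elimination(predictors, y):
    """Drop the weakest predictor until every remaining |t| passes the 5% test."""
    kept = dict(predictors)
    while kept:
        X = np.column_stack([np.ones_like(y)] + list(kept.values()))
        t, df = fit_t_stats(X, y)
        abs_t = dict(zip(kept, np.abs(t[1:])))   # t[0] is the intercept
        weakest = min(abs_t, key=abs_t.get)
        if abs_t[weakest] >= T_CRIT_05[df]:
            break                                # all remaining predictors are significant
        del kept[weakest]
    return list(kept)

# Data copied from the table in the question.
books = np.array([4, 5, 10, 1, 0.5, 2, 7, 4, 1, 1], dtype=float)
predictors = {
    "Students Enrollment": np.array([5, 8, 40, 4, 2, 8, 30, 20, 10, 12], dtype=float),
    "Highest Degree": np.array([3, 3, 3, 2, 1, 1, 3, 2, 2, 1], dtype=float),
    "Area": np.array([20, 40, 100, 50, 300, 400, 40, 200, 5, 100], dtype=float),
}

selected = backward_elimination(predictors, books)
print(selected)
```

With the reported t statistics (6.416, 6.126, and 3.209, all above 2.447), the loop stops on the first pass and keeps all three predictors, which is consistent with the conclusion above.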