- A researcher wants to test whether women’s glucose level is
associated with age, number of pregnancies, and BMI. The response
variable is the plasma glucose concentration (mg/dL) at 2 hours in an
oral glucose tolerance test, and the explanatory variables are Age
(years), Pregnant (number of pregnancies), and Obesity (Yes vs. No).
 
The researcher fitted a linear model
and the results are listed below:
| Source | Degrees of Freedom | Sum of Squares | Mean Square | F value | Pr > F |
|---|---|---|---|---|---|
| Model | | 60000 | | | <.0001 |
| Error | 400 | 120000 | | | |

| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Intercept | 85.36 | 4.099 | 20.82 | <.0001 |
| Age | 0.726 | 0.111 | 6.55 | <.0001 |
| Pregnancy | -0.35 | 0.387 | -0.78 | 0.4330 |
| Obese Yes vs. No | 17.38 | 3.133 | 5.55 | <.0001 |

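
The blank cells in the ANOVA table follow arithmetically from the values that are listed. The sketch below is one way to fill them in, not an official solution. It assumes the model degrees of freedom equal 3 (one per slope in the parameter table: Age, Pregnancy, Obese) and that the model includes an intercept; all variable names are illustrative.

```python
# Values read directly from the ANOVA table.
ss_model = 60000.0
ss_error = 120000.0
df_error = 400

# Assumption: one model degree of freedom per estimated slope
# (Age, Pregnancy, Obese Yes vs. No).
df_model = 3

# Totals implied by the table (intercept assumed to be in the model).
ss_total = ss_model + ss_error
df_total = df_model + df_error
n_obs = df_total + 1

# Mean squares and the overall F statistic.
ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_value = ms_model / ms_error

# R-squared: share of total variation explained by the model.
r_squared = ss_model / ss_total

# Adjusted R-squared penalizes for the number of predictors.
adj_r_squared = 1 - ms_error / (ss_total / df_total)

print(f"MS model = {ms_model:.0f}, MS error = {ms_error:.0f}")
print(f"F({df_model}, {df_error}) = {f_value:.2f}")
print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}")
```

Under these assumptions the model explains about one third of the variation in glucose level. Plain R² never decreases when a predictor is added, which is why the adjusted version is usually reported alongside it for multiple regression.
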
- Calculate R² and interpret it. Is R² a good model-fit statistic
for multiple linear regression? Why or why not?
 
- Is the overall model significant? Answer this question with the
statistical evidence.
 
- Interpret each regression coefficient and its significance,
referring to the statistical evidence.
 
- What are the linear model assumptions, and how would you test or
examine whether they hold?
 
- Identify any outliers, high-leverage points, or influential data
points based on the model diagnostic statistics below:
 
| Observation | Studentized Residual | Hat value (mean hat: 0.0075) | Cook’s D |
|---|---|---|---|
| 1 | -0.65 | 0.001 | 0.00011 |
| 2 | -3.5 | 0.008 | 0.0015 |
| 3 | 2.4 | 0.03 | 0.025 |
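
The diagnostic statistics above can be screened with common rules of thumb. The sketch below is only one possible convention, and the cutoffs are assumptions rather than part of the question: |studentized residual| > 2 for outliers, hat value > 2 × mean hat for high leverage, and Cook’s D > 4/n for influence, with n = 404 inferred from 400 error degrees of freedom plus 4 estimated parameters. Other textbooks use different cutoffs (for example 3 for residuals or 1 for Cook’s D), so the thresholds are left as parameters.

```python
from dataclasses import dataclass


@dataclass
class Diagnostics:
    obs: int
    studentized_residual: float
    hat: float
    cooks_d: float


# Rows copied from the diagnostics table.
rows = [
    Diagnostics(1, -0.65, 0.001, 0.00011),
    Diagnostics(2, -3.5, 0.008, 0.0015),
    Diagnostics(3, 2.4, 0.03, 0.025),
]

MEAN_HAT = 0.0075  # stated in the table header
N_OBS = 404        # assumption: 400 error df + 4 estimated parameters


def flag(row, resid_cut=2.0, hat_mult=2.0, cooks_cut=4 / N_OBS):
    """Apply rule-of-thumb cutoffs (assumed, not from the question)."""
    return {
        "outlier": abs(row.studentized_residual) > resid_cut,
        "high_leverage": row.hat > hat_mult * MEAN_HAT,
        "influential": row.cooks_d > cooks_cut,
    }


for row in rows:
    print(row.obs, flag(row))
```

With these cutoffs, observation 1 raises no flags, observation 2 is flagged only as an outlier, and observation 3 is flagged on all three criteria. Raising `resid_cut` to 3 would leave observation 2 as the only outlier.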