- A researcher wants to test whether women’s glucose level is
associated with age, number of pregnancies, and BMI. The response
variable is the plasma glucose concentration (mg/dL) at 2 hours in an
oral glucose tolerance test, and the explanatory variables
are: Age (years), Pregnant (number
of pregnancies), and Obesity (Yes vs. No).
The researcher fitted a linear model,
and the results are listed below:
| Source | Degrees of Freedom | Sum of Squares | Mean Squares | F value | Pr > F |
|---|---|---|---|---|---|
| Model | | 60000 | | | < .0001 |
| Error | 400 | 120000 | | | |

| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Intercept | 85.36 | 4.099 | 20.82 | <.0001 |
| Age | 0.726 | 0.111 | 6.55 | <.0001 |
| Pregnancy | -0.35 | 0.387 | -0.78 | 0.4330 |
| Obese Yes vs. No | 17.38 | 3.133 | 5.55 | <.0001 |

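The blank cells of the ANOVA table, and the R² asked about in the questions, follow arithmetically from the entries that are shown. The sketch below is one way to work them out, not an official solution. It assumes the model degrees of freedom equal 3, one per slope term in the parameter table (Age, Pregnancy, Obese); that value is not printed in the source. The variable names are illustrative.

```python
# ANOVA entries copied from the table above.
ss_model = 60000.0
ss_error = 120000.0
df_error = 400

# Assumption: one model degree of freedom per slope term
# (Age, Pregnancy, Obese). The source leaves this cell blank.
df_model = 3

ss_total = ss_model + ss_error
df_total = df_model + df_error

ms_model = ss_model / df_model   # mean square for the model
ms_error = ss_error / df_error   # mean square error
f_value = ms_model / ms_error    # overall F statistic

r2 = ss_model / ss_total         # share of variation explained
# Adjusted R^2 penalizes additional predictors.
adj_r2 = 1 - (ss_error / df_error) / (ss_total / df_total)

# t ratios recomputed as estimate / standard error.
params = {
    "Intercept": (85.36, 4.099),
    "Age": (0.726, 0.111),
    "Pregnancy": (-0.35, 0.387),
    "Obese": (17.38, 3.133),
}
t_ratios = {name: est / se for name, (est, se) in params.items()}

print(f"MS model = {ms_model:.1f}, MS error = {ms_error:.1f}")
print(f"F = {f_value:.2f} on ({df_model}, {df_error}) df")
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")
for name, t in t_ratios.items():
    print(f"{name}: t = {t:.2f}")
```

Under that assumption, R² is 60000 / 180000, so about one third of the variation in glucose is explained, and F is about 66.67, consistent with the reported Pr > F below .0001. Plain R² never decreases when a predictor is added, which is why the adjusted version is usually reported alongside it in multiple regression. The recomputed ratio for Pregnancy is about -0.90 rather than the printed -0.78, so one of the printed values may be rounded or mistyped; the conclusion that the term is not significant is the same either way.
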
- Calculate R² and interpret it. Is
R² a good model-fit statistic for multiple linear
regression? Why or why not?
- Is the overall model significant? Answer this question with the
statistical evidence.
- Interpret the regression coefficients and the significance of
each, referring to the statistical evidence.
- What are the linear model assumptions, and how can you test or
examine whether they hold?
- Identify outliers, high-leverage points, or influential data points
based on the model diagnostic statistics below:

| Observation | Studentized Residual | Hat value (mean Hat: 0.0075) | Cook’s D |
|---|---|---|---|
| 1 | -0.65 | 0.001 | 0.00011 |
| 2 | -3.5 | 0.008 | 0.0015 |
| 3 | 2.4 | 0.03 | 0.025 |
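
The diagnostic statistics above can be screened against common rules of thumb. The sketch below is a hedged illustration, not the only accepted approach. Its cutoffs are assumptions rather than values given in the source: |studentized residual| > 2 for an outlier, hat value > 2 × mean hat for high leverage, and Cook's D > 4/n for influence. The sample size n = 404 is also an assumption, taken as the error degrees of freedom (400) plus four estimated parameters.

```python
# Values copied from the diagnostics table above:
# observation -> (studentized residual, hat value, Cook's D)
observations = {
    1: (-0.65, 0.001, 0.00011),
    2: (-3.5, 0.008, 0.0015),
    3: (2.4, 0.03, 0.025),
}

MEAN_HAT = 0.0075   # stated in the table header
N = 404             # assumption: error df (400) + 4 estimated parameters

# Rule-of-thumb cutoffs (assumptions; other texts use 3, 3x, or D > 1).
RESID_CUTOFF = 2.0
HAT_CUTOFF = 2 * MEAN_HAT
COOK_CUTOFF = 4 / N


def flag(resid, hat, cook):
    """Return which rule-of-thumb cutoffs an observation exceeds."""
    return {
        "outlier": abs(resid) > RESID_CUTOFF,
        "high_leverage": hat > HAT_CUTOFF,
        "influential": cook > COOK_CUTOFF,
    }


flags = {obs: flag(*stats) for obs, stats in observations.items()}

for obs, result in flags.items():
    hits = [name for name, hit in result.items() if hit]
    print(f"Observation {obs}: {', '.join(hits) if hits else 'no flags'}")
```

With these cutoffs, observation 1 raises no flags, observation 2 is an outlier with ordinary leverage, and observation 3 exceeds all three cutoffs. Under the stricter convention of Cook's D > 1, none of the three points would count as influential, so the choice of cutoff should be stated with the answer.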