Researchers studied 70 children periodically until age 18, recording weights at ages 2, 9, and 18 (WT2, WT9, WT18, in kg) and BMI at age 18 (BMI18). The data are attached below.
a) Decide from scatter plots whether there is collinearity in the data.
Use R to generate the ANOVA tables and lm summaries for the necessary models, and answer the following questions.
b) What are SS(WT9|WT2) and SS(WT18|WT9, WT2)?
c) Discuss the influence of collinearity on the parameter estimates and standard errors of WT2 in three models: regression on WT2 only, on WT2 and WT9, and on all three predictors.
Please include R code.
Subject | WT2 | WT9 | WT18 | BMI18 |
1 | 13.6 | 32.5 | 56.9 | 22.53536 |
2 | 11.3 | 27.8 | 49.9 |
3 | 17 | 44.4 | 55.3 |
4 | 13.2 | 40.5 | 65.9 |
5 | 13.3 | 29.9 | 62.3 |
6 | 11.3 | 22.8 | 47.4 |
7 | 11.6 | 30 | 57.3 |
8 | 11.6 | 24.3 | 50 |
9 | 12.4 | 29.9 | 58.8 |
10 | 17 | 44.5 | 80.2 |
11 | 12.2 | 31.8 | 59.9 |
12 | 15 | 32.1 | 56.3 |
13 | 14.5 | 39.2 | 67.9 |
14 | 10.2 | 23.7 | 52.9 |
15 | 12.2 | 26 | 58.5 |
16 | 12.8 | 36.3 | 73.2 |
17 | 13.6 | 29.9 | 54.7 |
18 | 10.9 | 22.2 | 44.1 |
19 | 13.1 | 34.4 | 70.5 |
20 | 13.4 | 35.5 | 60.6 |
21 | 11.8 | 33 | 73.2 |
22 | 12.7 | 25.7 | 57.2 |
23 | 11.8 | 29.2 | 56.4 |
24 | 14.1 | 31.7 | 56.6 |
25 | 10.9 | 23.7 | 46.3 |
26 | 11.8 | 35.3 | 63.3 |
27 | 13.6 | 39 | 65.4 |
28 | 12.7 | 30.8 | 60.1 |
29 | 12.3 | 29.3 | 55 |
30 | 11.5 | 28 | 55.7 |
31 | 12.6 | 33 | 71.2 |
32 | 14.1 | 47.4 | 65.5 |
33 | 11.5 | 27.6 | 57.2 |
34 | 12 | 34.2 | 58.2 |
35 | 10.9 | 28.1 | 56 |
36 | 12.7 | 27.5 | 64.5 |
37 | 11.3 | 23.9 | 53 |
38 | 11.8 | 32.2 | 52.4 |
39 | 15.4 | 29.4 | 56.8 |
40 | 10.9 | 22 | 49.2 |
41 | 13.2 | 28.8 | 55.6 |
42 | 14.3 | 38.8 | 77.8 |
43 | 11.1 | 36 | 69.6 |
44 | 13.6 | 31.3 | 56.2 |
45 | 12.9 | 26.9 | 52.5 |
46 | 13.5 | 33.3 | 64.9 |
47 | 16.3 | 36.2 | 59.3 |
48 | 13.6 | 29.5 | 54.2 |
49 | 10.2 | 23.4 | 49.8 |
50 | 12.6 | 33.8 | 62.6 |
51 | 12.9 | 34.5 | 66.6 |
52 | 13.3 | 34.4 | 65.3 |
53 | 13.4 | 38.2 | 65.9 |
54 | 12.7 | 31.7 | 59 |
55 | 12.2 | 26.6 | 47.4 |
56 | 15.4 | 34.2 | 60.4 |
57 | 12.7 | 27.7 | 56.3 |
58 | 13.2 | 28.5 | 61.7 |
59 | 12.4 | 30.5 | 52.4 |
60 | 10.9 | 26.6 | 52.1 |
61 | 13.4 | 39 | 58.4 |
62 | 10.6 | 25 | 52.8 |
63 | 11.8 | 25.6 | 60.4 |
64 | 14.2 | 34.2 | 61 |
65 | 12.7 | 29.8 | 67.4 |
66 | 13.2 | 27.9 | 54.3 |
67 | 11.8 | 27 | 56.3 |
68 | 13.3 | 41.4 | 97.7 |
69 | 13.2 | 41.6 | 68.1 |
70 | 15.9 | 42.4 | 63.1 |
To answer this question, we first need to know what collinearity is: a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy.
Example: you go to see a rock and roll band with two great guitar players, and you want to know which one plays better. But they are both playing amazing leads at the same time! When both are playing loud and fast, how can you tell which guitarist has the bigger effect on the sound? Even though they aren't playing the same notes, what they're doing is so similar that it's difficult to tell one from the other. That's the problem with collinearity.
To check for it, we look at the VIF (variance inflation factor) of each predictor.
A VIF of 1 means the predictor is uncorrelated with the other predictors. A VIF above 1 but below 5 suggests the predictors may be moderately correlated. A VIF between 5 and 10 indicates high correlation that may be problematic.
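As a rough illustration of where a VIF comes from, the sketch below computes it in base R from its definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on the other predictors. The data are simulated stand-ins (not the study data), and the helper name `vif_by_hand` is made up for this example.

```r
# Synthetic stand-in data (NOT the study data); column names follow the question.
set.seed(1)
n    <- 70
WT2  <- rnorm(n, mean = 13, sd = 1.5)
WT9  <- 5 + 2 * WT2 + rnorm(n, sd = 4)
WT18 <- 20 + 1.2 * WT9 + rnorm(n, sd = 6)
preds <- data.frame(WT2, WT9, WT18)

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), with R_j^2 taken from
# regressing predictor j on all the other predictors.
vif_by_hand <- function(data) {
  sapply(names(data), function(v) {
    others <- setdiff(names(data), v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_by_hand(preds)
print(round(vifs, 2))
```

With the real data, `preds` would be the three weight columns read from the attached table. The `vif()` function in the car package reports the same quantity for a fitted `lm` object.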
Based on our data
a) Coefficients
Term | Coef | SE Coef | T-Value | P-Value | VIF |
Constant | 51.87 | 4.69 | 11.05 | 0.000 | |
WT2 | 0.0497 | 0.0414 | 1.20 | 0.235 | 1.01 |
WT9 | -3.125 | 0.429 | -7.28 | 0.000 | 2.09 |
WT18 | 1.456 | 0.181 | 8.03 | 0.000 | 2.07 |
Since the VIFs range from 1.01 to 2.09, some collinearity is present, but not to an extent that should cause problems in the analysis.
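Part (a) asks for a judgement from scatter plots, so a scatterplot matrix is a natural companion to the VIF check. A minimal sketch in base R, again on simulated stand-in data rather than the study data:

```r
# Synthetic stand-in data (NOT the study data); column names follow the question.
set.seed(1)
n    <- 70
WT2  <- rnorm(n, mean = 13, sd = 1.5)
WT9  <- 5 + 2 * WT2 + rnorm(n, sd = 4)
WT18 <- 20 + 1.2 * WT9 + rnorm(n, sd = 6)
preds <- data.frame(WT2, WT9, WT18)

# Scatterplot matrix: strong linear bands between two predictors suggest collinearity
pairs(preds, main = "Pairwise scatter plots of the predictors")

# Pairwise correlations put a number on what the plots show
r <- cor(preds)
print(round(r, 2))
```

Panels in which the points fall close to a straight line indicate pairs of predictors that carry largely the same information.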
b) Regression Analysis: WT9 versus WT2
The regression equation is
WT9 = 13.56 - 0.01332 WT2
Analysis of Variance
Source | DF | SS | MS | F | P |
Regression | 1 | 4.942 | 4.94177 | 0.63 | 0.429 |
Error | 68 | 529.944 | 7.79329 | | |
Total | 69 | 534.886 | | | |
From the table above, the regression sum of squares for WT9 regressed on WT2 is 4.942.
Regression Analysis: WT18 versus WT9
The regression equation is
WT18 = 9.832 + 1.692 WT9
Analysis of Variance
Source | DF | SS | MS | F | P |
Regression | 1 | 1530.94 | 1530.94 | 72.33 | 0.000 |
Error | 68 | 1439.32 | 21.17 | | |
Total | 69 | 2970.27 | | | |
From the table above, the regression sum of squares for WT18 regressed on WT9 is 1530.94.
Regression Analysis: WT18 versus WT2
The regression equation is
WT18 = 32.33 - 0.01000 WT2
Analysis of Variance
Source | DF | SS | MS | F | P |
Regression | 1 | 2.79 | 2.7882 | 0.06 | 0.801 |
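Error | 68 | 2967.48 | 43.6394 | | |
Total | 69 | 2970.27 | | | |
From the table above, the regression sum of squares for WT18 regressed on WT2 is 2.79.
Note that the values reported in this part come from simple regressions of one weight on another. The quantities asked for, SS(WT9|WT2) and SS(WT18|WT9, WT2), are extra (sequential) sums of squares: the reduction in the residual sum of squares when WT9 is added to a model that already contains WT2, and when WT18 is added to a model that already contains WT2 and WT9. They are read from the sequential ANOVA table of the regression of the response (presumably BMI18) on WT2, WT9, and WT18, entered in that order.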
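For reference, here is a sketch of how the extra sums of squares in part (b) could be obtained in R. It assumes BMI18 is the response and uses simulated stand-in data, so the printed numbers are not the answer for the study data. `anova()` on an `lm` fit reports sequential sums of squares in the order the terms appear in the formula.

```r
# Synthetic stand-in data (NOT the study data); column names follow the question.
set.seed(1)
n     <- 70
WT2   <- rnorm(n, mean = 13, sd = 1.5)
WT9   <- 5 + 2 * WT2 + rnorm(n, sd = 4)
WT18  <- 20 + 1.2 * WT9 + rnorm(n, sd = 6)
BMI18 <- 5 + 0.3 * WT18 + rnorm(n, sd = 2)
d <- data.frame(WT2, WT9, WT18, BMI18)

# Sequential ANOVA: each row is the SS gained by adding that term
# to a model containing the terms listed above it.
full    <- lm(BMI18 ~ WT2 + WT9 + WT18, data = d)
seq_tab <- anova(full)
print(seq_tab)

ss_wt9_given_wt2      <- seq_tab["WT9",  "Sum Sq"]  # SS(WT9 | WT2)
ss_wt18_given_wt2_wt9 <- seq_tab["WT18", "Sum Sq"]  # SS(WT18 | WT9, WT2)

# Cross-check: extra SS = drop in residual SS between nested models
rss <- function(f) deviance(lm(f, data = d))
check_wt9  <- rss(BMI18 ~ WT2) - rss(BMI18 ~ WT2 + WT9)
check_wt18 <- rss(BMI18 ~ WT2 + WT9) - rss(BMI18 ~ WT2 + WT9 + WT18)
```

The order of terms in the formula matters here: writing `BMI18 ~ WT9 + WT2 + WT18` would give SS(WT2|WT9) in the second row instead.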
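c) The standard error of the regression is roughly the typical distance between the observed values and the fitted line, in the units of the response variable. Smaller values are better because they mean the observations lie closer to the fitted line. In the fits above, the error mean square for WT18 on WT9 (21.17) is smaller than for WT18 on WT2 (43.64), so WT9 predicts WT18 better than WT2 does. For the question as posed, the relevant quantity is the standard error of the WT2 coefficient in each of the three models: when WT2 is correlated with the predictors added alongside it, its estimate can change from model to model and its standard error is inflated, by a factor of about the square root of its VIF compared with the uncorrelated case.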
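The comparison in part (c) can be laid out by fitting the three models and pulling the WT2 row from each `summary()`. The sketch below assumes BMI18 is the response and runs on simulated stand-in data, so it shows the mechanics only, not the study's results.

```r
# Synthetic stand-in data (NOT the study data); column names follow the question.
set.seed(1)
n     <- 70
WT2   <- rnorm(n, mean = 13, sd = 1.5)
WT9   <- 5 + 2 * WT2 + rnorm(n, sd = 4)
WT18  <- 20 + 1.2 * WT9 + rnorm(n, sd = 6)
BMI18 <- 5 + 0.3 * WT18 + rnorm(n, sd = 2)
d <- data.frame(WT2, WT9, WT18, BMI18)

# The three models named in part (c)
models <- list(
  "WT2 only"         = lm(BMI18 ~ WT2, data = d),
  "WT2 + WT9"        = lm(BMI18 ~ WT2 + WT9, data = d),
  "WT2 + WT9 + WT18" = lm(BMI18 ~ WT2 + WT9 + WT18, data = d)
)

# One row per model: estimate and standard error of the WT2 coefficient
wt2_rows <- t(sapply(models, function(m) {
  coef(summary(m))["WT2", c("Estimate", "Std. Error")]
}))
print(round(wt2_rows, 3))
```

Reading down the two columns shows how the WT2 estimate and its standard error shift as correlated predictors enter the model.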