Using the data, determine whether the model using (x1, x2, x3, x4) to predict y is sufficient, or should some or all other predictors be considered? Write the full and reduced models, and then perform the test. Show your work and state your conclusion, but you do not need to specify your hypothesis statements.
y 60323 61122 60171 61187 63221 63639 64989 63761 66019 67857 68169 66513 68655 69564 69331 70551
x1 83 88.5 88.2 89.5 96.2 98.1 99 100 101.2 104.6 108.4 110.8 112.6 114.2 115.7 116.9
x2 234289 259426 258054 284599 328975 346999 365385 363112 397469 419180 442769 444546 482704 502601 518173 554894
x3 2356 2325 3682 3351 2099 1932 1870 3578 2904 2822 2936 4681 3813 3931 4806 4007
x4 1590 1456 1616 1650 3099 3594 3547 3350 3048 2857 2798 2637 2552 2514 2572 2827
We use Minitab to solve this question.
MTB > Regress;
SUBC> Response 'y';
SUBC> Nodefault;
SUBC> Continuous 'x1' 'x2' 'x3' 'x4';
SUBC> Terms x1 x2 x3 x4;
SUBC> Constant;
SUBC> Unstandardized;
SUBC> Tmethod;
SUBC> Tanova;
SUBC> Tsummary;
SUBC> Tcoefficients;
SUBC> Tequation.
Regression Analysis: y versus x1, x2, x3, x4
Analysis of Variance
Source      DF     Adj SS    Adj MS  F-Value  P-Value
Regression   4  182324999  45581250   186.82    0.000
  x1         1      72885     72885     0.30    0.596
  x2         1    2826281   2826281    11.58    0.006
  x3         1    3002457   3002457    12.31    0.005
  x4         1     876397    876397     3.59    0.085
Error       11    2683827    243984
Total       15  185008826
Model Summary
      S    R-sq  R-sq(adj)  R-sq(pred)
493.948  98.55%     98.02%      96.91%
Coefficients
Term        Coef  SE Coef  T-Value  P-Value    VIF
Constant   50084     5943     8.43    0.000
x1            56      103     0.55    0.596  75.87
x2        0.0353   0.0104     3.40    0.006  65.20
x3        -0.854    0.243    -3.51    0.005   3.18
x4        -0.550    0.290    -1.90    0.085   2.50
Regression Equation
y = 50084 + 56 x1 + 0.0353 x2 - 0.854 x3 - 0.550 x4
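For readers without Minitab, the full-model fit can be cross-checked with a short script. This is only a sketch, not what Minitab does internally: it solves the normal equations with exact fractions from the Python standard library, and the helper names `solve` and `fit` are made up for this illustration. The data are copied from the question.

```python
from fractions import Fraction
from math import sqrt

# Data copied from the question (y is the response, x1..x4 the predictors).
y = [60323, 61122, 60171, 61187, 63221, 63639, 64989, 63761,
     66019, 67857, 68169, 66513, 68655, 69564, 69331, 70551]
x1 = [83, 88.5, 88.2, 89.5, 96.2, 98.1, 99, 100,
      101.2, 104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9]
x2 = [234289, 259426, 258054, 284599, 328975, 346999, 365385, 363112,
      397469, 419180, 442769, 444546, 482704, 502601, 518173, 554894]
x3 = [2356, 2325, 3682, 3351, 2099, 1932, 1870, 3578,
      2904, 2822, 2936, 4681, 3813, 3931, 4806, 4007]
x4 = [1590, 1456, 1616, 1650, 3099, 3594, 3547, 3350,
      3048, 2857, 2798, 2637, 2552, 2514, 2572, 2827]


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination on exact fractions."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)  # nonzero pivot
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * p for a, p in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


def fit(columns, response):
    """Least squares with an intercept; returns (coefficients, SSE)."""
    n = len(response)
    # str() first, so that decimals such as 88.2 are parsed exactly.
    X = [[Fraction(1)] + [Fraction(str(col[i])) for col in columns]
         for i in range(n)]
    Y = [Fraction(v) for v in response]
    k = len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    XtY = [sum(X[i][a] * Y[i] for i in range(n)) for a in range(k)]
    beta = solve(XtX, XtY)
    sse = sum((Y[i] - sum(bj * xj for bj, xj in zip(beta, X[i]))) ** 2
              for i in range(n))
    return beta, sse


beta, sse = fit([x1, x2, x3, x4], y)
n = len(y)
ybar = Fraction(sum(y), n)
sst = sum((Fraction(v) - ybar) ** 2 for v in y)   # Total SS
r2 = 1 - sse / sst                                # R-sq
s = sqrt(float(sse / (n - 5)))                    # S, with 11 error df

print("coefficients:", [round(float(b), 4) for b in beta])
print("SSE:", round(float(sse)), "R-sq:", round(float(r2), 4), "S:", round(s, 3))
```

The printed values should agree with the Coefficients, Analysis of Variance and Model Summary tables above up to rounding. Exact fractions are used because x1 and x2 are almost collinear (VIF above 65), which makes a naive floating-point solution of the normal equations unreliable.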
MTB > BReg 'y' 'x1' 'x2' 'x3' 'x4' ;
SUBC> NVars 1 4;
SUBC> Best 2;
SUBC> Constant.
Best Subsets Regression: y versus x1, x2, x3, x4
Response is y
             R-Sq    R-Sq  Mallows
Vars  R-Sq  (adj)  (pred)       Cp       S  x1  x2  x3  x4
   1  96.7   96.5    95.9     12.7  656.62       X
   1  94.3   93.9    92.7     31.5  870.61   X
   2  98.1   97.8    97.3      4.7  524.70       X   X
   2  96.9   96.4    94.7     13.9  669.34   X   X
   3  98.5   98.1    97.6      3.3  479.30       X   X   X
   3  98.1   97.6    96.4      6.6  544.69   X   X   X
   4  98.5   98.0    96.9      5.0  493.95   X   X   X   X
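The best subsets summary can be reproduced by fitting every subset of the four predictors and computing R-sq, R-sq(adj), Mallows Cp and S for each. The sketch below assumes the usual definition Cp = SSE_p / MSE_full - (n - 2p), where p counts the intercept. The helper `sse` and the `data` dictionary are names invented for this illustration, and R-sq(pred) is not computed.

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt

# Data copied from the question.
y = [60323, 61122, 60171, 61187, 63221, 63639, 64989, 63761,
     66019, 67857, 68169, 66513, 68655, 69564, 69331, 70551]
data = {
    "x1": [83, 88.5, 88.2, 89.5, 96.2, 98.1, 99, 100,
           101.2, 104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9],
    "x2": [234289, 259426, 258054, 284599, 328975, 346999, 365385, 363112,
           397469, 419180, 442769, 444546, 482704, 502601, 518173, 554894],
    "x3": [2356, 2325, 3682, 3351, 2099, 1932, 1870, 3578,
           2904, 2822, 2936, 4681, 3813, 3931, 4806, 4007],
    "x4": [1590, 1456, 1616, 1650, 3099, 3594, 3547, 3350,
           3048, 2857, 2798, 2637, 2552, 2514, 2572, 2827],
}
n = len(y)


def sse(names):
    """Exact residual sum of squares for y on an intercept plus `names`."""
    cols = [[Fraction(1)] * n]
    cols += [[Fraction(str(v)) for v in data[name]] for name in names]
    Y = [Fraction(v) for v in y]
    k = len(cols)
    # Augmented normal equations [X'X | X'y].
    M = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(a * b for a, b in zip(cols[i], Y))] for i in range(k)]
    xty = [row[k] for row in M]
    for c in range(k):  # Gauss-Jordan elimination
        piv = next(r for r in range(c, k) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [row[k] for row in M]
    # SSE = y'y - beta' X'y (safe here because the arithmetic is exact).
    return sum(v * v for v in Y) - sum(b * t for b, t in zip(beta, xty))


sst = sse([])                          # intercept-only model gives Total SS
mse_full = sse(list(data)) / (n - 5)   # MSE of the four-predictor model

rows = []
for size in range(1, 5):
    for names in combinations(data, size):
        e = sse(names)
        p = size + 1                   # parameters, including the intercept
        rows.append({
            "vars": names,
            "r2": float(1 - e / sst),
            "r2_adj": float(1 - (e / (n - p)) / (sst / (n - 1))),
            "cp": float(e / mse_full - (n - 2 * p)),
            "s": sqrt(float(e / (n - p))),
        })

for row in sorted(rows, key=lambda r: r["cp"]):
    print(f"{' '.join(row['vars']):<12} R-sq={100 * row['r2']:5.1f} "
          f"adj={100 * row['r2_adj']:5.1f} Cp={row['cp']:5.1f} S={row['s']:7.2f}")
```

Unlike the Minitab command with `Best 2`, this lists all 15 subsets, sorted by Cp, so the two best models of each size shown above appear among the rows.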
___________________________________________________________________________________________________
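The full model has R-sq = 98.55% (R-sq(adj) = 98.02%), and the overall regression is significant (F = 186.82, P = 0.000). In the best subsets output, R-sq(adj) rises from 96.5% with one predictor to 97.8% with two and 98.1% with three, and is essentially unchanged (98.0%) with all four. The model using x1, x2, x3, x4 is therefore adequate for predicting y. Note, however, that x1 is not individually significant (P = 0.596) and has a VIF of 75.87, and that the three-predictor model without x1 has the smallest Mallows Cp (3.3) and the smallest S (479.30), so x1 adds very little once x2, x3 and x4 are in the model.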
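The question asks for a full and a reduced model and a test between them. The predictors beyond x1 to x4 are not listed in the data above, so the sketch below only illustrates the mechanics of the partial F-test with numbers printed in the Minitab output. It takes y on x1, x2, x3, x4 as the full model and y on x2, x3, x4 as the reduced model. The function name `partial_f` is made up for this illustration, and the critical value 4.84 (upper 5% point of F with 1 and 11 degrees of freedom) comes from a standard F table, not from the output.

```python
def partial_f(sse_reduced, sse_full, extra_terms, df_error_full):
    """F statistic for H0: the extra terms in the full model are all zero."""
    return ((sse_reduced - sse_full) / extra_terms) / (sse_full / df_error_full)


# Numbers taken from the Analysis of Variance table above.
SSE_FULL = 2683827        # Error SS for y on x1, x2, x3, x4
DF_ERROR_FULL = 11        # Error DF for the full model
ADJ_SS_X1 = 72885         # extra SS explained by x1 given x2, x3, x4

# Dropping x1 moves its adjusted SS into the error term.
sse_reduced = SSE_FULL + ADJ_SS_X1    # Error SS for y on x2, x3, x4
f_stat = partial_f(sse_reduced, SSE_FULL, 1, DF_ERROR_FULL)

F_CRIT_05 = 4.84          # assumption: tabled upper 5% point of F(1, 11)
print("F =", round(f_stat, 2), "reject H0 at 5%:", f_stat > F_CRIT_05)
```

Here F is about 0.30, which agrees with the F-Value in the x1 row and is far below 4.84, so at the 5% level the reduced model without x1 is not significantly worse. The same function applies to the comparison the question has in mind, once the error sum of squares and error degrees of freedom of the larger model with the additional predictors are available.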