Consider the following data:
Case | Y | X1 | X2 | X3 | X4 | X5 | X6
1 | 43 | 45 | 92 | 61 | 39 | 30 | 51
2 | 63 | 47 | 73 | 63 | 54 | 51 | 64
3 | 71 | 48 | 86 | 76 | 69 | 68 | 70
4 | 61 | 35 | 84 | 54 | 47 | 45 | 63
5 | 81 | 47 | 83 | 71 | 66 | 56 | 78
6 | 43 | 34 | 49 | 54 | 44 | 49 | 55
7 | 58 | 35 | 68 | 66 | 56 | 42 | 67
8 | 74 | 41 | 66 | 70 | 53 | 50 | 75
9 | 75 | 31 | 83 | 71 | 65 | 72 | 82
10 | 70 | 41 | 80 | 62 | 45 | 45 | 61
11 | 67 | 34 | 67 | 58 | 56 | 53 | 53
12 | 70 | 41 | 74 | 59 | 37 | 47 | 60
13 | 72 | 25 | 63 | 55 | 40 | 57 | 62
14 | 71 | 35 | 77 | 59 | 43 | 83 | 83
15 | 80 | 46 | 77 | 79 | 70 | 54 | 77
16 | 84 | 36 | 54 | 60 | 70 | 50 | 90
17 | 77 | 63 | 79 | 79 | 67 | 64 | 85
18 | 68 | 60 | 80 | 55 | 73 | 65 | 60
19 | 68 | 46 | 85 | 75 | 55 | 46 | 70
20 | 53 | 52 | 78 | 64 | 52 | 68 | 58
1. What is the regression equation? (Perform a multiple regression analysis and paste the table in the first answer box.)
2. State the hypotheses to test for the significance of the independent factors.
3. Which independent factors are significant at α = 0.05? Explain.
4. State the hypotheses to test for the significance of the regression equation. Is the regression equation significant at α = 0.05? Explain.
5. How much of the variability in Y is explained by your model? Explain.
6. What tools would you use to check if the model has multicollinearity problems?
7. Does this model have multicollinearity problems? Explain.
8. If you were to propose a simplified model, eliminating some variables, what would it be? Why?
9. What tools would you use to check if the model assumptions are met?
10. Does this model meet the assumptions? Explain.
Regression Equation
y = 12.9 - 0.154 x1 + 0.074 x2 - 0.035 x3 + 0.265 x4 + 0.010 x5 + 0.625 x6
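A minimal sketch of the least-squares fit behind this equation, assuming plain Python with no third-party packages: it solves the normal equations (XᵀX)β = Xᵀy by Gauss-Jordan elimination. The names `DATA`, `solve` and `ols` are illustrative helpers, not Minitab's procedure, and the rows are transcribed from the data table; printed coefficients may differ from Minitab's in the last digits.

```python
# Data transcribed from the table: (Y, X1, X2, X3, X4, X5, X6) for cases 1-20.
DATA = [
    (43, 45, 92, 61, 39, 30, 51), (63, 47, 73, 63, 54, 51, 64),
    (71, 48, 86, 76, 69, 68, 70), (61, 35, 84, 54, 47, 45, 63),
    (81, 47, 83, 71, 66, 56, 78), (43, 34, 49, 54, 44, 49, 55),
    (58, 35, 68, 66, 56, 42, 67), (74, 41, 66, 70, 53, 50, 75),
    (75, 31, 83, 71, 65, 72, 82), (70, 41, 80, 62, 45, 45, 61),
    (67, 34, 67, 58, 56, 53, 53), (70, 41, 74, 59, 37, 47, 60),
    (72, 25, 63, 55, 40, 57, 62), (71, 35, 77, 59, 43, 83, 83),
    (80, 46, 77, 79, 70, 54, 77), (84, 36, 54, 60, 70, 50, 90),
    (77, 63, 79, 79, 67, 64, 85), (68, 60, 80, 55, 73, 65, 60),
    (68, 46, 85, 75, 55, 46, 70), (53, 52, 78, 64, 52, 68, 58),
]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, xs):
    """OLS fit of y on the columns in xs plus an intercept.

    Returns (coefficients, SSR, SSE, SST); coefficients[0] is the intercept.
    """
    n = len(y)
    rows = [[1.0] + [x[i] for x in xs] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, sst - sse, sse, sst


y = [row[0] for row in DATA]
xs = [[row[j] for row in DATA] for j in range(1, 7)]  # columns X1..X6
beta, ssr, sse, sst = ols(y, xs)
terms = "".join(f" {b:+.3f} x{j}" for j, b in enumerate(beta[1:], start=1))
print(f"y = {beta[0]:.1f}{terms}")
print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}, R-sq = {100 * ssr / sst:.2f}%")
```

For serious work a QR-based least-squares routine is numerically safer than forming XᵀX; the normal equations are used here only because they are short and transparent.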
To test the significance of each independent factor, the hypotheses are
H0: βi = 0 vs H1: βi ≠ 0 (i = 1, …, 6)
Source | DF | Adj SS | Adj MS | F-Value | P-Value
Regression | 6 | 1496.69 | 249.448 | 3.51 | 0.027
x1 | 1 | 22.36 | 22.362 | 0.31 | 0.584
x2 | 1 | 8.10 | 8.100 | 0.11 | 0.741
x3 | 1 | 0.70 | 0.703 | 0.01 | 0.922
x4 | 1 | 85.31 | 85.312 | 1.20 | 0.293
x5 | 1 | 0.21 | 0.207 | 0.00 | 0.958
x6 | 1 | 414.73 | 414.729 | 5.83 | 0.031
Error | 13 | 924.26 | 71.097
Total | 19 | 2420.95
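The p-value for x6 is 0.031 < 0.05, so we reject H0 for x6: x6 is a significant independent factor. The p-values for x1 to x5 (0.584, 0.741, 0.922, 0.293 and 0.958) all exceed 0.05, so those factors are not significant.
To test the significance of the regression equation, the hypotheses are H0: β1 = β2 = … = β6 = 0 vs H1: at least one βi ≠ 0. The regression F-value is 3.51 with p-value 0.027 < 0.05, so we reject H0: the regression equation is significant at α = 0.05.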
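The F-values in the ANOVA table can be reproduced from its sums of squares: each term's F is its Adj MS divided by the error mean square, and the overall F is MS(Regression)/MSE. Because the Python standard library has no F distribution, this sketch compares against critical values assumed from standard F tables, F(0.05; 1, 13) ≈ 4.67 and F(0.05; 6, 13) ≈ 2.92, instead of computing p-values.

```python
# Adjusted sums of squares from the ANOVA table (each term has 1 DF).
ADJ_SS = {"x1": 22.36, "x2": 8.10, "x3": 0.70, "x4": 85.31, "x5": 0.21, "x6": 414.73}
SSR, DF_REG = 1496.69, 6
SSE, DF_ERR = 924.26, 13

# Critical values assumed from standard F tables at alpha = 0.05.
F_CRIT_1_13 = 4.67
F_CRIT_6_13 = 2.92

mse = SSE / DF_ERR                      # error mean square
f_overall = (SSR / DF_REG) / mse        # overall regression F
partial_f = {term: ss / mse for term, ss in ADJ_SS.items()}
significant = [term for term, f in partial_f.items() if f > F_CRIT_1_13]

print(f"Overall F = {f_overall:.2f} (significant: {f_overall > F_CRIT_6_13})")
for term, f in partial_f.items():
    print(f"{term}: F = {f:.2f}")
print("Significant terms:", significant)
```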
Model Summary
S | R-sq | R-sq(adj) | R-sq(pred)
8.43190 | 61.82% | 44.20% | 2.71%
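Here R-sq is 61.82%, i.e. about 61.8% of the variation in y is explained by the model. The adjusted R-sq (44.20%) is much lower because it penalizes the number of predictors, and most of the six predictors are not significant.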
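The Model Summary values follow from the same ANOVA sums of squares, using R-sq = SSR/SST, R-sq(adj) = 1 - [SSE/(n - p - 1)] / [SST/(n - 1)] and S = √MSE, with n = 20 cases and p = 6 predictors. A short arithmetic check (variable names are illustrative):

```python
SSR, SSE, SST = 1496.69, 924.26, 2420.95  # from the ANOVA table
n, p = 20, 6                              # cases and predictors

mse = SSE / (n - p - 1)                   # error mean square, 13 DF
r_sq = SSR / SST
r_sq_adj = 1 - mse / (SST / (n - 1))      # penalizes extra predictors
s = mse ** 0.5                            # standard error of the regression

print(f"S = {s:.4f}, R-sq = {100 * r_sq:.2f}%, R-sq(adj) = {100 * r_sq_adj:.2f}%")
```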
Coefficients
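To check for multicollinearity, we use the correlation matrix of the independent variables (a pairwise correlation above about 0.60 between two independent variables is a warning sign) and the variance inflation factors (VIFs) in the coefficients table (a VIF above 5, or above 10 by a looser rule, indicates a problem).
Now check multicollinearity with the correlation matrix:
cor(data)
The correlation table is given at the end.
In that table the only correlation above 0.60 is between Y and X6. That is a response-predictor correlation, which shows that X6 predicts Y well; it is not multicollinearity. The largest correlations among the independent variables are X3-X4 (0.55), X4-X6 (0.5485) and X3-X6 (0.5439), all below 0.60, so the correlation check does not show a serious multicollinearity problem, although X3, X4 and X6 are moderately related.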
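Besides pairwise correlations, a common multicollinearity check is the variance inflation factor, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing X_j on the other five predictors. A minimal standard-library sketch, assuming the rows transcribed from the data table; `solve` and `r_squared` are hypothetical helpers and the cut-off of 5 is only a rule of thumb.

```python
# Data transcribed from the table: (Y, X1, X2, X3, X4, X5, X6) for cases 1-20.
DATA = [
    (43, 45, 92, 61, 39, 30, 51), (63, 47, 73, 63, 54, 51, 64),
    (71, 48, 86, 76, 69, 68, 70), (61, 35, 84, 54, 47, 45, 63),
    (81, 47, 83, 71, 66, 56, 78), (43, 34, 49, 54, 44, 49, 55),
    (58, 35, 68, 66, 56, 42, 67), (74, 41, 66, 70, 53, 50, 75),
    (75, 31, 83, 71, 65, 72, 82), (70, 41, 80, 62, 45, 45, 61),
    (67, 34, 67, 58, 56, 53, 53), (70, 41, 74, 59, 37, 47, 60),
    (72, 25, 63, 55, 40, 57, 62), (71, 35, 77, 59, 43, 83, 83),
    (80, 46, 77, 79, 70, 54, 77), (84, 36, 54, 60, 70, 50, 90),
    (77, 63, 79, 79, 67, 64, 85), (68, 60, 80, 55, 73, 65, 60),
    (68, 46, 85, 75, 55, 46, 70), (53, 52, 78, 64, 52, 68, 58),
]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, xs):
    """R-sq of an OLS fit of y on the columns in xs plus an intercept."""
    rows = [[1.0] + [x[i] for x in xs] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - sse / sst


predictors = [[row[j] for row in DATA] for j in range(1, 7)]  # X1..X6
vif = {}
for j, xj in enumerate(predictors):
    others = predictors[:j] + predictors[j + 1:]  # every predictor except X_j
    vif[f"X{j + 1}"] = 1.0 / (1.0 - r_squared(xj, others))

for name, value in vif.items():
    flag = "  (above 5: worth a closer look)" if value > 5 else ""
    print(f"{name}: VIF = {value:.2f}{flag}")
```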
Using backward elimination, which repeatedly removes the least significant term, the simplified model is:
Regression Equation
y = 16.0 + 0.754 x6
Backward Elimination of Terms
Source | DF | Adj SS | Adj MS | F-Value | P-Value
Regression | 1 | 1404 | 1404.42 | 24.87 | 0.000
x6 | 1 | 1404 | 1404.42 | 24.87 | 0.000
Error | 18 | 1017 | 56.47
Total | 19 | 2421
Model Summary
S | R-sq | R-sq(adj) | R-sq(pred)
7.51492 | 58.01% | 55.68% | 46.88%
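Conclusion: after backward elimination only X6 remains in the model, and the model explains 58.01% of the variation in Y. Its adjusted R-sq (55.68%) and predicted R-sq (46.88%) are both higher than those of the full model (44.20% and 2.71%), so the simpler model is preferable.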
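Assumptions
The model assumes the errors are IID N(0, σ²): independent, normally distributed with mean 0 and constant variance. To check this, we use residual plots: a normal probability plot of the residuals (or a normality test such as Anderson-Darling) for normality, residuals versus fitted values for constant variance, and residuals versus observation order for independence.
Here the residuals appear to follow a normal distribution and the errors appear homoscedastic.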
Multicollinearity correlation table
Variable | Y | X1 | X2 | X3 | X4 | X5 | X6
Y | 1 | | | | | |
X1 | 0.061 | 1 | | | | |
X2 | 0.014 | 0.4518 | 1 | | | |
X3 | 0.4411 | 0.4436 | 0.3917 | 1 | | |
X4 | 0.555 | 0.47 | 0.098 | 0.55 | 1 | |
X5 | 0.3756 | 0.1371 | 0.085 | 0.1714 | 0.3487 | 1 |
X6 | 0.7617 | 0.064 | -0.02564 | 0.5439 | 0.5485 | 0.4530 | 1
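The only correlation above 0.60 in this table is between Y and X6, which is a response-predictor correlation, not multicollinearity. No pair of independent variables has a correlation above 0.60, so this table does not indicate serious multicollinearity.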