A study used a sample of 50 observations to predict SALES. The analysis included 9 predictor variables (independent variables), X1 through X9.
Correlations
|       | X1     | X2     | X3     | X4     | X5     | X6    | X7    | X8    | X9     |
|-------|--------|--------|--------|--------|--------|-------|-------|-------|--------|
| X2    | 0.804  |        |        |        |        |       |       |       |        |
| X3    | 0.625  | 0.443  |        |        |        |       |       |       |        |
| X4    | 0.032  | 0.032  | 0.231  |        |        |       |       |       |        |
| X5    | 0.159  | 0.214  | 0.177  | -0.194 |        |       |       |       |        |
| X6    | 0.319  | 0.373  | 0.308  | 0.054  | 0.293  |       |       |       |        |
| X7    | -0.016 | 0.030  | 0.079  | 0.168  | -0.309 | 0.067 |       |       |        |
| X8    | -0.026 | 0.103  | 0.015  | 0.151  | -0.311 | 0.059 | 0.912 |       |        |
| X9    | 0.169  | -0.027 | -0.104 | 0.017  | -0.248 | 0.114 | 0.174 | 0.223 |        |
| SALES | 0.764  | 0.630  | 0.756  | 0.149  | 0.171  | 0.426 | 0.145 | 0.141 | -0.068 |
Analysis of Variance
| Source     | DF | Adj SS | Adj MS  | F-Value |
|------------|----|--------|---------|---------|
| Regression | 7  | 145157 | 20736.7 | 19.62   |
| Error      | 42 | 44385  | 1056.8  |         |
| Total      | 49 | 189543 |         |         |
| Term     | Coef     | SE Coef  | T-Value |
|----------|----------|----------|---------|
| Constant | -15.8    | 22.1     | -0.71   |
| X1       | 11.93    | 2.34     |         |
| X3       | 5.75     | 1.77     | 3.24    |
| X4       | 0.000023 | 0.000096 | 0.24    |
| X5       | -0.59    | 2.84     |         |
| X6       | 4.08     | 2.07     |         |
| X7       | 0.0891   | 0.0530   | 1.68    |
| X9       | -0.0302  | 0.0150   | -2.02   |
Sales =
For X1 = X5 = X6 =
Use the t test and a 0.05 level of significance to determine the significance of each independent variable.
Based on the correlation matrix shown above, is there any concern about multicollinearity?
Yes, there is a concern about multicollinearity.
The pairs of variables involved are (X1, X2) and (X7, X8).
For the pair (X7, X8) the correlation is 0.912, so the two variables are highly correlated with each other.
The pair (X1, X2) has a correlation of 0.804, so this pair is also highly correlated.
-What limit did you use?
The limit used is |r| > 0.70 for any pair of predictors.
-What action must be taken to eliminate multicollinearity? Be very specific.
From each highly correlated pair, remove the variable that has the weaker relationship with SALES.
In the pair (X7, X8), X8 has the weaker relationship with SALES (0.141 versus 0.145), so X8 is removed.
In the pair (X1, X2), X2 has the weaker relationship with SALES (0.630 versus 0.764), so X2 is removed.
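The screening rule above can be checked mechanically. The following is a minimal Python sketch, not part of the original solution: the names `lower`, `sales_corr`, `flagged` and `to_drop` are illustrative, and the numbers are copied from the correlation matrix in the question.

```python
# Lower triangle of the predictor correlation matrix (values from the question).
names = ["X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9"]
lower = {
    "X2": [0.804],
    "X3": [0.625, 0.443],
    "X4": [0.032, 0.032, 0.231],
    "X5": [0.159, 0.214, 0.177, -0.194],
    "X6": [0.319, 0.373, 0.308, 0.054, 0.293],
    "X7": [-0.016, 0.030, 0.079, 0.168, -0.309, 0.067],
    "X8": [-0.026, 0.103, 0.015, 0.151, -0.311, 0.059, 0.912],
    "X9": [0.169, -0.027, -0.104, 0.017, -0.248, 0.114, 0.174, 0.223],
}
# Correlation of each predictor with SALES (last row of the matrix).
sales_corr = {"X1": 0.764, "X2": 0.630, "X3": 0.756, "X4": 0.149, "X5": 0.171,
              "X6": 0.426, "X7": 0.145, "X8": 0.141, "X9": -0.068}

THRESHOLD = 0.70  # the limit chosen in the answer above

# Flag every predictor pair whose absolute correlation exceeds the limit.
flagged = []
for row, values in lower.items():
    for col, r in zip(names, values):
        if abs(r) > THRESHOLD:
            flagged.append((col, row, r))

# From each flagged pair, drop the member less correlated with SALES.
to_drop = [min((a, b), key=lambda v: abs(sales_corr[v])) for a, b, _ in flagged]

print(flagged)
print(to_drop)
```

Under the 0.70 limit only the pairs (X1, X2) and (X7, X8) are flagged; a different limit would change which pairs appear.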
-Which variable appears to have the weakest relationship with SALES?
X9 has the weakest relationship with SALES: its correlation of -0.068 is the smallest in absolute value.
-The following is the initial Regression Printout. Variables X2 and X8 were not included in the regression. Why?
The pairs (X1, X2) and (X7, X8) are highly correlated. Within the pair (X1, X2), X2 has the weaker relationship with SALES, and within the pair (X7, X8), X8 has the weaker relationship with SALES. X2 and X8 were therefore removed to avoid multicollinearity.
| Source     | DF | Adj SS | Adj MS  | F-Value |
|------------|----|--------|---------|---------|
| Regression | 7  | 145157 | 20736.7 | 19.62   |
| Error      | 42 | 44385  | 1056.8  |         |
| Total      | 49 | 189543 |         |         |
Use the F test and a 0.05 level of significance to determine whether the Regression model is significant.
The hypotheses to test are
H0 : b_i = 0 for all i = 1, 3, 4, 5, 6, 7, 9 { Model is not significant }
H1 : b_i ≠ 0 for at least one i { Model is significant }
Test statistic F:
F = MSR / MSRES = 20736.7 / 1056.8 = 19.62216
Thus the calculated F-value is 19.62
What is the value of the F Critical Point
It is given by F(0.05; 7, 42), the upper 5% point of the F distribution with df1 = 7 and df2 = 42 degrees of freedom (alpha = 0.05).
It can be read from a statistical table or, more accurately, computed with software such as R or Excel.
From R
> qf(1-0.05,df1=7,df2=42)
[1] 2.23707
Thus the value of the F Critical Point is 2.23707
We reject the null hypothesis if the calculated F-value is greater than the F Critical Point.
How many degrees of freedom did you use?
df1 = 7 (regression) and df2 = 42 (error), at alpha = 0.05.
Conclusion -
Since F-value = 19.62 > 2.23707, i.e. F-value > F Critical Point,
we reject the null hypothesis at the 5% level of significance and conclude that the model is significant.
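The F test can be reproduced from the ANOVA table with a few lines of Python. This is only a sketch: the variable names are illustrative, and the critical value is copied from the R output quoted in this answer rather than recomputed.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression = 145157.0
ss_error = 44385.0
df_regression = 7
df_error = 42

# Mean squares and the F statistic (MSR / MSRES).
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_value = ms_regression / ms_error

# Critical point taken from the R call qf(1-0.05, df1=7, df2=42) shown above.
F_CRITICAL = 2.23707

# Reject H0 when the calculated F exceeds the critical point.
reject_h0 = f_value > F_CRITICAL

print(round(f_value, 2), reject_h0)
```

The result agrees with the printed F-Value of 19.62 after rounding; small differences in later decimals come from the rounded mean squares in the printout.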
-Compute the R-Square:
Formula :
R-Square = 1 - SSRES / TSS
= 1 - 44385 / 189543
= 0.7658315
Thus R-Square = 0.7658315
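As a quick check of the arithmetic, the same R-Square formula can be evaluated in Python. The variable names below are illustrative and the inputs are the Adj SS values from the ANOVA table.

```python
# Error and total sums of squares from the ANOVA table.
ss_error = 44385.0
ss_total = 189543.0

# R-Square = 1 - SSRES / TSS
r_squared = 1 - ss_error / ss_total

print(round(r_squared, 7))
```

About 76.6% of the variation in SALES is explained by the seven predictors in the model.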
-Show your work and compute the Adjusted R-Square:
Formula :
Adjusted R-Square = 1 - [ SSRES / df(Error) ] / [ TSS / df(Total) ]
or = 1 - [ SSRES / (n-k-1) ] / [ TSS / (n-1) ]
Here n = 50 and k = 7 ( number of regressors ),
so n-k-1 = 42 ( the error degrees of freedom given in the table ) and n-1 = 49
= 1 - ( 44385 / 42 ) / ( 189543 / 49 )
= 0.7268034
Thus Adjusted R-Square = 0.7268034
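The adjustment can also be verified numerically. In this Python sketch the names `n`, `k` and `adj_r_squared` are illustrative; the error degrees of freedom are computed as n - k - 1 so that they match the 42 shown in the ANOVA table.

```python
n = 50  # number of observations
k = 7   # number of regressors in the fitted model

ss_error = 44385.0
ss_total = 189543.0

df_error = n - k - 1  # 42, matches the ANOVA table
df_total = n - 1      # 49

# Adjusted R-Square = 1 - (SSRES / df_error) / (TSS / df_total)
adj_r_squared = 1 - (ss_error / df_error) / (ss_total / df_total)

# Equivalent form written in terms of the ordinary R-Square.
r_squared = 1 - ss_error / ss_total
adj_from_r2 = 1 - (1 - r_squared) * df_total / df_error

print(round(adj_r_squared, 7))
```

Both forms give the same value, which is lower than the ordinary R-Square because it penalises the number of regressors.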
Based on the Table above, what is the regression equation?
Sales = -15.8 + 11.93*X1 + 5.75*X3 + 0.000023*X4 - 0.59*X5 + 4.08*X6 + 0.0891*X7 - 0.0302*X9
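To show how the fitted equation is used, here is a small Python sketch that evaluates it for a given set of predictor values. The coefficients come from the coefficient table (the last term belongs to X9, since X8 is not in the model); the function name `predict_sales` and the input values are hypothetical and only illustrate the mechanics.

```python
INTERCEPT = -15.8
COEFFICIENTS = {
    "X1": 11.93,
    "X3": 5.75,
    "X4": 0.000023,
    "X5": -0.59,
    "X6": 4.08,
    "X7": 0.0891,
    "X9": -0.0302,
}

def predict_sales(x):
    """Evaluate the fitted regression equation for a dict of predictor values."""
    return INTERCEPT + sum(coef * x[name] for name, coef in COEFFICIENTS.items())

# Hypothetical inputs: all predictors at zero, then X1 raised by one unit.
baseline = {name: 0.0 for name in COEFFICIENTS}
one_more_x1 = dict(baseline, X1=1.0)

print(predict_sales(baseline))
print(predict_sales(one_more_x1) - predict_sales(baseline))
```

With every predictor at zero the prediction equals the constant, and a one-unit increase in X1 raises the predicted SALES by the X1 coefficient when the other predictors are held fixed.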
Three T-Values are missing. Compute them. The T-values are:
t-VALUE = coef / SE coef
For X1 = 11.93/2.34 = 5.098291
For X5 = -0.59/2.84 =-0.2077465
For X6 = 4.08/2.07 = 1.971014
| Term     | Coef     | SE Coef  | T-Value    |
|----------|----------|----------|------------|
| Constant | -15.8    | 22.1     | -0.71      |
| X1       | 11.93    | 2.34     | 5.098291   |
| X3       | 5.75     | 1.77     | 3.24       |
| X4       | 0.000023 | 0.000096 | 0.24       |
| X5       | -0.59    | 2.84     | -0.2077465 |
| X6       | 4.08     | 2.07     | 1.971014   |
| X7       | 0.0891   | 0.0530   | 1.68       |
| X9       | -0.0302  | 0.0150   | -2.02      |
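The three missing T-values follow directly from t = Coef / SE Coef. A minimal Python sketch, with illustrative names and the values taken from the coefficient table:

```python
# (Coef, SE Coef) for the terms whose T-Value was missing in the printout.
missing = {
    "X1": (11.93, 2.34),
    "X5": (-0.59, 2.84),
    "X6": (4.08, 2.07),
}

# t-value = coefficient divided by its standard error.
t_values = {term: coef / se for term, (coef, se) in missing.items()}

for term, t in t_values.items():
    print(term, round(t, 2))
```

Rounded to two decimals, as in the rest of the printout, the values are 5.10, -0.21 and 1.97.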
-Use the t test and a 0.05 level of significance to determine the significance of each independent variable.
-What is the value of the T Critical Point?
The T Critical Point is t(0.025; 42), the upper 2.5% point of the t distribution, because the test is two-sided with alpha = 0.05.
Here the degrees of freedom are n-(k+1) = 50-(7+1) = 42.
It can be read from a statistical table or, more accurately, computed with software such as R or Excel.
From R
> qt(1-0.05/2,42)
[1] 2.018082
Thus the value of the T Critical Point is 2.018082
-How many degrees of freedom did you use?
42 degrees of freedom ( the error degrees of freedom, as defined in the previous part )
-Would you eliminate any variable from the model?
The hypotheses to test for each variable are
H0 : b_j = 0 { j th variable is not significant }
vs
H1 : b_j ≠ 0 { j th variable contributes significantly to the model }
The test statistic is the t-value,
where t-value = coef / SE coef.
A t-value is obtained for each variable.
We reject the null hypothesis if the absolute value of the calculated t-value is greater than the T Critical Point 2.018082,
i.e. if | t-value | > 2.018082
Variables X1 (5.10), X3 (3.24) and X9 (| -2.02 | = 2.02) have absolute t-values greater than 2.018082.
Hence we conclude that variables X1, X3 and X9 contribute significantly to the model.
The remaining variables (X4, X5, X6 and X7) are not significant at the 0.05 level and can be removed from the model.
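-Which variable?
We remove variables X4, X5, X6 and X7, because their absolute t-values are below 2.018082. X3 stays in the model, since its t-value of 3.24 is significant.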
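The decision rule | t-value | > T Critical Point can be applied to every term at once. This Python sketch uses the t-values from the completed coefficient table (rounded to two decimals) and the critical value from the R output quoted above; the names `keep` and `remove` are illustrative.

```python
# T-values from the completed coefficient table (constant term excluded).
t_values = {
    "X1": 5.10,
    "X3": 3.24,
    "X4": 0.24,
    "X5": -0.21,
    "X6": 1.97,
    "X7": 1.68,
    "X9": -2.02,
}

# Two-sided critical point from qt(1-0.05/2, 42) in the answer above.
T_CRITICAL = 2.018082

# A variable is significant when its absolute t-value exceeds the critical point.
keep = [name for name, t in t_values.items() if abs(t) > T_CRITICAL]
remove = [name for name, t in t_values.items() if abs(t) <= T_CRITICAL]

print("keep:", keep)
print("remove:", remove)
```

X9 passes only narrowly (2.02 against 2.018082) and X6 misses narrowly (1.97), so both decisions are sensitive to rounding in the printout.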