A study used a sample of 50 observations to predict SALES. The analysis included 9 predictor (independent) variables, X1 through X9.
Correlation matrix:
      |     X1 |     X2 |     X3 |     X4 |     X5 |     X6 |     X7 |     X8 |     X9 |
X2    |  0.804 |        |        |        |        |        |        |        |        |
X3    |  0.625 |  0.443 |        |        |        |        |        |        |        |
X4    |  0.032 |  0.032 |  0.231 |        |        |        |        |        |        |
X5    |  0.159 |  0.214 |  0.177 | -0.194 |        |        |        |        |        |
X6    |  0.319 |  0.373 |  0.308 |  0.054 |  0.293 |        |        |        |        |
X7    | -0.016 |  0.030 |  0.079 |  0.168 | -0.309 |  0.067 |        |        |        |
X8    | -0.026 |  0.103 |  0.015 |  0.151 | -0.311 |  0.059 |  0.912 |        |        |
X9    |  0.169 | -0.027 | -0.104 |  0.017 | -0.248 |  0.114 |  0.174 |  0.223 |        |
SALES |  0.764 |  0.630 |  0.756 |  0.149 |  0.171 |  0.426 |  0.145 |  0.141 | -0.068 |
Based on the correlation matrix shown above, is there any concern about multicollinearity?
Yes, there is concern about multicollinearity.
For the pair (X7, X8) the correlation is 0.912, which means these two predictors are highly correlated with each other.
The pair (X1, X2) has a correlation of 0.804, so this pair is also highly correlated.
So the pairs of variables involved are (X1, X2) and (X7, X8).
-What limit did you use?
The limit used is |r| > 0.70 for any pair of predictors.
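The screening rule above can be sketched in a few lines of Python. This is only an illustration: the `lower` dictionary is the lower triangle of the correlation matrix copied from the table, and the helper name `flag_pairs` is a made-up name, not part of any statistics package.

```python
# Lower triangle of the predictor correlation matrix, copied from the table.
# Each row lists correlations with X1, X2, ... in column order.
names = ["X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9"]
lower = {
    "X2": [0.804],
    "X3": [0.625, 0.443],
    "X4": [0.032, 0.032, 0.231],
    "X5": [0.159, 0.214, 0.177, -0.194],
    "X6": [0.319, 0.373, 0.308, 0.054, 0.293],
    "X7": [-0.016, 0.030, 0.079, 0.168, -0.309, 0.067],
    "X8": [-0.026, 0.103, 0.015, 0.151, -0.311, 0.059, 0.912],
    "X9": [0.169, -0.027, -0.104, 0.017, -0.248, 0.114, 0.174, 0.223],
}
THRESHOLD = 0.70  # the limit used in this answer


def flag_pairs(lower, names, threshold):
    """Return (col, row, r) for every predictor pair with |r| > threshold."""
    flagged = []
    for row, values in lower.items():
        for col, r in zip(names, values):
            if abs(r) > threshold:
                flagged.append((col, row, r))
    return flagged


flagged = flag_pairs(lower, names, THRESHOLD)
for a, b, r in flagged:
    print(f"({a}, {b}): r = {r:.3f}")
# prints:
# (X1, X2): r = 0.804
# (X7, X8): r = 0.912
```

Taking the absolute value matters because a strongly negative correlation would signal multicollinearity just as much as a strongly positive one.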
- What action must be taken to eliminate multicollinearity? Be very specific.
From each highly correlated pair, remove the variable that has the weaker relationship with SALES.
In the pair (X7, X8), X8 has the weaker correlation with SALES (0.141 versus 0.145), so X8 should be removed.
In the pair (X1, X2), X2 has the weaker correlation with SALES (0.630 versus 0.764), so X2 should be removed.
-Which variable appears to have the weakest relationship with SALES?
X9 has the weakest relationship with SALES (correlation -0.068, the smallest in absolute value).
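The two selections above (which member of each collinear pair to drop, and which predictor is weakest overall) can be checked with a short Python sketch. The correlations with SALES are copied from the last row of the correlation matrix; the helper name `weaker_member` is hypothetical and used only for illustration.

```python
# Correlation of each predictor with SALES (last row of the correlation matrix).
sales_r = {
    "X1": 0.764, "X2": 0.630, "X3": 0.756,
    "X4": 0.149, "X5": 0.171, "X6": 0.426,
    "X7": 0.145, "X8": 0.141, "X9": -0.068,
}

# Pairs flagged as highly correlated (|r| > 0.70) in the answer above.
collinear_pairs = [("X1", "X2"), ("X7", "X8")]


def weaker_member(pair, sales_r):
    """Return the variable in the pair with the smaller |correlation| with SALES."""
    return min(pair, key=lambda v: abs(sales_r[v]))


# One variable to drop from each collinear pair.
to_drop = [weaker_member(pair, sales_r) for pair in collinear_pairs]

# Predictor with the weakest linear relationship with SALES overall.
weakest = min(sales_r, key=lambda v: abs(sales_r[v]))

print("Drop:", to_drop)
print("Weakest predictor:", weakest)
```

Comparing absolute values is the design choice here: the sign of a correlation gives direction, while its magnitude gives the strength of the linear relationship.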
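-The following is the initial regression printout. Variables X2 and X8 were not included in the regression. Why?
Source     | DF | Adj SS | Adj MS  | F-Value |
Regression |  7 | 145157 | 20736.7 |   19.62 |
Error      | 42 |  44385 |  1056.8 |         |
Total      | 49 | 189543 |         |         |
X2 and X8 were left out to remove the multicollinearity identified earlier: each is the member of a highly correlated pair, (X1, X2) and (X7, X8), that has the weaker correlation with SALES. That leaves 7 predictors, which matches the regression DF of 7.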
Use the F test and a 0.05 level of significance to determine whether the regression model is significant.
To test
H0 : bi = 0 for all i = 1, 3, 4, 5, 6, 7, 9 { Model is not significant }
H1 : bi ≠ 0 for at least one i { Model is significant }
Test statistic F:
F = MSR / MSRES = 20736.7 / 1056.8 = 19.62216
Thus the calculated F-value is 19.62
What is the value of the F critical point?
It is the upper 0.05 point of the F distribution with df1 = 7 and df2 = 42 degrees of freedom, written F(0.05; 7, 42).
It can be read from statistical tables or, more accurately, computed with software such as R or Excel.
From R
> qf(1-0.05,df1=7,df2=42)
[1] 2.23707
Thus the value of the F critical point is 2.23707
We reject the null hypothesis if the calculated F-value is greater than this critical point.
How many degrees of freedom did you use?
The test uses df1 = 7 (regression) and df2 = 42 (error) degrees of freedom, with α = 0.05.
Conclusion -
Since F-value = 19.62 > 2.23707, i.e. the F-value exceeds the F critical point,
we reject the null hypothesis at the 5% level of significance and hence conclude that the model is significant.
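The F test can be retraced from the printout with a small Python sketch. The sums of squares and degrees of freedom are copied from the ANOVA table, and the critical value is taken from the R output rather than recomputed; the variable names are illustrative only.

```python
# Values copied from the regression printout.
ss_regression, df_regression = 145157, 7
ss_error, df_error = 44385, 42

# Critical point taken from the R call qf(1-0.05, df1=7, df2=42) in the text.
F_CRITICAL = 2.23707

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# Test statistic: ratio of the regression mean square to the error mean square.
f_value = ms_regression / ms_error

# Decision rule: reject H0 when the statistic exceeds the critical point.
reject_h0 = f_value > F_CRITICAL

print(f"MSR = {ms_regression:.1f}, MSRES = {ms_error:.1f}, F = {f_value:.2f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Working from the unrounded mean squares gives an F-value that differs from 19.62216 only in the later decimals, because the printout rounds the mean squares to one decimal place.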
Compute the R-Square:
Formula :
R-Square = 1 - SSRES / TSS
= 1 - 44385 / 189543
= 0.7658315
Thus R-Square = 0.7658315
Show your work and compute the Adjusted R-Square:
Formula :
Adjusted R-Square = 1 - [ SSRES / df(Error) ] / [ TSS / df(Total) ]
or = 1 - [ SSRES / (n-k-1) ] / [ TSS / (n-1) ]
Here k = 7 ( number of regressors ) and n = 50
so n-k-1 = 42 and n-1 = 49 ( both given in the table )
= 1 - ( 44385 / 42 ) / ( 189543 / 49 )
= 0.7268034
Thus Adjusted R-Square = 0.7268034
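Both R-Square figures can be reproduced with a few lines of Python. This is a sketch using only the values in the printout; the second adjusted R-Square line uses the equivalent form 1 - (1 - R²)(n-1)/(n-k-1) as a cross-check, and the variable names are illustrative.

```python
# Values copied from the regression printout.
n = 50            # sample size
k = 7             # number of regressors in the model
ss_error = 44385  # SSRES
ss_total = 189543 # TSS

# R-Square: share of the total variation explained by the model.
r2 = 1 - ss_error / ss_total

# Adjusted R-Square: each sum of squares is divided by its degrees of freedom.
adj_r2 = 1 - (ss_error / (n - k - 1)) / (ss_total / (n - 1))

# Equivalent form written in terms of R-Square, used here as a cross-check.
adj_r2_alt = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"R-Square          = {r2:.7f}")
print(f"Adjusted R-Square = {adj_r2:.7f}")
```

The error degrees of freedom are n-k-1 = 42 rather than n-k because one degree of freedom is used by the intercept.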