Betty works in a steel mill making steel cables. Suppose cable strength can be altered in two different ways: thickness and the amount of molybdenum (an element used to make steel) it contains. She wants to figure out the strongest cable she can make, so she tests several different cable formulas, each many times. Below is a portion of the data she uses to run a regression with Thickness and Molybdenum predicting Strength:
Strength (Kilonewton, kN) | Thickness (mm) | Molybdenum (%) |
38.2 | 8 | 1.2 |
95.1 | 13 | 2.2 |
74.8 | 11 | 1.9 |
54.3 | 9 | 1.3 |
286.5 | 22 | 4.1 |
24.4 | 6 | 0.8 |
149.3 | 15 | 3.5 |
If Betty is making a mistake, what mistake is she making?
Multicollinearity
She's not making a mistake
Reverse causation
One of the variables needs a scalar
Betty is making a mistake, and that mistake is multicollinearity.
I used R software to come to this conclusion.
Assuming no multicollinearity, I first estimated the model in R by regressing Strength on Thickness and Molybdenum.
The very high multiple R-squared (0.9701) is a first hint that multicollinearity may be present, so the predictors are worth checking directly.
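The original code is not shown, so the following is only a minimal sketch of how such a fit could be run in base R. It assumes the seven rows in the table are the data used; the names `cables`, `fit` and `r_squared` are illustrative choices, not taken from the original answer.

```r
# Data copied from the table in the question (seven observations).
cables <- data.frame(
  Strength   = c(38.2, 95.1, 74.8, 54.3, 286.5, 24.4, 149.3),
  Thickness  = c(8, 13, 11, 9, 22, 6, 15),
  Molybdenum = c(1.2, 2.2, 1.9, 1.3, 4.1, 0.8, 3.5)
)

# Ordinary least squares with both predictors.
fit <- lm(Strength ~ Thickness + Molybdenum, data = cables)

# Coefficients, standard errors and the multiple R-squared.
print(summary(fit))
r_squared <- summary(fit)$r.squared
print(r_squared)
```

On these seven rows the multiple R-squared should come out close to the 0.9701 quoted above.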
Then I computed the correlation matrix of the variables.
The correlation matrix shows that the pairwise correlation between thickness and the amount of molybdenum is very high (0.9661162).
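As a rough illustration, the correlation matrix can be obtained with base R's `cor()`. This sketch again assumes the seven tabulated rows are the full data set; `cables`, `cor_matrix` and `r_tm` are hypothetical names.

```r
# Data copied from the table in the question.
cables <- data.frame(
  Strength   = c(38.2, 95.1, 74.8, 54.3, 286.5, 24.4, 149.3),
  Thickness  = c(8, 13, 11, 9, 22, 6, 15),
  Molybdenum = c(1.2, 2.2, 1.9, 1.3, 4.1, 0.8, 3.5)
)

# Pearson correlations between every pair of columns.
cor_matrix <- cor(cables)
print(round(cor_matrix, 4))

# Correlation between the two predictors only.
r_tm <- cor_matrix["Thickness", "Molybdenum"]
print(r_tm)
```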
A formal way to detect multicollinearity is the Farrar-Glauber test (F-G test).
We use the 'mctest' package in R, which contains multicollinearity tests. Its functions 'omcdiag' and 'imcdiag' provide overall and individual diagnostics for multicollinearity, respectively.
We find that the Farrar chi-square test statistic equals 12.1894 and is highly significant, implying the presence of multicollinearity in the model.
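For readers without the package, the Farrar chi-square statistic can be sketched directly in base R. The usual form is -(n - 1 - (2p + 5)/6) * ln|R|, where R is the correlation matrix of the p predictors and n is the number of observations. This assumes n = 7 (the tabulated rows), and the variable names are illustrative only.

```r
# Predictors copied from the table in the question.
predictors <- data.frame(
  Thickness  = c(8, 13, 11, 9, 22, 6, 15),
  Molybdenum = c(1.2, 2.2, 1.9, 1.3, 4.1, 0.8, 3.5)
)

n <- nrow(predictors)   # number of observations
p <- ncol(predictors)   # number of predictors
R <- cor(predictors)    # predictor correlation matrix

# Farrar chi-square: large values mean R is far from the identity matrix.
chi_sq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))

# Degrees of freedom and upper-tail p-value.
dof <- p * (p - 1) / 2
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)

print(c(chi_sq = chi_sq, dof = dof, p_value = p_value))
```

With these data the statistic should come out close to the 12.1894 reported above, on 1 degree of freedom.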
The significant chi-square result leads us to the next step of the Farrar-Glauber procedure, the F-test on the individual regressors.
As a rule of thumb, I treat a VIF (variance inflation factor) above 4.0 as a sign of multicollinearity. Here the VIF equals 15.0106, well above 4, so we can conclude that multicollinearity is present in the model.
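The VIF itself can be checked by hand: regress one predictor on the other and compute 1 / (1 - R-squared) from that auxiliary regression. The sketch below does this in base R, assuming the seven tabulated rows; `aux` and `vif_thickness` are hypothetical names.

```r
# Predictors copied from the table in the question.
predictors <- data.frame(
  Thickness  = c(8, 13, 11, 9, 22, 6, 15),
  Molybdenum = c(1.2, 2.2, 1.9, 1.3, 4.1, 0.8, 3.5)
)

# Auxiliary regression of one predictor on the other.
aux <- lm(Thickness ~ Molybdenum, data = predictors)

# VIF = 1 / (1 - R^2) of the auxiliary regression.
vif_thickness <- 1 / (1 - summary(aux)$r.squared)
print(vif_thickness)
```

With only two predictors, both share the same VIF, because each auxiliary R-squared is just the squared correlation between them.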