What is multicollinearity in regression analysis?
Why do we check for this issue?
How can we detect multicollinearity?
When we suspect multicollinearity, what should we do about it?
A basic assumption of multiple linear regression is that the matrix of observations on the explanatory variables has full column rank, i.e. its rank equals the number of explanatory variables. In practice, however, the explanatory variables are often not independent of one another, for various reasons. The situation in which the explanatory variables are highly intercorrelated is called multicollinearity.
We check for it because multicollinearity inflates the variances of the estimated regression coefficients. As a result, important variables can appear statistically insignificant and be dropped, and the fitted model becomes unstable: adding or deleting a single variable can change the estimates substantially.
Multicollinearity can be detected in several ways. One is to inspect the correlation matrix of the explanatory variables: off-diagonal elements close to 1 in absolute value indicate multicollinearity. Another is the variance inflation factor, VIF_j = 1/(1 - R_j^2), where R_j^2 is the squared multiple correlation coefficient obtained by regressing the j-th explanatory variable on the remaining explanatory variables. The variance of the j-th coefficient is proportional to sigma^2/(1 - R_j^2), so a large VIF_j signals an inflated variance.
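Both checks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated, the helper name vif and the variables x1, x2, x3 are made up for the example, and NumPy is assumed to be available. Each VIF is computed directly from its definition, by regressing one column on the others (with an intercept) and taking 1/(1 - R_j^2).

```python
import numpy as np

def vif(X):
    """Return VIF_j = 1 / (1 - R_j^2) for every column j of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]                                   # j-th explanatory variable
        others = np.delete(X, j, axis=1)              # remaining explanatory variables
        A = np.column_stack([np.ones(n), others])     # add an intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares fit of x_j on the rest
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated data (an assumption for illustration only)
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)   # nearly a copy of x1
x3 = rng.normal(size=200)               # unrelated to x1 and x2
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)     # correlation matrix of the explanatory variables
vifs = vif(X)

print(np.round(corr, 3))
print(np.round(vifs, 1))
```

In this setup the off-diagonal correlation between x1 and x2 is close to 1 and their VIFs are large, while x3 has a VIF near 1. A commonly quoted rule of thumb treats a VIF above roughly 10 as a warning sign, but the cutoff is a convention, not a theorem.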
We should suspect multicollinearity when the explanatory variables are tied together by a known identity, for example income = saving + domestic expenditure, and all of these variables are used as regressors. It can also arise from the way the model is specified, e.g. when the same variable enters both in linear form and as a square or cross-product term.
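A small simulation can make both cases concrete. The sketch below is illustrative only: the numbers are simulated, the variable names (saving, spending, income) are assumptions chosen to mirror the identity above, and NumPy is assumed. The first part shows that using all three variables of an exact identity makes the data matrix lose rank; the second shows how strongly a variable can correlate with its own square.

```python
import numpy as np

rng = np.random.default_rng(1)

# Case 1: regressors linked by an exact identity (simulated values)
saving = rng.uniform(10, 50, size=100)
spending = rng.uniform(20, 80, size=100)
income = saving + spending                  # identity: income = saving + spending

X = np.column_stack([income, saving, spending])
n_columns = X.shape[1]
rank = np.linalg.matrix_rank(X)             # numerical rank of the data matrix
print("columns:", n_columns, "rank:", rank)

# Case 2: the same variable used in linear and squared form
x = rng.uniform(1, 2, size=100)
corr_x_x2 = np.corrcoef(x, x ** 2)[0, 1]    # correlation between x and x^2
print("corr(x, x^2):", round(corr_x_x2, 4))
```

Here the rank is smaller than the number of columns, so the full-rank assumption fails outright. The squared-term case is not an exact dependence, but over a narrow positive range x and x^2 are almost perfectly correlated.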
Common ways to deal with multicollinearity include obtaining more data, using ridge regression, and making use of prior information.
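As a rough illustration of the ridge regression remedy, the sketch below uses the closed-form estimate (X'X + lam*I)^(-1) X'y on centered data. The function name ridge, the penalty value lam = 1.0 and the simulated data are all assumptions made for the example. In practice the penalty is usually chosen by a method such as cross-validation, and the regressors are often standardized first.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^(-1) X'y for centered X and y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# Simulated, highly collinear data (illustration only)
rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)      # almost identical to x1
X = np.column_stack([x1, x2])
X = X - X.mean(axis=0)                   # center so no intercept is needed
y = x1 + x2 + rng.normal(size=n)
y = y - y.mean()

beta_ols = ridge(X, y, 0.0)              # lam = 0 reduces to ordinary least squares
beta_ridge = ridge(X, y, 1.0)            # lam = 1.0 is an arbitrary choice

print("OLS:  ", np.round(beta_ols, 3))
print("ridge:", np.round(beta_ridge, 3))
```

The ridge coefficient vector is always shorter than the least-squares one for a positive penalty. That shrinkage trades a little bias for a lower variance, which is the reason ridge regression is suggested when the regressors are nearly collinear.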