In: Statistics and Probability
Why do we use adjusted R2 instead of R2 in variable selection? Why do we not always choose the model with the highest adjusted R2?
The adjusted R-squared penalizes the addition of variables: it increases only when a new predictor improves the model by more than would be expected by chance, and it decreases when the predictor improves the model by less than chance would predict. In other words, adjusted R^2 rises only if the added variable is genuinely informative. Plain R^2, by contrast, never decreases when a variable is added; it increases whether the new variable is significant or not. (Formally, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the sample size and p is the number of predictors, so every extra predictor raises the penalty.) Therefore adjusted R^2 is used instead of R^2 in variable selection.
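A minimal sketch of this behavior, using NumPy with simulated data (the variable names and the true model are illustrative assumptions): fitting by ordinary least squares, adding a pure-noise predictor can only raise R^2, while adjusted R^2 is penalized for the extra parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)            # informative predictor
noise = rng.normal(size=n)         # pure-noise predictor (irrelevant)
y = 2.0 * x1 + rng.normal(size=n)  # true model depends only on x1

def r2_and_adj(predictors, y):
    """Fit OLS with an intercept and return (R^2, adjusted R^2)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = ((y - X @ beta) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    p = X.shape[1] - 1  # number of predictors, excluding the intercept
    adj = 1 - (1 - r2) * (len(y) - 1) / (len(y) - p - 1)
    return r2, adj

r2_1, adj_1 = r2_and_adj([x1], y)
r2_2, adj_2 = r2_and_adj([x1, noise], y)
print(f"x1 only:       R2={r2_1:.4f}  adj R2={adj_1:.4f}")
print(f"x1 + noise:    R2={r2_2:.4f}  adj R2={adj_2:.4f}")
# R^2 can only go up when a variable is added; adjusted R^2 may go down.
```

Note that R^2 for the two-predictor fit is guaranteed to be at least as large as for the one-predictor fit, even though the added column is pure noise; adjusted R^2 carries the penalty term that can reverse this.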
Adjusted R^2 measures the proportion of the variance of the dependent variable that can be explained by the independent variables, after accounting for the number of predictors. By looking at the adjusted R^2 value one can judge whether the regression equation fits the data well: the higher the adjusted R^2, the better the chosen independent variables explain the variation in the dependent variable. However, the model with the highest adjusted R^2 may include independent variables that the researcher is not currently interested in, and it may retain most or all of the candidate variables, which defeats the purpose of variable selection, namely finding a parsimonious model. Therefore we do not always choose the model with the highest adjusted R^2.
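The selection procedure described above can be sketched as an exhaustive search over predictor subsets, scoring each by adjusted R^2 and reporting the maximizer. This is a minimal illustration on simulated data (the design, coefficients, and seed are assumptions for the example), not a recommended workflow on its own:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n = 60
X = rng.normal(size=(n, 4))                              # 4 candidate predictors
y = 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)   # only x0 and x1 matter

def adj_r2(cols):
    """Adjusted R^2 of an OLS fit (with intercept) on the given columns."""
    M = np.column_stack([np.ones(n), X[:, cols]])
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    ss_res = ((y - M @ beta) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    p = len(cols)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Score every non-empty subset of the 4 candidate predictors.
scores = {cols: adj_r2(list(cols))
          for k in range(1, 5)
          for cols in combinations(range(4), k)}
best = max(scores, key=scores.get)
print("subset with highest adjusted R2:", best)
```

Because the penalty in adjusted R^2 is mild, the winning subset can include irrelevant predictors alongside the true ones, which is exactly why the highest adjusted R^2 is not automatically the model we want.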