Why is it considered risky and unreliable to use the regression formula to predict the dependent variable outside the range of the actual values of the independent variable?
In real life, using values outside the observed range of the independent variable may be costly. In regression, we fit the model on past data, test it using present values of the independent variable, and then apply it to future data.
When the present values of the independent variable fall outside that range, the prediction of the dependent variable may not be correct, because the model has only been shown to be adequate for the values it was fitted on, and the estimated relationship may not hold beyond them. Moreover, values outside the range behave like outliers relative to the fitted data; with outliers present the model will not fit well and hence will predict wrong output in the future.
Hence, the main reasons for not using values outside the range are:
1. Due to outliers, the regression model will not be adequate, its assumptions may be violated, and hence the predictions may be wrong.
2. Using values outside the range may be costly and time-consuming in real life, and sometimes such values are unrealistic, meaning they do not actually occur in the situation being studied, so it is not useful to use them.
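The first reason can be illustrated with a small sketch. The data here are hypothetical and chosen only for illustration: the true relationship is assumed to be curved (y = x^2), but x is observed only between 0 and 5, and a straight line is fitted to it. The helper names `predict` and `in_observed_range` are made up for this example.

```python
import numpy as np

# Hypothetical data: the true relationship is assumed to be y = x**2,
# but x is only observed between 0 and 5.
x_train = np.linspace(0.0, 5.0, 21)
y_train = x_train ** 2

# Fit a straight line y = intercept + slope * x by least squares.
slope, intercept = np.polyfit(x_train, y_train, deg=1)


def predict(x):
    """Prediction from the fitted straight line."""
    return intercept + slope * x


def in_observed_range(x):
    """True if x lies inside the range of x values used for fitting."""
    return bool(x_train.min() <= x <= x_train.max())


x_inside, x_outside = 2.5, 10.0

# Absolute error of the line against the assumed true curve.
err_inside = abs(predict(x_inside) - x_inside ** 2)
err_outside = abs(predict(x_outside) - x_outside ** 2)

print(f"x={x_inside}: in range={in_observed_range(x_inside)}, error={err_inside:.2f}")
print(f"x={x_outside}: in range={in_observed_range(x_outside)}, error={err_outside:.2f}")
```

Within the observed range the line is a tolerable approximation, but at x = 10 the error is many times larger, because nothing in the fitted data could reveal how the relationship behaves out there. A simple range check like `in_observed_range` is one way to flag a prediction as an extrapolation before relying on it.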