Econometrics
Explain why it is dangerous to judge the quality of a regression model by maximizing R^2.
You are encouraged to use a hypothetical example in answering this question. (Your answer should be more than 2 sentences long).
R-squared is the percentage of the variation in the dependent variable that the model explains. The value in your statistical output is an estimate of the population value, computed from your sample. As with other estimates in inferential statistics, you want your R-squared estimate to be close to the population value.
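For reference, the usual definition for a least-squares regression with an intercept can be written as below. The symbols are notation introduced here rather than taken from the answer: y_i is an observed value, \hat{y}_i is the fitted value, and \bar{y} is the sample mean of the dependent variable.

```latex
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}
    = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Multiplying by 100 gives the percentage of variation explained.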
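There are situations where the R-squared in your output is much higher than the correct value for the entire population. A common cause is overfitting: adding predictors never lowers the in-sample R-squared of a least-squares fit, so a model chosen by maximizing it can end up fitting the noise in the sample. The same conditions can cause other problems, such as misleading coefficients. Consequently, although it sounds counter-intuitive, it is possible to have an R-squared value that is too high.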
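A small simulation can serve as the hypothetical example the question asks for. This is a minimal sketch under made-up assumptions: the data-generating process (one real predictor with a population R-squared of about 20%), the sample sizes, and the names `fit_ols` and `r_squared` are all illustrative choices, not part of the original answer. It fits ordinary least squares with NumPy, adds pure-noise predictors a few at a time, and compares R-squared on the fitting sample with R-squared on fresh data from the same process.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y on X, with an intercept column added."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def r_squared(X, y, beta):
    """R-squared of the predictions X @ beta (plus intercept) against y."""
    X1 = np.column_stack([np.ones(len(y)), X])
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
n_train, n_test, max_noise = 30, 1000, 25

def simulate(n):
    """Hypothetical process: y depends on x only; the other columns are noise."""
    x = rng.normal(size=n)
    junk = rng.normal(size=(n, max_noise))   # predictors unrelated to y
    y = 2.0 + 0.5 * x + rng.normal(size=n)   # population R-squared is about 0.20
    return x, junk, y

x_tr, junk_tr, y_tr = simulate(n_train)
x_te, junk_te, y_te = simulate(n_test)

ks = list(range(0, max_noise + 1, 5))        # number of noise predictors added
in_sample, out_sample = [], []
for k in ks:
    X_tr = np.column_stack([x_tr, junk_tr[:, :k]])
    X_te = np.column_stack([x_te, junk_te[:, :k]])
    beta = fit_ols(X_tr, y_tr)
    in_sample.append(r_squared(X_tr, y_tr, beta))
    out_sample.append(r_squared(X_te, y_te, beta))

for k, r_in, r_out in zip(ks, in_sample, out_sample):
    print(f"noise predictors: {k:2d}  in-sample R2: {r_in:.3f}  new-data R2: {r_out:.3f}")
```

The in-sample column can only go up as junk predictors are added, while the new-data column shows how little of that apparent fit carries over. Picking the model with the largest in-sample R-squared would select the one with the most junk in it.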
High R-squared values are not always a problem. In fact, sometimes you can legitimately expect very large values. For example, if you are studying a physical process and have very precise and accurate measurements, it’s possible to obtain valid R-squared values in the high 90s (percent).
On the other hand, human behavior inherently has much more unexplainable variability, which usually produces R-squared values below 50%. An R-squared of 90% would be suspiciously high in this context!