Explain why it is dangerous to judge the quality of a regression model by maximizing R^2?
You are encouraged to use a hypothetical example in answering this question. (Your answer should be more than 2 sentences long).
*short answer
*hypothetical example
R2 measures the "goodness of fit" of a model. In other words, it tells us what percentage of the variation in the dependent variable is explained by the independent variable(s).
In general, increasing the number of independent variables increases R2 (it can never fall when a regressor is added), whether or not the new variable is relevant. So although a higher R2 means that more of the variation in the dependent variable is explained by the independent variables, not all of those independent variables need have a statistically significant impact on the dependent variable. The individual significance of each regressor must therefore also be checked by performing a t test on it.
So it is dangerous to judge the quality of a regression model by maximizing R2.
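The reason R2 cannot fall when a regressor is added can be seen from its definition. The following is a short sketch in standard notation; the labels RSS (residual sum of squares), TSS (total sum of squares) and the subscripts k and k+1 (number of regressors) are introduced here for illustration, and the argument assumes both models include an intercept and are fitted on the same sample.

```latex
R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}, \qquad
\mathrm{RSS} = \sum_i \hat{U}_i^2, \qquad
\mathrm{TSS} = \sum_i \left(Y_i - \bar{Y}\right)^2

\mathrm{RSS}_{k+1} \le \mathrm{RSS}_{k}
\quad\Longrightarrow\quad
R^2_{k+1} \ge R^2_{k}
```

TSS depends only on Y, so it is the same in both models. Least squares with the extra regressor can always reproduce the smaller model's fit by setting the new coefficient to zero, so its RSS cannot be larger, and R2 cannot be smaller, even when the new variable is pure noise.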
-------------------------------------------------------------------------------------------------------------
HYPOTHETICAL EXAMPLE
Let us say that we are regressing GDP (Y) on consumption (C), i.e.
Y = B1 + B2(C) + Ui
Hypothetical data:
Y | C |
420 | 250 |
348 | 123 |
311 | 144 |
432 | 68 |
491 | 103 |
225 | 225 |
148 | 50 |
378 | 186 |
446 | 175 |
156 | 25 |
308 | 144 |
486 | 238 |
254 | 230 |
469 | 224 |
467 | 250 |
436 | 111 |
289 | 80 |
444 | 358 |
350 | 154 |
148 | 100 |
276 | 163 |
366 | 211 |
198 | 171 |
198 | 166 |
295 | 127 |
252 | 183 |
197 | 168 |
342 | 260 |
164 | 88 |
399 | 167 |
Running this regression, consumption is significant at the 5% level of significance and R2 is 0.188.
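The regression output itself is not shown, so here is a minimal sketch of how the figures could be reproduced from the table using only the Python standard library. The function name `simple_ols` and the constant `T_CRIT_28` are illustrative choices, not part of the original answer; 2.048 is the approximate two-sided 5% critical value of the t distribution with 28 degrees of freedom.

```python
import math

# Hypothetical data copied from the table above (Y = GDP, C = consumption).
Y = [420, 348, 311, 432, 491, 225, 148, 378, 446, 156,
     308, 486, 254, 469, 467, 436, 289, 444, 350, 148,
     276, 366, 198, 198, 295, 252, 197, 342, 164, 399]
C = [250, 123, 144, 68, 103, 225, 50, 186, 175, 25,
     144, 238, 230, 224, 250, 111, 80, 358, 154, 100,
     163, 211, 171, 166, 127, 183, 168, 260, 88, 167]

# Approximate two-sided 5% critical value of t with n - 2 = 28 degrees of freedom.
T_CRIT_28 = 2.048


def simple_ols(x, y):
    """Fit y = b1 + b2*x by least squares; return (b1, b2, r2, t_b2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    r2 = 1 - rss / tss
    se_b2 = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return b1, b2, r2, b2 / se_b2


b1, b2, r2, t_b2 = simple_ols(C, Y)
print(f"B1 = {b1:.3f}, B2 = {b2:.4f}, R2 = {r2:.3f}, t(B2) = {t_b2:.2f}")
print("C significant at the 5% level:", abs(t_b2) > T_CRIT_28)
```

Comparing the absolute t statistic with the critical value is equivalent to checking whether the p-value is below 0.05.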
Now we add another variable, leisure hours (S), i.e. the total hours a person sleeps in a day.
Y = B1 + B2(C) + B3(S) + Ui
We expect no statistically significant relationship between leisure hours and GDP.
Hypothetical data:
Y | C | S |
420 | 250 | 9 |
348 | 123 | 14 |
311 | 144 | 11 |
432 | 68 | 11 |
491 | 103 | 8 |
225 | 225 | 14 |
148 | 50 | 10 |
378 | 186 | 10 |
446 | 175 | 14 |
156 | 25 | 13 |
308 | 144 | 11 |
486 | 238 | 9 |
254 | 230 | 9 |
469 | 224 | 13 |
467 | 250 | 11 |
436 | 111 | 13 |
289 | 80 | 10 |
444 | 358 | 11 |
350 | 154 | 10 |
148 | 100 | 8 |
276 | 163 | 12 |
366 | 211 | 11 |
198 | 171 | 9 |
198 | 166 | 13 |
295 | 127 | 10 |
252 | 183 | 9 |
197 | 168 | 8 |
342 | 260 | 7 |
164 | 88 | 13 |
399 | 167 | 7 |
Although R2 is higher (0.19 > 0.188), S is not statistically significant, as the p-value of its t test is greater than 0.05.
This shows that maximizing R2 is not a good way to judge the quality of a model: R2 rose even though the added variable has no statistically significant effect.
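The comparison of the two models can be checked with a small sketch along the following lines. It fits both regressions by solving the normal equations in plain Python; the helper names `solve` and `ols` and the constant `T_CRIT_27` are hypothetical, and 2.052 is the approximate two-sided 5% critical value of t with 27 degrees of freedom. A statistics package would normally be used instead; this version only aims to make the mechanics visible.

```python
import math

# Hypothetical data copied from the table above.
Y = [420, 348, 311, 432, 491, 225, 148, 378, 446, 156,
     308, 486, 254, 469, 467, 436, 289, 444, 350, 148,
     276, 366, 198, 198, 295, 252, 197, 342, 164, 399]
C = [250, 123, 144, 68, 103, 225, 50, 186, 175, 25,
     144, 238, 230, 224, 250, 111, 80, 358, 154, 100,
     163, 211, 171, 166, 127, 183, 168, 260, 88, 167]
S = [9, 14, 11, 11, 8, 14, 10, 10, 14, 13,
     11, 9, 9, 13, 11, 13, 10, 11, 10, 8,
     12, 11, 9, 13, 10, 9, 8, 7, 13, 7]

# Approximate two-sided 5% critical value of t with n - 3 = 27 degrees of freedom.
T_CRIT_27 = 2.052


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """OLS with an intercept; return (coefficients, R2, t statistics)."""
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
           for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(b * v for b, v in zip(beta, X[i]))) ** 2
              for i in range(n))
    y_bar = sum(y) / n
    tss = sum((yi - y_bar) ** 2 for yi in y)
    sigma2 = rss / (n - k)  # estimated error variance
    t_stats = []
    for j in range(k):
        # j-th diagonal entry of (X'X)^-1, obtained by solving against a unit vector.
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        var_j = sigma2 * solve(xtx, unit)[j]
        t_stats.append(beta[j] / math.sqrt(var_j))
    return beta, 1 - rss / tss, t_stats


beta_small, r2_small, t_small = ols([C], Y)
beta_full, r2_full, t_full = ols([C, S], Y)

print(f"R2 without S = {r2_small:.3f}, R2 with S = {r2_full:.3f}")
print(f"t(B3) = {t_full[2]:.2f}")
print("S significant at the 5% level:", abs(t_full[2]) > T_CRIT_27)
```

Whatever data are supplied, `r2_full` cannot come out below `r2_small`, which is exactly why the t statistic on S, rather than the change in R2, is the informative check here.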