Question

Does linear regression estimate a cause and effect relationship? Why or why not?

Solutions

Expert Solution

Quick answer: No.

Any statistics text worth its salt will caution the reader not to confuse correlation with causation. Yet the mistake is very common. As a refresher, here's an example:

Consider elementary school students' shoe sizes and scores on a standard reading exam. They are correlated, but saying that larger shoe size causes higher reading scores is as absurd as saying that high reading scores cause larger shoe size.

In this example, there is a clear lurking variable, namely, age. As the child gets older, both their shoe size and reading ability increase.
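To make the lurking-variable point concrete, here is a minimal simulation sketch. Every number in it (the age range, the 0.8 and 10 multipliers, the noise levels) is invented for illustration and is not real data. Age drives both shoe size and reading score, and neither affects the other. The raw correlation is still strong, but it essentially disappears once age is accounted for.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing its least-squares line on xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical data-generating process: age is the only common cause.
age = [rng.uniform(6, 11) for _ in range(n)]
shoe = [0.8 * a + rng.gauss(0, 0.7) for a in age]     # depends on age only
reading = [10 * a + rng.gauss(0, 8) for a in age]     # depends on age only

# Correlation between shoe size and reading score, ignoring age.
raw_corr = pearson(shoe, reading)

# Correlation after removing the part of each variable explained by age.
partial_corr = pearson(residuals(shoe, age), residuals(reading, age))

print(f"raw correlation:            {raw_corr:.2f}")
print(f"correlation adjusted for age: {partial_corr:.2f}")
```

Under these assumptions the raw correlation comes out large and positive, while the age-adjusted correlation is close to zero.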

Elaborating on this situation:

If you agree that increasing age (for elementary school children) causes increasing foot size, and therefore increasing shoe size, then you expect a correlation between age and shoe size. Correlation is symmetric, so shoe size and age are correlated. But it would be absurd to say that shoe size causes age.

In other words, even when there is a causal relationship, the causality typically only goes one way. (Of course, it could go both ways, as in a feedback loop.)

One situation where people slip into confusing correlation and causality is in regression. For example, one might regress college GPA on SAT scores, obtaining a positive coefficient beta of SAT score in the regression equation. Consider the following two statements:

  1. An increase of one point in SAT scores causes, on average, an increase of beta points in college GPA.
  2. For every increase of one point in SAT scores, the increase in average college GPA is beta points.

Statement 2 is correct (assuming, of course, that the regression has been carried out correctly). Statement 1 is incorrect: the regression equation gives no information about causality. Indeed, there is likely a lurking variable (or probably a bunch of lurking variables) that affects both GPA and SAT score; SAT score is considered to be a (perhaps crude) measure of this lurking variable.
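The difference between the two statements can be sketched with a small simulation. The model below is an assumption made purely for illustration: a hypothetical hidden "ability" variable drives both SAT score and GPA, and SAT has no causal effect on GPA by construction. All coefficients are invented. The regression still produces a positive beta, so Statement 2 holds in the simulated data, but artificially raising everyone's SAT score by 100 points leaves GPA unchanged, so Statement 1 fails.

```python
import random

def simulate(n, sat_bonus=0.0, seed=1):
    """Hypothetical model: 'ability' causes both SAT and GPA.

    sat_bonus is an intervention that raises every SAT score
    without touching ability. GPA never looks at SAT.
    """
    rng = random.Random(seed)
    sat, gpa = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)  # the lurking variable
        sat.append(1000 + 150 * ability + rng.gauss(0, 80) + sat_bonus)
        gpa.append(3.0 + 0.4 * ability + rng.gauss(0, 0.3))
    return sat, gpa

def ols_slope(xs, ys):
    """Slope of the least-squares regression of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Observational data: regress GPA on SAT.
sat, gpa = simulate(5000)
beta = ols_slope(sat, gpa)

# Intervention: same students (same seed), every SAT score raised by 100.
_, gpa_boosted = simulate(5000, sat_bonus=100.0)

# What a causal reading of beta would predict versus what happens.
predicted_gain = 100 * beta
actual_gain = sum(gpa_boosted) / len(gpa_boosted) - sum(gpa) / len(gpa)

print(f"beta (GPA points per SAT point): {beta:.5f}")
print(f"gain predicted by causal reading: {predicted_gain:.3f}")
print(f"gain actually observed:           {actual_gain:.3f}")
```

In this sketch beta correctly describes how average GPA differs between students with different SAT scores, yet the causal prediction is wrong. Real data could of course involve some genuine causal effect as well; the regression alone cannot tell the two cases apart.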

