In a simple linear regression analysis, will the estimate of the regression line be the same if you exchange X and Y? Why or why not?
No. Regressing Y on X and regressing X on Y generally give different fitted lines, so swapping the predictor and the response changes the result.
The Pearson correlation coefficient of x and y is the same, whether you compute pearson(x, y) or pearson(y, x). This suggests that doing a linear regression of y given x or x given y should be the same, but that's not the case, because Pearson's correlation is not the only term in the regression equation.
The intuition that we should get the same linear regression, because Pearson's correlation is the same whether we regress y on x or x on y, is a good one. It is only slightly incorrect, and we can use it to understand what is actually occurring.
This is the equation for a line, which is what we are trying to get from our regression, with intercept a and slope b:

$$y = a + b x$$
The slope of that line is driven by Pearson's correlation r, scaled by the ratio of the sample standard deviations s_y and s_x:

$$b = r \, \frac{s_y}{s_x}$$
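This is the equation for Pearson's correlation. It is symmetric in x and y, so it is the same whether we are regressing y on x or x on y:

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$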
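As a quick check of that symmetry, here is a minimal sketch in Python with NumPy. The function name `pearson` and the five data points are made up for illustration and are not from the original question.

```python
import numpy as np

def pearson(x, y):
    """Sample Pearson correlation, computed from centered sums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Illustrative data only (an assumption, not from the source).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r_xy = pearson(x, y)
r_yx = pearson(y, x)
print(r_xy, r_yx)  # the two values agree
```

Swapping the arguments only swaps the factors inside each product, which is why the two results agree.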
However, when we look back at the equation for the slope, we see that Pearson's correlation is not the only term in it. If we are regressing y on x, the correlation is multiplied by the sample standard deviation of y divided by the sample standard deviation of x. If we were to regress x on y instead, we would need to invert that ratio, so the slope becomes the correlation times the sample standard deviation of x divided by the sample standard deviation of y.
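The effect of inverting the ratio can be seen numerically. The sketch below uses a hypothetical helper `fit_line` and made-up data. The helper takes the slope as the correlation times the ratio of sample standard deviations, and it assumes the usual least-squares intercept (mean of the response minus slope times mean of the predictor), which the text above does not spell out.

```python
import numpy as np

def fit_line(u, v):
    """Fit v = a + b*u, using slope b = r * s_v / s_u."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    r = np.corrcoef(u, v)[0, 1]            # Pearson correlation (symmetric)
    b = r * v.std(ddof=1) / u.std(ddof=1)  # ratio of sample standard deviations
    a = v.mean() - b * u.mean()            # assumed least-squares intercept
    return a, b, r

# Illustrative data only (an assumption, not from the source).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

a_yx, b_yx, r = fit_line(x, y)  # regress y on x: y = a_yx + b_yx * x
a_xy, b_xy, _ = fit_line(y, x)  # regress x on y: x = a_xy + b_xy * y

print("y on x:", a_yx, b_yx)
print("x on y:", a_xy, b_xy)
print("product of slopes:", b_yx * b_xy, "r squared:", r ** 2)
```

For this data the y-on-x slope is about 0.6 and the x-on-y slope is about 1.0. Rewritten in the same (x, y) plane, the second line has slope 1 / 1.0 = 1.0, not 0.6, so the two fits are different lines. The product of the two slopes equals r squared, so the lines coincide only when the correlation is exactly +1 or -1.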
So the regression equation, and with it its interpretation, changes when we regress x on y instead of y on x: the correlation stays the same, but the ratio of standard deviations that scales it is inverted.