a. What is data transformation in the context of linear regression, and why is it needed?
b. Please list different transformation techniques with a brief explanation for each.
1) The most important assumption in linear regression is that the X and Y variables are linearly related, so a scatterplot of Y against X should show a roughly straight-line pattern. When it does not, we use data transformation: we transform X, Y, or both so that the transformed variables are approximately linearly related. That is the purpose of data transformation in this context.
2) One of the most widely used transformations is the logarithm. When Y is an exponential function of X, for example Y = a * exp(b * X), taking the log of Y gives log(Y) = log(a) + b * X, which is linear in X.
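As a rough illustration of the log transformation, the sketch below generates synthetic data from Y = a * exp(b * X) with a little multiplicative noise, then fits a straight line to log(Y) against X. The parameter values, the noise level, and the helper name fit_line are assumptions made up for this example; they do not come from the question.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Synthetic data (assumed values, for illustration only).
rng = random.Random(0)
true_a, true_b = 2.0, 0.8
xs = [i * 0.1 for i in range(51)]  # X from 0.0 to 5.0
ys = [true_a * math.exp(true_b * x + rng.gauss(0.0, 0.05)) for x in xs]

# Fit on the raw scale: the relation is curved, so the line fits poorly.
_, _, r2_raw = fit_line(xs, ys)

# Fit on the log scale: log(Y) = log(a) + b * X is linear in X.
log_ys = [math.log(y) for y in ys]
slope, intercept, r2_log = fit_line(xs, log_ys)
est_b = slope                 # estimate of b
est_a = math.exp(intercept)   # estimate of a

print(f"R^2 raw: {r2_raw:.3f}, R^2 after log(Y): {r2_log:.3f}")
print(f"estimated a: {est_a:.3f}, estimated b: {est_b:.3f}")
```

The slope of the fitted line estimates b and the exponential of the intercept estimates a, so the original curve can be recovered from the linear fit.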
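Another is the square root transformation. When Y grows roughly as the square of X, taking the square root of Y gives an approximately linear relation with X. If instead Y is a function of the square root of X, it is X that should be replaced by its square root.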
There are also standard reciprocal transformations such as 1/Y or 1/X; which one applies depends on the form of the relationship between X and Y. The basic approach is always to plot the X and Y values, look at the shape of the relationship, and choose a transformation accordingly, if one is needed.
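The "plot, inspect, then transform" approach can be complemented by a quick numerical check. The sketch below is only a screening aid under stated assumptions: it uses noise-free synthetic data built so that 1/Y is exactly linear in X, and compares the squared correlation of several candidate transformations. The data, the candidate list, and the helper name r_squared are hypothetical, and a high value does not replace looking at the scatterplot and the residuals.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


# Synthetic data (assumed): Y = 1 / (0.5 + 0.3 * X), so 1/Y is linear in X.
xs = [float(i) for i in range(1, 11)]
ys = [1.0 / (0.5 + 0.3 * x) for x in xs]

# Each candidate maps a label to the (predictor, response) pair to be checked.
candidates = {
    "Y vs X": (xs, ys),
    "log(Y) vs X": (xs, [math.log(y) for y in ys]),
    "sqrt(Y) vs X": (xs, [math.sqrt(y) for y in ys]),
    "1/Y vs X": (xs, [1.0 / y for y in ys]),
    "Y vs 1/X": ([1.0 / x for x in xs], ys),
}

scores = {name: r_squared(u, v) for name, (u, v) in candidates.items()}
best_name = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:>14}: R^2 = {score:.4f}")
print("most linear candidate:", best_name)
```

On real data the differences between candidates are usually smaller, so the ranking should be read together with the plots rather than on its own.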