In: Statistics and Probability
What are the assumptions that must be satisfied before a simple linear regression can be performed?
Regression analysis is commonly used to model the relationship between a single dependent variable Y and one or more predictors. When there is only one predictor, the model is called simple linear regression, which is given by the following equation:
E[Y] = β0 + β1X
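As a minimal sketch of fitting this equation, the example below estimates β0 and β1 by ordinary least squares on synthetic data (the data, true coefficients, and noise level are assumed purely for illustration):

```python
import numpy as np
from scipy import stats

# Synthetic data (assumed for illustration): Y = 2 + 3X + Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, 100)

# Fit E[Y] = beta0 + beta1*X by ordinary least squares
result = stats.linregress(x, y)
print(f"beta0 estimate: {result.intercept:.2f}")
print(f"beta1 estimate: {result.slope:.2f}")
```

With enough data and noise that satisfies the assumptions below, the estimates land close to the true coefficients used to generate the data.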
The following assumptions must be satisfied before simple linear regression can be performed:
1. Linearity: The relationship between the predictor X and the mean of Y is linear. The linearity assumption is best checked with a scatter plot of Y against X; it is also important to check for outliers.
2. Homoscedasticity: The variance of the residuals is the same for any value of X. A plot of the residuals against X (or against the fitted values) is a good way to check whether the data are homoscedastic, meaning the residuals are equally spread around the regression line.
3. Independence: All the observations are independent of each other. This means that the residuals (errors) should be uncorrelated.
4. Normality: For any fixed value of X, Y is normally distributed. Normality of the residuals can be checked with a goodness-of-fit test, e.g., the Kolmogorov-Smirnov test. When the data are not normally distributed, a non-linear transformation (e.g., a log transformation) may fix the issue.
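The four checks above can be sketched numerically as follows. This is an illustrative example on assumed synthetic data that satisfies the assumptions by construction: a residual-vs-X correlation stands in for the linearity scatter plot, the correlation of |residuals| with X probes homoscedasticity, the Durbin-Watson statistic (computed by hand) probes independence, and the Shapiro-Wilk test (used here in place of Kolmogorov-Smirnov) probes normality of the residuals:

```python
import numpy as np
from scipy import stats

# Assumed synthetic data satisfying the assumptions: linear mean,
# constant noise variance, independent normal errors
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, 200)

fit = stats.linregress(x, y)
resid = y - (fit.intercept + fit.slope * x)

# 1. Linearity: residuals should show no trend against X
lin_corr, _ = stats.pearsonr(x, resid)

# 2. Homoscedasticity: |residuals| should not grow or shrink with X
spread_corr, _ = stats.pearsonr(x, np.abs(resid))

# 3. Independence: Durbin-Watson statistic, near 2 = no autocorrelation
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# 4. Normality: Shapiro-Wilk test on the residuals
shapiro_p = stats.shapiro(resid).pvalue

print(f"linearity corr:       {lin_corr:.3f}")
print(f"spread-vs-X corr:     {spread_corr:.3f}")
print(f"Durbin-Watson:        {dw:.2f}")
print(f"Shapiro-Wilk p-value: {shapiro_p:.3f}")
```

In practice these numerical summaries complement, rather than replace, the scatter and residual plots described above.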