So, as we look at linear regression and correlation this week, please provide an example of how and when linear regression is used.
1. To answer how linear regression is used:-
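Given a data set $\{y_i, x_{i1}, \ldots, x_{ip}\}_{i=1}^{n}$ of n statistical units, a linear regression model takes the form
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \ldots, n.$$
Often these n equations are stacked together and written in matrix notation as
$$y = X\beta + \varepsilon,$$
where $y = (y_1, \ldots, y_n)^T$ is the vector of observed responses, $X$ is the $n \times (p+1)$ matrix whose i-th row is $(1, x_{i1}, \ldots, x_{ip})$, $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$ is the vector of regression coefficients, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T$ is the vector of error terms.
We want to estimate the regression coefficients in such a way that the residual sum of squares
$$RSS(\beta) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip} \right)^2 = (y - X\beta)^T (y - X\beta)$$
is minimized. That is, we use the ordinary least squares method, which estimates the regression coefficients by minimizing the residual sum of squares.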
Thus a linear regression model is fitted.
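As a rough illustration of the fitting step described above, here is a minimal sketch in Python using NumPy. The function name `fit_ols` and the small data set are made up for this example. The least squares problem is solved with `np.linalg.lstsq`, which is one of several reasonable ways to minimize the residual sum of squares.

```python
import numpy as np

def fit_ols(X, y):
    """Estimate regression coefficients by minimizing the residual sum of squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    # Least squares solution of design @ beta ~= y.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    rss = float(residuals @ residuals)
    return beta, rss

# Hypothetical data, constructed so that y = 1 + 2*x1 - 3*x2 holds exactly.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

beta, rss = fit_ols(X, y)
print("coefficients (intercept first):", beta)
print("residual sum of squares:", rss)
```

Because the example data contain no noise, the estimated coefficients should recover the intercept and slopes used to generate y, and the residual sum of squares should be essentially zero. With real data the residual sum of squares will be positive.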
Use: Linear regression is a common statistical data analysis technique. It is used to determine the extent to which there is a linear relationship between a dependent variable and one or more independent variables. In particular, the purpose of linear regression is to "predict" the value of the dependent variable based on the values of one or more independent variables. For fitting a linear model there are no fixed rules that everyone must follow; rather, it is an iterative process. First, one should develop an idea, then implement the idea by fitting the model, and then, based on the accuracy of the model, try to improve the idea. Multiple R-squared and adjusted R-squared are two metrics that help assess what proportion of the total variability of the target (dependent) variable is explained by the fitted regression model. The closer these two metrics are to 1, the better the fitted linear model describes the data.
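To make the two metrics concrete, the sketch below computes R-squared and adjusted R-squared from observed and predicted values. It assumes the usual definitions, R-squared = 1 - RSS/TSS and adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - p - 1), where n is the number of observations and p the number of independent variables. The function name and the numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def r_squared(y, y_hat, n_predictors):
    """Return (R-squared, adjusted R-squared) for observed y and predictions y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)        # variability left unexplained
    tss = np.sum((y - y.mean()) ** 2)     # total variability of y
    r2 = 1.0 - rss / tss
    # The adjustment penalizes adding predictors that explain little.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adj_r2

# Hypothetical observed values and model predictions (one predictor assumed).
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8, 5.0])

r2, adj_r2 = r_squared(y, y_hat, n_predictors=1)
print("R-squared:", r2)
print("adjusted R-squared:", adj_r2)
```

Adjusted R-squared is never larger than R-squared, which is why it is often preferred when comparing models with different numbers of independent variables.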
2. To answer when linear regression is used:-
The sensible use of linear regression on a data set requires that the following assumptions about that data set hold -
i) The independent variables should have a linear relationship with the dependent variable.
ii) The regression model should be linear in its parameters.
iii) The residuals obtained when predicting the dependent variable with the fitted regression model should be independent of each other.
iv) Those residuals should have a constant variance.
v) The residuals are also assumed to be normally distributed.
vi) To use the least squares method for estimating the regression coefficients, the independent (explanatory) variables should not be linearly related to one another (no multicollinearity), i.e. they should be as close to mutually independent as possible.
vii) The dependent variable should be measured in a numeric (continuous) measurement scale.
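Some of the assumptions listed above can be checked numerically. The sketch below shows two commonly used diagnostics, written with NumPy only: the Durbin-Watson statistic as a rough check of assumption iii) and variance inflation factors (VIF) as a rough check of assumption vi). These particular diagnostics are a choice made for this example, not something the assumptions themselves prescribe, and the function names and data are hypothetical.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest little first-order autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def variance_inflation_factors(X):
    """VIF of each column of X: 1 / (1 - R^2) from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Hypothetical residuals that alternate in sign (negative autocorrelation).
dw = durbin_watson([1.0, -1.0, 1.0, -1.0])

# Hypothetical predictors whose columns are uncorrelated with each other.
X = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]])
vifs = variance_inflation_factors(X)

print("Durbin-Watson:", dw)
print("VIFs:", vifs)
```

The Durbin-Watson statistic lies between 0 and 4, and a value far from 2 hints that the residuals may not be independent. A VIF close to 1 indicates that an independent variable is nearly uncorrelated with the others, while large values indicate that it is close to a linear combination of them. Constant variance and normality of the residuals (assumptions iv and v) are usually examined with residual plots, which are not shown here.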