In: Statistics and Probability
Explain the five-step process for evaluating a multiple regression model.
Regression comes into the picture when we want to describe the relationship between variables. Linear regression models the relationship between a dependent variable and one or more independent variables, and it is used to predict one from the other. The variable being predicted is called the dependent variable.
Multiple regression analysis is an extension of simple linear regression. It’s useful for describing and making predictions based on linear relationships between predictor variables.
Multiple linear regression:
When a regression model has one dependent variable and one independent variable, it is called simple linear regression. When it has one dependent variable and two or more independent variables, it is called multiple linear regression.
The five steps in evaluating a multiple linear regression model:
Step 1: Selecting the variables for the model
To pick the right variables, you’ve got to have a basic understanding of your dataset, enough to know that your data is relevant, high quality, and of adequate volume. As part of your model building efforts, you’ll be working to select the best predictor variables for your model.
The following two methods are helpful in the variable selection process.
1) Try an automatic search procedure, such as stepwise regression, and let software such as R decide which variables to keep.
2) Use all-possible-regressions to test all possible subsets of predictor variables. With the all-possible-regressions method, you pick the numerical criteria by which the candidate models are ranked. Popular numerical criteria are as follows:
- R2: the subset of variables with the highest R2 value is the best fit for the model.
- Adjusted R2: subsets with larger adjusted R2 values are a better fit for the model.
- Cp: the smaller the Cp value, the less total mean square error and the less regression bias there is.
- PRESSp: the smaller the predicted sum of squares value, the better the predictive capability of the model.
These are the criteria for selecting the best variables for the model.
Step 2: Refining the model
Check the utility of the model by examining the following criteria:
1) Global F test: Test the significance of your predictor variables (as a group) for predicting the response of your dependent variable.
2) Adjusted R2: Check the overall sample variation of the dependent variable that is explained by the model after adjusting for the sample size and the number of parameters. Adjusted R2 values indicate how well your predictive equation fits your data; larger values indicate that the variables are a better fit for the model.
3) Root mean square error (RMSE): RMSE estimates the standard deviation of the random error. An interval of ±2 standard deviations approximates the accuracy of predicting the response variable from a specific subset of predictor variables.
4) Coefficient of variation (CV): If a model has a CV value of 10% or less, the model is more likely to provide accurate predictions.
These criteria are very helpful for refining the model.
Step 3: Verifying and testing model assumptions
Before drawing conclusions, check that your data meets the following important assumptions of a linear regression model. These assumptions must be satisfied for multiple regression analysis to give valid results.
1) You must have three or more variables of metric scale (interval or ratio variables) that can be measured on a continuous scale.
2) Your data cannot have any major outliers, that is, data points that exert excessive influence on the rest of the dataset.
3) Variable relationships exhibit (1) linearity: your response variable has a linear relationship with each of the predictor variables, and (2) additivity: the expected value of your response variable is based on the additive effects of the different predictor variables.
4)Your data shows an independence of observations, or in other words, there is no autocorrelation between variables.
5)Your data demonstrates an absence of multicollinearity.
6) Your data is homoscedastic.
7) Your residuals must be normally distributed.
Step 4: Addressing potential problems with the model
- If your data is heteroscedastic, you can try transforming your response variable.
- If your residuals are non-normal, you can either (1) check whether your data could be broken into subsets that share more similar statistical distributions, and on which you could build separate models, OR (2) check whether the problem is caused by a few large outliers. If so, and if these are caused by a simple error or some explainable, non-repeating event, you may be able to remove those outliers to correct the non-normality of the residuals.
- If you are seeing correlation between your predictor variables, try removing one of them.
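As a sketch of the first and third fixes, the snippet below log-transforms a response whose error grows multiplicatively (a common cause of heteroscedasticity) and drops one of a pair of nearly duplicate predictors. The data, and the 0.9 correlation cutoff, are illustrative assumptions rather than fixed rules.

```python
# Two common fixes: transform the response, or drop a collinear predictor.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 100
x1 = rng.uniform(1, 10, size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)           # nearly a copy of x1 -> collinear
y = np.exp(0.3 * x1 + rng.normal(scale=0.2, size=n))  # multiplicative error in y

# Fix 1: if the error grows with the response, model log(y) instead of y,
# which turns the multiplicative error into an additive one
log_y = np.log(y)

# Fix 2: if two predictors are highly correlated, keep only one of them
corr = np.corrcoef(x1, x2)[0, 1]
X = pd.DataFrame({"x1": x1, "x2": x2})
if corr > 0.9:
    X = X.drop(columns=["x2"])
print("corr(x1, x2) =", round(corr, 3), "-> columns kept:", list(X.columns))
```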
Step 5: Validating the model
Finally, you must determine whether the model you've chosen is valid. The following three methods will be helpful with that.
1) Check the predicted values by collecting new data and checking it against results that are predicted by your model.
2) Check the results predicted by your model against your own common sense. If they clash, you’ve got a problem.
3) Cross-validate results by splitting your data into two randomly selected samples. Use one half of the data to estimate the model parameters and the other half to check the predictive results of your model.
These are the five main steps for evaluating a multiple regression model.