1) Why are diagnostic checking procedures so important in data analysis? Should we always transform the variables?
2) Why does fitting a multiple linear regression model help to explain the outcome better?
3) Should we include as many predictors as possible in a multiple linear regression? What do you think will happen if the number of predictors is greater than the number of observations?
Solution:
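1) Regression diagnostics are a set of procedures that seek to assess the validity of a regression model in a number of different ways. This assessment may be an exploration of the model's underlying statistical assumptions (linearity, independence, constant variance, and normality of the errors), an examination of the structure of the model by considering formulations that have fewer, more, or different explanatory variables, or a study of subgroups of observations, looking for those that are either poorly represented by the model (outliers) or that have a relatively large effect on the model's predictions (influential points). These checks matter because the standard errors, confidence intervals, and hypothesis tests from a regression are trustworthy only when those assumptions hold; a model that violates them can lead to misleading conclusions even when it appears to fit well.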
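As a rough illustration of the outlier and influence checks described above, the following Python sketch fits a least squares line with NumPy on simulated data containing one planted extreme point, then computes each observation's leverage (the diagonal of the hat matrix) and its internally studentized residual. The simulated data, the helper name `ols_diagnostics`, and the 2p/n leverage cutoff (a common rule of thumb, not a fixed standard) are assumptions made for illustration.

```python
import numpy as np

def ols_diagnostics(X, y):
    """Fit OLS and return (beta, residuals, leverage, studentized residuals).

    X: (n, p) design matrix that already includes an intercept column.
    y: (n,) response vector.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    # Leverage: diagonal of the hat matrix H = X (X^T X)^{-1} X^T
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    leverage = np.diag(H)
    # Residual variance estimate with n - p degrees of freedom
    sigma2 = residuals @ residuals / (n - p)
    # Internally studentized residuals
    studentized = residuals / np.sqrt(sigma2 * (1 - leverage))
    return beta, residuals, leverage, studentized

# Hypothetical data: a straight-line relationship plus one planted extreme point
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)
x[0], y[0] = 30.0, 40.0
X = np.column_stack([np.ones(n), x])

beta, residuals, leverage, studentized = ols_diagnostics(X, y)
# Rule-of-thumb flags: leverage above 2p/n, |studentized residual| above 2
high_leverage = np.where(leverage > 2 * X.shape[1] / n)[0]
large_residual = np.where(np.abs(studentized) > 2)[0]
print("high-leverage observations:", high_leverage)
print("large studentized residuals:", large_residual)
```

In practice one would also plot the residuals against the fitted values and draw a normal Q-Q plot of the residuals to check the remaining assumptions.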
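Data do not always come in a form that is immediately suitable for analysis, so we often have to transform the variables before carrying out the analysis. Transformations are applied to achieve specific objectives, such as ensuring linearity, achieving normality, or stabilizing the variance. We should not always transform, however: a transformation is warranted only when the diagnostics reveal a problem of this kind, and it changes how the coefficients are interpreted (for example, after taking the log of the response, a coefficient describes an approximate proportional change in the outcome rather than an absolute one).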
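To make the variance-stabilizing role of a transformation concrete, here is a small Python/NumPy sketch on simulated data, assuming an exponential trend with multiplicative noise. It fits a straight line to the raw response and to its logarithm, then compares the residual spread in the upper half of x with that in the lower half. The helper names `fit_line` and `spread_ratio` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(1, 5, n)
# Hypothetical data: exponential trend with multiplicative noise, so both the
# curvature and the spread of y grow with its level.
y = np.exp(0.5 + 0.8 * x) * np.exp(rng.normal(0, 0.5, n))

def fit_line(x, y):
    """Return (intercept, slope) of a simple least squares line."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def spread_ratio(x, y):
    """Residual SD for the upper half of x divided by that for the lower half."""
    b0, b1 = fit_line(x, y)
    r = y - (b0 + b1 * x)
    upper = x > np.median(x)
    return r[upper].std() / r[~upper].std()

raw_ratio = spread_ratio(x, y)          # well above 1: spread grows with x
log_ratio = spread_ratio(x, np.log(y))  # close to 1: variance stabilized
b0_log, b1_log = fit_line(x, np.log(y))
print(f"spread ratio raw: {raw_ratio:.2f}, after log: {log_ratio:.2f}")
print(f"slope on the log scale: {b1_log:.2f} (value used to simulate: 0.8)")
```

If the raw residuals had already shown roughly constant spread and no curvature, the transformation would add nothing except a less direct interpretation.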
2) Multiple linear regression helps us understand how much the dependent variable is expected to change when one independent variable changes while the others are held fixed. For instance, a multiple linear regression can tell you how much GPA is expected to increase (or decrease) for every one-point increase (or decrease) in IQ, after accounting for the other variables in the model.
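The "holding the others fixed" interpretation can be checked numerically. This Python/NumPy sketch simulates GPA from IQ and a second, hypothetical predictor (`study_hours`), with made-up coefficients chosen only for illustration, fits the model, and confirms that raising IQ by one point at a fixed value of the other predictor moves the prediction by exactly the fitted IQ coefficient.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
iq = rng.normal(100, 15, n)
study_hours = rng.uniform(0, 20, n)  # hypothetical second predictor
# Hypothetical data-generating process: GPA rises 0.02 per IQ point and
# 0.05 per weekly study hour, plus noise.
gpa = 0.5 + 0.02 * iq + 0.05 * study_hours + rng.normal(0, 0.2, n)

X = np.column_stack([np.ones(n), iq, study_hours])
beta, *_ = np.linalg.lstsq(X, gpa, rcond=None)

def predict(iq_value, hours_value):
    """Predicted GPA from the fitted coefficients."""
    return beta @ np.array([1.0, iq_value, hours_value])

# Raising IQ by one point with study hours held fixed changes the
# prediction by exactly the IQ coefficient.
change = predict(111, 10) - predict(110, 10)
print(f"IQ coefficient: {beta[1]:.4f}, predicted change: {change:.4f}")
```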
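Including several relevant independent variables generally improves the accuracy of the predictions, because each additional relevant variable gives us more information about the outcome. (For a least squares fit, adding a predictor can never decrease R², the proportion of variance explained.) Hence fitting a multiple linear regression model helps to explain the outcome better.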
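The claim that extra predictors supply extra explanatory power can be illustrated with a short Python/NumPy sketch on simulated data (the coefficients and the helper `r_squared` are assumptions for illustration). It compares R² for a model with one predictor, with two relevant predictors, and with an additional pure-noise predictor.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit; X must include an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 + 1.5 * x2 + rng.normal(size=n)
noise = rng.normal(size=n)  # a predictor unrelated to y

ones = np.ones(n)
r2_simple = r_squared(np.column_stack([ones, x1]), y)
r2_multiple = r_squared(np.column_stack([ones, x1, x2]), y)
r2_noise = r_squared(np.column_stack([ones, x1, x2, noise]), y)
print(f"x1 only: {r2_simple:.3f}; x1 and x2: {r2_multiple:.3f}; "
      f"plus noise: {r2_noise:.3f}")
```

Because R² cannot go down even when a useless predictor is added, it cannot by itself distinguish a helpful predictor from an irrelevant one; adjusted R² or a held-out validation set is the usual guard against that.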
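3) Yes, we can include many predictors in a multiple regression, but only as long as multicollinearity does not occur. When predictors are strongly correlated with one another, the variances of their coefficient estimates become inflated, and the individual estimates become unstable and hard to interpret.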
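One common way to detect multicollinearity is the variance inflation factor, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all the other predictors. The Python/NumPy sketch below computes it from scratch on simulated data in which one predictor is nearly a copy of another; the helper name `vif` is hypothetical, and the cutoff of 10 mentioned in the comment is a widely used rule of thumb rather than a strict threshold (some analysts use 5).

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept.
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1 / (1 - r2)
    return out

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly a copy of x1
x3 = rng.normal(size=n)                  # unrelated to the others

vifs = vif(np.column_stack([x1, x2, x3]))
# Values far above 10 for x1 and x2 signal strong multicollinearity.
print(np.round(vifs, 1))
```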
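If the number of predictors is greater than the number of observations, there is no longer a unique least squares coefficient estimate: the matrix XᵀX (where X is the n × p design matrix) is singular, so infinitely many coefficient vectors fit the data equally well, the variance of the estimates is infinite, and ordinary least squares cannot be used at all.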
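The non-uniqueness can be demonstrated directly. In this Python/NumPy sketch with random data (n = 5 observations, p = 8 predictors, both chosen arbitrarily), XᵀX is rank-deficient; `np.linalg.lstsq` still returns one particular solution (the minimum-norm one), but adding any vector from the null space of X yields a different coefficient vector that fits the data just as perfectly.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 5, 8                        # more predictors than observations
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# X^T X is p x p but has rank at most n < p, so it cannot be inverted.
rank = np.linalg.matrix_rank(X.T @ X)
print("rank of X^T X:", rank, "out of", p)

# lstsq still returns one solution: the minimum-norm one.
beta_min, *_ = np.linalg.lstsq(X, y, rcond=None)

# The last right-singular vector of X lies in its null space (X @ v is ~0),
# so adding any multiple of it gives another exact fit.
_, _, Vt = np.linalg.svd(X)
null_vec = Vt[-1]
beta_alt = beta_min + 3.0 * null_vec

print(np.allclose(X @ beta_min, y), np.allclose(X @ beta_alt, y))
```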