Question

In: Statistics and Probability

  1. (True or False) Model selection methods such as best subsets and stepwise regression, when properly used, can suggest models that fit the data well.

Solutions

Expert Solution

True.

There are two main variants of stepwise regression: the forward method and the backward method.
In the forward method, the software examines all of the candidate predictor variables and adds the one that explains the most variation in the dependent variable. The step is then repeated with the remaining predictors, each time adding the one that contributes most given the variables already in the model. The procedure continues until no remaining predictor improves the model beyond the entry criterion.
In the backward method, all of the candidate predictor variables are entered into the model at once. The variables that do not significantly predict the dependent variable are then removed one at a time, starting with the least significant, until only significant predictors remain.
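To make the two procedures concrete, here is a minimal Python sketch that uses predictor p-values as the entry and removal criterion. The pandas/statsmodels tooling, the function names, and the 0.05 thresholds are illustrative assumptions; actual packages may use F-to-enter/F-to-remove statistics, AIC, or other criteria instead.

```python
# Rough sketch of forward selection and backward elimination using
# p-values as the criterion. `df` is assumed to be a pandas DataFrame;
# the 0.05 thresholds are illustrative, not fixed rules.
import statsmodels.api as sm

def forward_select(df, response, alpha_in=0.05):
    """Add, one at a time, the candidate with the smallest p-value
    until no remaining predictor is significant at alpha_in."""
    remaining = [c for c in df.columns if c != response]
    selected = []
    while remaining:
        # p-value of each candidate when added to the current model
        pvals = {}
        for cand in remaining:
            X = sm.add_constant(df[selected + [cand]])
            pvals[cand] = sm.OLS(df[response], X).fit().pvalues[cand]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha_in:
            break                      # nothing left that helps enough
        selected.append(best)
        remaining.remove(best)
    return selected

def backward_eliminate(df, response, alpha_out=0.05):
    """Start with every predictor and drop the least significant one
    until all remaining predictors are significant at alpha_out."""
    selected = [c for c in df.columns if c != response]
    while selected:
        X = sm.add_constant(df[selected])
        pvals = sm.OLS(df[response], X).fit().pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] <= alpha_out:
            break                      # everything left is significant
        selected.remove(worst)
    return selected
```

With a DataFrame df whose response column is named "y", forward_select(df, "y") and backward_eliminate(df, "y") each return a list of retained predictors; the two lists need not agree, which is one reason the choice of method matters.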
The backward method is often the preferred one, because the forward method can miss so-called suppressor effects. A suppressor effect occurs when a predictor is only significant when another predictor is held constant; forward selection may never add such a variable because it looks unimportant on its own.
Stepwise regression has two key flaws. First, it can overlook good combinations of variables: because variables are added or removed one at a time in a particular order, the final set of predictors is partly an artifact of that order and may not be the combination that best reflects reality. Second, the chosen model is the winner among the many models the software considered, so it capitalizes on chance features of the sample and will usually fit the data set used for selection much better than it fits a new data set.
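That second flaw is easy to demonstrate: fit a stepwise-selected model on one half of a data set and score it on the other half. The sketch below reuses the hypothetical forward_select helper from the sketch above and generates predictors that are pure noise; the simulated data, the 0.15 entry threshold, and the 50/50 split are illustrative assumptions. The selected model shows a nonzero R-squared on the half used for selection, but that apparent fit largely vanishes (and can even go negative) on the holdout half.

```python
# Illustration of selection bias with pure-noise predictors: the stepwise
# model fits the selection sample far better than fresh data. Assumes the
# forward_select function from the previous sketch is already defined.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 200, 30
df = pd.DataFrame(rng.normal(size=(n, p)), columns=[f"x{i}" for i in range(p)])
df["y"] = rng.normal(size=n)          # response unrelated to every predictor

train, test = df.iloc[:100], df.iloc[100:]
chosen = forward_select(train, "y", alpha_in=0.15)   # loose threshold lets noise in
print("Selected (all spurious):", chosen)

if chosen:
    fit = sm.OLS(train["y"], sm.add_constant(train[chosen])).fit()
    pred = fit.predict(sm.add_constant(test[chosen], has_constant="add"))
    ss_res = float(((test["y"] - pred) ** 2).sum())
    ss_tot = float(((test["y"] - test["y"].mean()) ** 2).sum())
    print("In-sample R-squared:", round(fit.rsquared, 3))
    print("Holdout R-squared  :", round(1 - ss_res / ss_tot, 3))
```

Evaluating the model on data that played no part in the selection, via a holdout sample or cross-validation, is the usual way to get an honest estimate of how well it will fit new data.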

If you have a very large set of potential independent variables from which you wish to extract a few--i.e., if you're on a fishing expedition--you should generally go forward. If, on the other hand, you have a modest-sized set of potential variables from which you wish to eliminate a few--i.e., if you're fine-tuning a prior selection of variables--you should generally go backward. (If you're on a fishing expedition, be careful not to cast too wide a net, lest you dredge up variables that are only accidentally related to your dependent variable.)

