True;
There are two methods of stepwise regression: the forward method
and the backward method.
In the forward method, the software looks at all the predictor
variables you selected and picks the one that best predicts the
dependent measure. That variable is added to the model. The step is
then repeated: of the remaining variables, the one that adds the
most to the prediction is added next. This procedure continues until
adding predictors no longer improves the prediction model.
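
The forward procedure described above can be sketched roughly as follows. This is a minimal illustration, not what any particular statistics package does: the function names, the simulated data, and the stopping rule (stop when the gain in R^2 falls below a hypothetical `min_gain` threshold) are all assumptions made for the example. Real software typically uses a significance test as the entry criterion instead.

```python
import numpy as np

def r_squared(X, y, cols):
    """R^2 of an OLS fit of y on an intercept plus the given columns of X."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, min_gain=0.01):
    """Greedy forward selection; min_gain is an assumed stopping threshold."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # Score each candidate by the R^2 of the model that would include it.
        scores = {c: r_squared(X, y, selected + [c]) for c in remaining}
        cand = max(scores, key=scores.get)
        if scores[cand] - best < min_gain:
            break  # no remaining predictor adds enough to the model
        selected.append(cand)
        remaining.remove(cand)
        best = scores[cand]
    return selected

# Simulated data (an assumption): only columns 1 and 3 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 1] + 1.5 * X[:, 3] + rng.normal(size=200)

chosen = forward_select(X, y)
print(chosen)
```

With this simulated data the sketch should pick column 1 first (the strongest predictor) and then column 3, and stop there because none of the noise columns improves R^2 by the assumed threshold.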
In the backward method, all the predictor variables you chose are
entered into the model. Then, the variables that do not
(significantly) predict the dependent measure are removed from the
model one by one.
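
A comparable sketch of the backward procedure is shown below. It is only an illustration under stated assumptions: the helper names and simulated data are made up, and "not significant" is approximated by an absolute t-statistic below a hypothetical cutoff `t_min` of 2.0 (roughly the 5% level for large samples) rather than by an exact p-value.

```python
import numpy as np

def t_stats(X, y, cols):
    """Absolute t-statistics of the slope coefficients in an OLS fit."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = len(y) - A.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)        # coefficient covariance
    t = coef / np.sqrt(np.diag(cov))
    return dict(zip(cols, np.abs(t[1:])))        # skip the intercept

def backward_eliminate(X, y, t_min=2.0):
    """Start with every predictor; drop the weakest one until all pass t_min."""
    kept = list(range(X.shape[1]))
    while kept:
        t = t_stats(X, y, kept)
        worst = min(t, key=t.get)
        if t[worst] >= t_min:
            break  # every remaining predictor clears the assumed cutoff
        kept.remove(worst)
    return kept

# Simulated data (an assumption): only columns 1 and 3 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 1] + 1.5 * X[:, 3] + rng.normal(size=200)

kept = backward_eliminate(X, y)
print(kept)
```

The two real predictors should survive the elimination. A noise column can occasionally survive as well, since about one in twenty unrelated predictors will clear a 5% cutoff by chance.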
The backward method is generally the preferred method, because the
forward method is more likely to exclude predictors involved in
so-called suppressor effects. A suppressor effect occurs when a
predictor is only significant when another predictor is held
constant.
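
A suppressor effect can be illustrated with simulated data. In the sketch below (all variable names and numbers are assumptions chosen for the demonstration), neither predictor explains much of the dependent measure on its own, yet the two together explain almost all of it, because `x2` cancels out the nuisance part of `x1`.

```python
import numpy as np

def r_squared(y, *predictors):
    """R^2 of an OLS fit of y on an intercept plus the given predictors."""
    A = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(42)
n = 2000
signal = rng.normal(size=n)
nuisance = rng.normal(size=n)

x1 = nuisance + 0.1 * signal          # mostly nuisance, with a little signal
x2 = nuisance                         # pure nuisance, unrelated to y alone
y = signal + 0.1 * rng.normal(size=n)

r2_x1 = r_squared(y, x1)              # weak on its own
r2_x2 = r_squared(y, x2)              # essentially nothing on its own
r2_both = r_squared(y, x1, x2)        # strong once both are in the model

print(f"x1 alone: {r2_x1:.3f}, x2 alone: {r2_x2:.3f}, both: {r2_both:.3f}")
```

A search that judges predictors one at a time sees little reason to enter either variable here, whereas a model that starts with both in it shows each one to be clearly useful with the other held constant.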
There are two key flaws with stepwise regression. First, it can
overlook certain combinations of variables. Because the method adds
or removes variables in a certain order, you end up with a
combination of predictors that is in a way determined by that
order. That combination of variables may not be the one that best
reflects reality. Second, the model that is found is selected out
of the many possible models that the software considered. It will
often fit much better on the data set that was used than on a new
data set because of sampling variability.
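
The second flaw can be made concrete with a small simulation. In this sketch (the sample sizes, the number of candidate predictors, and the fixed five forward steps are arbitrary assumptions), the dependent measure is pure noise and unrelated to every predictor, yet the selected model still appears to fit the data it was selected on.

```python
import numpy as np

def design(X, cols):
    """Design matrix: an intercept plus the chosen columns of X."""
    return np.column_stack([np.ones(X.shape[0])] + [X[:, c] for c in cols])

def fit(A, y):
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def r2(X, y, cols, coef):
    """R^2 of fixed coefficients evaluated on the data set (X, y)."""
    resid = y - design(X, cols) @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
# 40 candidate predictors, none of which is related to y.
X_train, y_train = rng.normal(size=(50, 40)), rng.normal(size=50)
X_new, y_new = rng.normal(size=(1000, 40)), rng.normal(size=1000)

selected = []
for _ in range(5):  # five forward steps, an arbitrary choice for the demo
    remaining = [c for c in range(40) if c not in selected]

    def score(c):
        cols = selected + [c]
        return r2(X_train, y_train, cols, fit(design(X_train, cols), y_train))

    selected.append(max(remaining, key=score))

coef = fit(design(X_train, selected), y_train)
train_r2 = r2(X_train, y_train, selected, coef)
new_r2 = r2(X_new, y_new, selected, coef)
print(f"R^2 on selection data: {train_r2:.2f}, on new data: {new_r2:.2f}")
```

The R^2 on the selection data should come out clearly positive even though no real relationship exists, while the R^2 on the fresh data should be near zero or negative. Checking a selected model against data that played no part in the selection is one way to detect this.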
If you have a very large set of potential independent variables from which you wish to extract a few--i.e., if you're on a fishing expedition--you should generally go forward. If, on the other hand, you have a modest-sized set of potential variables from which you wish to eliminate a few--i.e., if you're fine-tuning some prior selection of variables--you should generally go backward. (If you're on a fishing expedition, you should still be careful not to cast too wide a net, lest you dredge up variables that are only accidentally related to your dependent variable.)