Is your complete second-order model a statistically useful model for predicting your dependent variable? Justify your response.
complete model = $B_0 + B_1X_1 + B_2X_1^2 + B_3X_2 + B_4X_1X_2 + B_5X_1^2X_2$
Ohh... a very subjective question.
It is very difficult to decide which model is best for prediction without having the essential information. However, let me guide you with some points to decide on your own in any situation.
The whole purpose of regression analysis is to use one or more variables to predict an outcome. Today, we will discuss linear regression, starting with checking the overall model. This step is the most important part of the analysis, and it is calculated using an F test, just like ANOVA. Because of this, you will see the output in terms of an F value. To put the F value in perspective, we have to give some detail about the analysis, and we do this using degrees of freedom. All you really need to know about degrees of freedom (df) is that the first value of df is the number of predictors (k), and the second value reflects the sample size: it is n - k - 1, the number of observations minus the number of estimated coefficients. For the complete second-order model above, k = 5, so the df are (5, n - 6). In most research, these are presented in parentheses after the F, for example F(5, 24) for a sample of 30. Luckily, just about any analytic software out there will interpret your F in the context of your df for you, meaning it gives you a p value right away. If this p value is lower than your significance cutoff (usually .05), you can reject the hypothesis that all of the slope coefficients are zero. That means at least one of your predictors is useful for estimating your outcome, and the model is statistically useful for prediction.
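To make the F value and its degrees of freedom concrete, here is a minimal sketch in Python. It uses the standard identity that links the overall F to R² for a least-squares model with an intercept. The function name `overall_f` and the numbers (R² = 0.80, n = 30) are assumptions made up for illustration; they do not come from the question.

```python
def overall_f(r_squared, n, k):
    """Return (F, df1, df2) for the overall test that all k slopes are zero.

    r_squared: unadjusted R-squared of the fitted model
    n: number of observations
    k: number of predictors (excluding the intercept)
    """
    df1 = k            # first df: number of predictors
    df2 = n - k - 1    # second df: sample size minus estimated coefficients
    f_value = (r_squared / df1) / ((1.0 - r_squared) / df2)
    return f_value, df1, df2


# Hypothetical values, for illustration only: the complete second-order
# model has k = 5 predictors; n = 30 and R-squared = 0.80 are invented.
f_value, df1, df2 = overall_f(r_squared=0.80, n=30, k=5)
print(f"F({df1}, {df2}) = {f_value:.2f}")  # prints "F(5, 24) = 19.20"

# The p value is the upper tail of the F distribution at f_value.
# If SciPy is available, scipy.stats.f.sf(f_value, df1, df2) returns it.
```

Your software does the same thing internally and then reports the p value, which is what you compare to your cutoff.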
Once you’ve established a significant model, the next step is to look at the details. The most overarching detail is R². For a least-squares model with an intercept, this always ranges from 0 to 1. You can multiply this number by 100 to get the percentage of the variability in your participants’ outcome scores that is explained by your predictors. But keep in mind that this number has little meaning unless the regression is significant! This statistic comes in two flavors, natural (unadjusted) and adjusted; the natural R² never goes down when you add more variables to the regression, even if they have nothing to do with the outcome, while the adjusted R² takes this artificial inflation into account and scales back based on the number of predictors.
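As a rough sketch of how the two flavors of R² are computed, the Python below works from observed and predicted outcome values. The function names and the tiny data set are made up for illustration; the predictions are simply the two group means you would get from a single binary predictor.

```python
def r_squared(observed, predicted):
    """Unadjusted R-squared: 1 - SS_error / SS_total."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_error = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_error / ss_total


def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared: penalizes R-squared for the k predictors used."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Invented data: four outcomes, predicted by the mean of each of two groups.
observed = [1, 2, 3, 4]
predicted = [1.5, 1.5, 3.5, 3.5]

r2 = r_squared(observed, predicted)
adj = adjusted_r_squared(r2, n=len(observed), k=1)
print(f"R2 = {r2:.2f}, adjusted R2 = {adj:.2f}")  # prints "R2 = 0.80, adjusted R2 = 0.70"
```

The gap between the two numbers grows as you add predictors relative to your sample size, which is why the adjusted value is the safer one to report when comparing models of different sizes.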
Once you know everything you could possibly want to know about the overall regression, it is time to dig into your predictors. Each predictor has a corresponding p value, which is different from the overall regression’s p value. If a predictor is significant, you can start making some claims about it. A simple outcome to look at here is the standardized beta (β). This tells you how strong the relationship between the predictor and outcome is after controlling for everything else in the model. It usually falls between -1 and 1 (it can land outside that range when predictors are highly correlated with each other); the further it is from 0 in either direction, the stronger the relationship, and the sign simply indicates whether there is a positive or negative association. However, another important output is the unstandardized beta (B). This value gives you the slope between the predictor and outcome. We previously talked a little about how these values work for binary predictors, and continuous predictors are pretty similar. For these, a one-unit increase in the predictor corresponds to an increase (for positive B) or decrease (for negative B) in the predicted outcome equal to the size of B, holding the other predictors constant. In a model with squared and interaction terms, like the one in the question, the change for X1 also depends on the current values of X1 and X2, so B1 on its own is not the whole slope.
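To show the difference between B and β, here is a small Python sketch for the simplest case of one continuous predictor. The function name `slope_and_beta` and the data are invented for illustration. With a single predictor, the standardized beta equals the Pearson correlation; with several predictors, your software computes each β after controlling for the others, which this sketch does not do.

```python
import math


def slope_and_beta(x, y):
    """Return (B, beta) for a simple regression of y on one predictor x.

    B is the unstandardized slope; beta is the slope after rescaling
    both variables to standard deviation units.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    b_unstd = sxy / sxx                      # change in y per one unit of x
    beta_std = b_unstd * math.sqrt(sxx / syy)  # B * (sd of x / sd of y)
    return b_unstd, beta_std


# Invented data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b_unstd, beta_std = slope_and_beta(x, y)
print(f"B = {b_unstd:.2f}, beta = {beta_std:.2f}")  # prints "B = 0.60, beta = 0.77"
```

Here B says the predicted outcome rises by 0.60 for each one-unit increase in x, while β says the same relationship is fairly strong on a standardized scale.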