Suppose you were asked to investigate which predictors explain the number of minutes that 10- to 18-year-old students spend on Twitter. To do so, you build a linear regression model with Twitter usage (Y) measured as the number of minutes per week. The four predictors you include in the model are Height, Weight, Grade Level, and Age of each student. You build four simple linear regression models with Y regressed separately on each predictor, and each predictor is statistically significant. Then you build a multiple linear regression model with Y regressed on all four predictors, but only one predictor, Age, is statistically significant, and the others are not. What is likely going on among the four predictors? If you include more than one of these predictors in the model, what are some problems that can result?
A standard assumption in multiple regression is that the predictors are not strongly correlated with one another, that is, that there is no severe multicollinearity.
When you regress Y separately on each predictor (x), each model contains only one predictor, so multicollinearity cannot arise: there is no other predictor in the model for it to be correlated with. Each predictor therefore gets full credit for its association with Y, including any association it merely shares with the other predictors, and each one comes out statistically significant.
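On the other hand, when you build a multiple linear regression model with Y regressed on all four predictors (x1, x2, x3, x4), each coefficient measures the effect of one predictor with the others held fixed. Among 10- to 18-year-olds, Height, Weight, Grade Level, and Age are almost certainly strongly correlated with one another, since older students tend to be taller, heavier, and in higher grades. Because of this overlap, Height, Weight, and Grade Level add little information once Age is in the model, so their statistical significance for the output (Y) is low. Age remains statistically significant, which suggests it is the predictor actually driving Twitter usage, with the other three acting largely as stand-ins for it.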
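The overlap among the predictors can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the function names `pearson` and `simulate_students`, the sample size, and every coefficient and noise level are hypothetical, chosen so that Height, Weight, and Grade Level follow Age while Twitter minutes depend on Age alone. It is not the data from the question.

```python
import random
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_students(n=500, seed=0):
    """Hypothetical students: all numbers below are illustrative assumptions."""
    rng = random.Random(seed)
    age = [rng.uniform(10, 18) for _ in range(n)]
    # Height, Weight and Grade Level are driven by Age plus random noise.
    height = [100 + 4.5 * a + rng.gauss(0, 5) for a in age]   # cm
    weight = [5 * a - 20 + rng.gauss(0, 6) for a in age]      # kg
    grade = [a - 5 + rng.gauss(0, 0.5) for a in age]
    # Twitter minutes per week depend on Age only (by construction).
    minutes = [30 * a + rng.gauss(0, 60) for a in age]
    return {"Age": age, "Height": height, "Weight": weight,
            "Grade": grade, "Minutes": minutes}


data = simulate_students()
names = ["Height", "Weight", "Grade", "Age"]

# Each predictor on its own is correlated with Y ...
for name in names:
    print(f"corr({name}, Minutes) = {pearson(data[name], data['Minutes']):.2f}")

# ... but the predictors are also strongly correlated with each other.
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"corr({a}, {b}) = {pearson(data[a], data[b]):.2f}")
```

In this setup Height is correlated with Twitter minutes even though it has no effect of its own, which is why a simple regression on Height alone would look significant.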
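If you include more than one of these predictors in the model and they are correlated with each other (multicollinearity), several problems can result. The standard errors of the coefficients are inflated, so a predictor that really matters can appear non-significant. The coefficient estimates become unstable and can change in size or even sign when a predictor is added or dropped. It also becomes hard to separate the individual effect of each predictor on Y. Finally, redundant predictors add little explanatory power, so adjusted R-squared can decrease.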
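One common way to quantify this problem, not mentioned in the answer above but standard in practice, is the variance inflation factor (VIF). A predictor's VIF is the factor by which the variance of its coefficient is inflated by correlation with the other predictors; it equals the matching diagonal entry of the inverse of the predictors' correlation matrix. The sketch below computes it on simulated data. All names (`simulate_students`, `invert`, `vif`) and all numbers in the simulation are hypothetical assumptions, not the data from the question.

```python
import random
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_students(n=500, seed=0):
    """Hypothetical students: Height, Weight and Grade Level all follow Age."""
    rng = random.Random(seed)
    age = [rng.uniform(10, 18) for _ in range(n)]
    return {
        "Age": age,
        "Height": [100 + 4.5 * a + rng.gauss(0, 5) for a in age],
        "Weight": [5 * a - 20 + rng.gauss(0, 6) for a in age],
        "Grade": [a - 5 + rng.gauss(0, 0.5) for a in age],
    }


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(data, names):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    corr = [[pearson(data[a], data[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[j][j] for j, name in enumerate(names)}


names = ["Height", "Weight", "Grade", "Age"]
for name, value in vif(simulate_students(), names).items():
    print(f"VIF({name}) = {value:.1f}")
```

A VIF of 1 means a predictor is uncorrelated with the others. A widely used rule of thumb treats values above about 5 or 10 as a sign of troublesome multicollinearity, in which case dropping or combining predictors is worth considering.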