Question 2 (1 pt)
Which of these statements about multicollinearity is FALSE?
A |
If the average variance inflation factor is greater than 1 then the regression model might be biased. |
B |
Multicollinearity in the data is shown by a VIF (variance inflation factor) greater than 10. |
C |
Tolerance values above 0.2 may indicate multicollinearity in the data. |
D |
The tolerance is 1 divided by the VIF (variance inflation factor). |
Question 3 (1 pt)
Recent research has shown that professors are among the most stressed workers. The output below shows the results of a regression using several variables to predict stress among professors. (Data from Cooper, 1988).
Based on the output above, which of the predictors is significantly related to burnout? Check all that apply.
A |
Stress from research |
B |
Perceived control |
C |
Stress from teaching |
D |
Stress from providing pastoral care |
Question 4 (1 pt)
Using the same output from Question 3, how would we interpret the b-value of perceived control? Please use Model 3 to answer this question.
A |
As perceived control increases by .675 units, burnout increases by one unit controlling for the other variables. |
B |
As perceived control increases by 8.271 units, burnout increases by one unit. |
C |
As perceived control increases by one unit, burnout increases by .675 units. |
D |
As perceived control increases by one unit, burnout increases by .675 units, controlling for the other variables. |
Question 5 (1 pt)
Again using the output from Question 3, which variables would we consider eliminating from our model due to concerns of multicollinearity?
A |
Stress from research |
B |
Perceived control |
C |
Stress from teaching |
D |
Stress from providing pastoral care |
Question 6 (1 pt)
Which statistic is useful for assessing the influence of a single predictor in a linear regression? Check all that apply.
A |
R² change.
B |
t-statistic. |
C |
Unstandardized B |
D |
Chi Square |
Question 7 (1 pt)
Which of the following are potential sources of bias in a linear model?
A |
Z-scores and influential cases |
B |
Coefficients and outliers |
C |
T-statistics and influential cases |
D |
Outliers and influential cases |
Question 8 (1 pt)
The head of retail sales at a large cosmetic company was interested in determining the best marketing model for launching a forthcoming new product to ensure high sales. She ran two separate simple linear regressions: the first used money spent on social media marketing as a predictor and the second used money spent on print media. The model featuring social media marketing as a predictor had an R² of .665, an adjusted R² of .661 and an F-statistic of 112.56 (p < .001). The model featuring print media marketing as a predictor had an R² of .705, an adjusted R² of .15 and an F-statistic of 34 (p < .001). Based on these findings, which marketing model should she invest in to generate higher predicted sales?
A |
The model featuring print media marketing as a predictor is the better of the two models. |
B |
The model featuring social media marketing as a predictor is the better of the two models. |
C |
Neither model is effective. |
D |
The model featuring social media as a predictor is better but biased. |
Question 9 (1 pt)
A researcher was interested in examining what factors influenced children’s scores in a fitness test. He ran a multiple linear regression with four predictors (‘hours spent taking part in physical activity per day’, ‘calories consumed per day’, ‘BMI’ and ‘hours spent watching TV per day’). His model had an R² of .739, an adjusted R² of .742 and an F-statistic of 109.46 (p < .001). How would you interpret his findings?
A |
It is not a significant model. |
B |
It is a significant model where the four predictors account for 74% of the variance in the children’s scores in the fitness test. |
C |
It is a significant model where the four predictors account for 109% of the variance in the children’s scores in the fitness test. |
Question 10 (1 pt)
The same researcher noticed that his residual scatterplot seemed to violate homoscedasticity. What should he do in order to ensure that his model is robust?
A |
Throw out any outliers and re-run the model. |
B |
Run the model as a stepwise regression so he can manually throw out any bad predictors. |
C |
Perform bootstrapping. |
D |
Without seeing the scatterplot, we can’t tell what he should do. |
Note: Some questions need additional data and have been skipped; those that can be solved with the information provided are given correct answers with explanations.
Question 2 (1 pt)
Which of these statements about multicollinearity is FALSE?
Correct option = C Tolerance values above 0.2 may indicate multicollinearity in the data.
This statement is false because it is low tolerance that signals a problem: tolerance below 0.2 indicates a potential problem, and tolerance below 0.1 (equivalently, a VIF above 10) indicates a serious one.
The remaining options are correct.
A If the average variance inflation factor is greater than 1 then the regression model might be biased.
B Multicollinearity in the data is shown by a VIF (variance inflation factor) greater than 10.
D The tolerance is 1 divided by the VIF (variance inflation factor).
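To make the link between tolerance and VIF concrete, here is a minimal sketch on hypothetical data. It assumes only two predictors, in which case regressing one predictor on the other gives R_j² equal to their squared Pearson correlation (with more predictors, R_j² would come from regressing x_j on all the others). The function names and the data are illustrative only.

```python
# Sketch: VIF and tolerance for a two-predictor model (hypothetical data).
# With two predictors, R_j^2 from regressing x1 on x2 is simply r^2.
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def vif_and_tolerance(xs, ys):
    """Return (VIF, tolerance) for either predictor in a two-predictor model."""
    r2_j = pearson_r(xs, ys) ** 2
    tolerance = 1 - r2_j      # tolerance = 1 - R_j^2
    vif = 1 / tolerance       # VIF = 1 / tolerance
    return vif, tolerance


def flag(vif, tolerance):
    """Common rules of thumb: VIF > 10 or tolerance < 0.1 is serious,
    tolerance < 0.2 is a potential problem."""
    if vif > 10 or tolerance < 0.1:
        return "serious"
    if tolerance < 0.2:
        return "potential problem"
    return "ok"


# Hypothetical, almost collinear predictors
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 11]
vif, tol = vif_and_tolerance(x1, x2)
print(round(vif, 1), round(tol, 4), flag(vif, tol))
```

Here the two predictors are nearly collinear, so the VIF is very large and the tolerance is far below 0.2, which is exactly the pattern option C gets backwards.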
Question 6 (1 pt)
Which statistic is useful for assessing the influence of a single predictor in a linear regression? Check all that apply.
Correct option : B t-statistic.
For each beta coefficient βj we test the hypotheses H0: βj = 0 (the predictor has no effect on the outcome) versus H1: βj ≠ 0.
The test statistic is t = b / SE(b), the estimated coefficient divided by its standard error. Next we check the p-value for the variable in the regression output: if it is less than 0.05, we reject the null hypothesis and conclude that the predictor is significant. (The p-value is calculated from the t-statistic.)
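The computation of t = b / SE(b) can be sketched for the simplest case, a regression with one predictor, on hypothetical data. In a multiple regression SE(b) comes from the full model, so this is an illustration of the idea rather than what a statistics package does internally; the function name is an assumption.

```python
# Sketch: t-statistic for the slope in a one-predictor OLS regression
# (hypothetical data). The logic t = b / SE(b) is the same in multiple
# regression, but there SE(b) is computed from the full model.
from math import sqrt


def slope_t_statistic(xs, ys):
    """Return (b, se_b, t, df) for the OLS slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                     # slope estimate
    a = my - b * mx                   # intercept estimate
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2                        # residual degrees of freedom
    se_b = sqrt(sse / df / sxx)       # standard error of the slope
    return b, se_b, b / se_b, df


b, se_b, t, df = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"b = {b:.2f}, SE = {se_b:.4f}, t({df}) = {t:.3f}")
```

The two-tailed p-value is then read from a t distribution with df degrees of freedom, for example with `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available. With only 3 degrees of freedom the critical value at the .05 level is about 3.18, so this hypothetical slope would not be significant.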
Question 7 (1 pt)
Which of the following are potential sources of bias in a linear model?
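Correct option : D Outliers and influential cases
Bias in the model is created by extreme cases that pull the fitted line towards themselves, so the coefficients no longer represent the bulk of the data.
Z-scores, t-statistics and coefficients are calculated from the data points (observations); they are results or diagnostics of the model, not sources of bias. In fact, z-scores of the residuals are one way of detecting outliers.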
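A small sketch shows how a single extreme case can pull the model. The data are hypothetical and the helper name is an assumption; comparing a coefficient with and without one case is the idea behind influence statistics such as DFBeta.

```python
# Sketch: how one extreme case can pull an OLS slope (hypothetical data).


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]                                # perfectly linear data
print(round(ols_slope(xs, ys), 3))                  # slope without the case

# Add a single extreme case at x = 6 and refit
print(round(ols_slope(xs + [6], ys + [20]), 3))     # slope with the case
```

Adding one point triples the slope (from 1 to 3), even though the other five cases lie exactly on a line, which is why outliers and influential cases are treated as sources of bias.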
Question 8 (1 pt)
The head of retail sales at a large cosmetic company was interested in determining the best marketing model for launching a forthcoming new product to ensure high sales. She ran two separate simple linear regressions: the first used money spent on social media marketing as a predictor and the second used money spent on print media. The model featuring social media marketing as a predictor had an R² of .665, an adjusted R² of .661 and an F-statistic of 112.56 (p < .001). The model featuring print media marketing as a predictor had an R² of .705, an adjusted R² of .15 and an F-statistic of 34 (p < .001). Based on these findings, which marketing model should she invest in to generate higher predicted sales?
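Correct answer : B The model featuring social media marketing as a predictor is the better of the two models.
For social media, R² (.665) and adjusted R² (.661) are very close, so the fit is expected to hold up beyond this sample. This model also has the larger F-statistic (112.56 versus 34).
For print media, adjusted R² (.15) is much lower than R² (.705). Adjusted R² penalizes the fit for the number of predictors relative to the sample size, so such a large drop suggests that the model's apparent fit would not generalize.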
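The adjustment can be sketched with the standard formula adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k the number of predictors. The question does not report n, so the sample sizes below are purely hypothetical.

```python
# Sketch: adjusted R^2 from R^2, sample size n and number of predictors k.
# The sample sizes used below are hypothetical; the question gives no n.


def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# With one predictor, the penalty shrinks as the sample grows
for n in (5, 20, 100):
    print(n, round(adjusted_r2(0.705, n, 1), 3))
```

The gap between R² and adjusted R² is largest when the sample is small relative to the number of predictors, and it becomes negligible as n grows.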
Question 9 (1 pt)
A researcher was interested in examining what factors influenced children’s scores in a fitness test. He ran a multiple linear regression with four predictors (‘hours spent taking part in physical activity per day’, ‘calories consumed per day’, ‘BMI’ and ‘hours spent watching TV per day’). His model had an R² of .739, an adjusted R² of .742 and an F-statistic of 109.46 (p < .001). How would you interpret his findings?
Correct answer : B It is a significant model where the four predictors account for 74% of the variance in the children’s scores in the fitness test.
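The p-value of the F-statistic (p < .001) is less than 0.05, hence the model is significant.
The coefficient of determination, R² = .739, is the proportion of the variability in y explained by the predictors. It lies between 0 and 1, and a higher value means the predictors explain more of the variance. Here R² ≈ .74, so the four predictors account for about 74% (not 0.74%) of the variance in the fitness scores. (The reported adjusted R² of .742 cannot be right as stated, because adjusted R² is never larger than R²; either way, the figure is about 74%.)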
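Both the overall F-statistic and the "percentage of variance explained" follow from R². The sketch below uses the standard formula F = (R²/k) / ((1 - R²)/(n - k - 1)); the sample size n = 100 is a hypothetical value, since the question does not report it, and the helper names are illustrative.

```python
# Sketch: overall F-statistic and percentage of variance explained from R^2.
# n = 100 below is hypothetical; the question does not report the sample size.


def f_from_r2(r2, n, k):
    """F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with df (k, n - k - 1)."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def percent_explained(r2):
    """Express R^2 as a percentage of variance explained."""
    return 100 * r2


r2, k, n = 0.739, 4, 100
print(f"{percent_explained(r2):.1f}% of variance explained")
print(f"F({k}, {n - k - 1}) = {f_from_r2(r2, n, k):.2f} (hypothetical n = {n})")
```

Multiplying R² by 100 gives the percentage, which is why an R² of .739 corresponds to roughly 74% of the variance.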