id  sex  status  income  verbal  gamble
1   1    51      2       8       0
2   1    28      2.5     8       0
3   1    37      2       6       0
4   1    28      7       4       7.3
5   1    65      2       8       19.6
6   1    61      3.47    6       0.1
7   1    28      5.5     7       1.45
8   1    27      6.42    5       6.6
9   1    43      2       6       1.7
10  1    18      6       7       0.1
11  1    18      3       6       0.1
12  1    43      4.75    6       5.4
13  1    30      2.2     4       1.2
14  1    28      2       6       3.6
15  1    38      3       6       2.4
16  1    38      1.5     8       3.4
17  1    28      9.5     8       0.1
18  1    18      10      5       8.4
19  1    43      4       8       12
20  0    51      3.5     9       0
21  0    62      3       8       1
22  0    47      2.5     9       1.2
23  0    43      3.5     5       0.1
24  0    27      10      4       156
25  0    71      6.5     7       38.5
26  0    38      1.5     7       2.1
27  0    51      5.44    4       14.5
28  0    38      1       6       3
29  0    51      0.6     7       0.6
30  0    62      5.5     8       9.6
31  0    18      12      2       88
32  0    30      7       7       53.2
33  0    38      15      7       90
34  0    71      2       10      3
35  0    28      1.5     1       14.1
36  0    61      4.5     8       70
37  0    71      2.5     7       38.5
38  0    28      8       6       57.2
39  0    51      10      6       6
40  0    65      1.6     6       25
41  0    48      2       9       6.9
42  0    61      15      9       69.7
43  0    75      3       8       13.3
44  0    66      3.25    9       0.6
45  0    62      4.94    6       38
46  0    71      1.5     7       14.4

10. A study of teenage gambling in Britain was performed in 2008. There are 47 observations and 5 variables. Download the data set Gambling from Blackboard and answer the following questions.
a) Make a numerical and graphical summary of the data, commenting on any features that you find interesting. Limit the output you present to a quantity that a busy reader would find sufficient.
b) What percent of the variation in the response is explained by these predictors?
c) Which observation has the largest (positive) residual?
d) Compute the mean and median of the residuals.
e) For all other predictors held constant, what would be the difference in predicted expenditure on gambling for a male compared to a female?
f) Which variables are statistically significant?
g) Predict the amount that a male with average status, income, and verbal score would gamble along with an appropriate 95% CI. Repeat the prediction for a male with maximal values of status, income, and verbal score. Which CI is wider and why is this result expected?
h) Fit a model with just income as a predictor and use an F-test to compare it to the full model.
i) Check the constant variance, normality, and linearity assumptions. Describe your findings.
a)
b)
Regression Output:
Since the value of R Square for this model is 0.5279, 52.79% of the variation in the dependent variable is explained by the independent variables.
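The R Square figure can be checked by fitting the regression directly. The following is a minimal sketch, not the method used to produce the output above: it assumes an ordinary least-squares fit of gamble on sex, status, income, and verbal with an intercept, and it uses only the 46 rows listed in the question, so it may not reproduce 0.5279 exactly. The names RAW, solve, beta, and r2 are illustrative.

```python
# Data copied from the question, in the order: id sex status income verbal gamble
RAW = """
1 1 51 2 8 0        2 1 28 2.5 8 0      3 1 37 2 6 0        4 1 28 7 4 7.3
5 1 65 2 8 19.6     6 1 61 3.47 6 0.1   7 1 28 5.5 7 1.45   8 1 27 6.42 5 6.6
9 1 43 2 6 1.7      10 1 18 6 7 0.1     11 1 18 3 6 0.1     12 1 43 4.75 6 5.4
13 1 30 2.2 4 1.2   14 1 28 2 6 3.6     15 1 38 3 6 2.4     16 1 38 1.5 8 3.4
17 1 28 9.5 8 0.1   18 1 18 10 5 8.4    19 1 43 4 8 12      20 0 51 3.5 9 0
21 0 62 3 8 1       22 0 47 2.5 9 1.2   23 0 43 3.5 5 0.1   24 0 27 10 4 156
25 0 71 6.5 7 38.5  26 0 38 1.5 7 2.1   27 0 51 5.44 4 14.5 28 0 38 1 6 3
29 0 51 0.6 7 0.6   30 0 62 5.5 8 9.6   31 0 18 12 2 88     32 0 30 7 7 53.2
33 0 38 15 7 90     34 0 71 2 10 3      35 0 28 1.5 1 14.1  36 0 61 4.5 8 70
37 0 71 2.5 7 38.5  38 0 28 8 6 57.2    39 0 51 10 6 6      40 0 65 1.6 6 25
41 0 48 2 9 6.9     42 0 61 15 9 69.7   43 0 75 3 8 13.3    44 0 66 3.25 9 0.6
45 0 62 4.94 6 38   46 0 71 1.5 7 14.4
"""

# Split the flat list of numbers into rows of six values each.
vals = [float(v) for v in RAW.split()]
rows = [vals[i:i + 6] for i in range(0, len(vals), 6)]

# Design matrix: intercept, sex, status, income, verbal. Response: gamble.
X = [[1.0, r[1], r[2], r[3], r[4]] for r in rows]
y = [r[5] for r in rows]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


# Normal equations: (X'X) beta = X'y
ncol = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(ncol)] for i in range(ncol)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(ncol)]
beta = solve(XtX, Xty)

# R^2 = 1 - SSE / SST
fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
resid = [yi - fi for yi, fi in zip(y, fitted)]
ybar = sum(y) / len(y)
sse = sum(e * e for e in resid)
sst = sum((yi - ybar) ** 2 for yi in y)
r2 = 1 - sse / sst
print(f"R^2 = {r2:.4f}")
```

Solving the normal equations by hand keeps the sketch free of third-party dependencies; in practice a regression routine from a statistics package would normally be used instead.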
c)
Observation 24 has the largest (positive) residual.
For this observation,
Gamble (observed) = 156
Gamble (predicted) = 61.93
Residual = observed - predicted = 156 - 61.93 = 94.07
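The residual is simply the observed value minus the fitted value, which can be checked with the two numbers reported for observation 24. A minimal sketch; the function name residual is illustrative.

```python
def residual(observed, predicted):
    """Residual of one observation: observed response minus fitted value."""
    return observed - predicted


# Values reported above for observation 24.
res_24 = residual(156.0, 61.93)
print(round(res_24, 2))  # 94.07
```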
d)
Mean of the residuals = -2.66 × 10^-15 (zero up to floating-point rounding, as expected for a least-squares fit with an intercept)
Median of the residuals = -1.91
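For a least-squares fit that includes an intercept, the residuals must sum to zero, so their mean is zero up to rounding. A short derivation, assuming ordinary least squares; the symbols X (design matrix with a leading column of ones), e (residual vector), and \hat\beta (fitted coefficients) are introduced here for illustration.

```latex
X^\top e = X^\top (y - X\hat\beta) = 0
\quad\Longrightarrow\quad
\mathbf{1}^\top e = \sum_{i=1}^{n} e_i = 0
\quad\Longrightarrow\quad
\bar{e} = \frac{1}{n}\sum_{i=1}^{n} e_i = 0 .
```

The median is not constrained in this way. A median below the mean, as here, suggests that the residuals are skewed to the right.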
e)
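22
Since the coefficient of sex is -22, the predicted expenditure on gambling differs by 22 between a male and a female when all other predictors are held constant. Assuming sex is coded 0 = male and 1 = female (the question does not state the coding), the male is predicted to spend 22 more than the female.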
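The reasoning behind this answer can be written out explicitly. A sketch, assuming the fitted model is linear in the four predictors; the coefficient symbols are illustrative, and only the value of the sex coefficient comes from the answer above.

```latex
\hat{y} = \hat\beta_0
        + \hat\beta_{\text{sex}}\,\text{sex}
        + \hat\beta_{\text{status}}\,\text{status}
        + \hat\beta_{\text{income}}\,\text{income}
        + \hat\beta_{\text{verbal}}\,\text{verbal}

\hat{y}\,\big|_{\text{sex}=1} - \hat{y}\,\big|_{\text{sex}=0}
        = \hat\beta_{\text{sex}} \approx -22
```

With status, income, and verbal held fixed, every other term cancels in the difference, so the group coded 1 is predicted to spend about 22 less than the group coded 0.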
f)
The p-values for sex, income, and verbal are each less than 0.05, so these variables are statistically significant at the 5% level.