Question

The director of admissions of a small college selected 120 students at random from the new freshman class in a study to determine whether a student’s grade point average (GPA) at the end of the freshman year (y) can be predicted from the ACT test score (x1).

GPA ACT ITS RP   
3.897 21 122 99
3.885 14 132 71
3.778 28 119 95
2.540 22 99 75
3.028 21 131 46
3.865 31 139 77
2.962 32 113 85
3.961 27 136 99
0.500 29 75 13
3.178 26 106 97
3.310 24 125 69
3.538 30 142 99
3.083 24 120 97
3.013 24 107 55
3.245 33 125 93
2.963 27 121 80
3.522 25 119 63
3.013 31 128 78
2.947 25 106 93
2.118 20 123 22
2.563 24 111 84
3.357 21 113 87
3.731 28 134 98
3.925 27 128 95
3.556 28 126 63
3.101 26 121 79
2.420 28 104 86
2.579 22 113 90
3.871 26 133 97
3.060 21 125 39
3.927 25 128 97
2.375 16 112 57
2.929 28 107 67
3.375 26 115 81
2.857 22 119 75
3.072 24 113 63
3.381 21 115 15
3.290 30 110 95
3.549 27 122 93
3.646 26 118 99
2.978 26 114 90
2.654 30 112 99
2.540 24 106 85
2.250 26 95 84
2.069 29 102 58
2.617 24 114 86
2.183 31 116 82
2.000 15 93 34
2.952 19 120 34
3.806 18 117 23
2.871 27 119 95
3.352 16 115 41
3.305 27 113 28
2.952 26 108 68
3.547 24 116 54
3.691 30 135 77
3.160 21 108 58
2.194 20 110 73
3.323 30 124 94
3.936 29 130 98
2.922 25 118 99
2.716 23 110 91
3.370 25 117 95
3.606 23 123 72
2.642 30 116 65
2.452 21 109 53
2.655 24 110 81
3.714 32 126 41
1.806 18 99 84
3.516 23 121 84
3.039 20 115 35
2.966 23 127 70
2.482 18 99 15
2.700 18 108 47
3.920 29 129 98
2.834 20 103 77
3.222 23 122 72
3.084 26 118 29
4.000 28 135 80
3.511 34 139 88
3.323 20 128 80
3.072 20 120 46
2.079 26 114 89
3.875 32 133 91
3.208 25 123 95
2.920 27 111 83
3.345 27 122 92
3.956 29 136 99
3.808 19 140 41
2.506 21 109 68
3.886 24 133 98
2.183 27 98 59
3.429 25 134 89
3.024 18 124 89
3.750 29 128 92
3.833 24 149 97
3.113 27 121 43
2.875 21 117 52
2.747 19 110 82
2.311 18 104 61
1.841 25 95 72
1.583 18 96 33
2.879 20 117 97
3.591 32 130 97
2.914 24 121 92
3.716 35 125 99
2.800 25 112 61
3.621 28 136 72
3.792 28 129 99
2.867 25 106 76
3.419 22 108 66
3.600 30 138 70
2.394 20 106 44
2.286 20 111 33
1.486 31 101 77
3.885 20 113 57
3.800 29 131 96
3.914 28 140 97
1.860 16 111 65
2.948 28 110 85

1.) Plot the residuals ei against the fitted values ŷi (in R). What departures from the regression model assumptions can be studied from this plot? What are your findings? (Note: If you are not sure about the validity of any of the assumptions, perform a formal test to verify your answer.)

2.) Prepare a normal probability plot (QQ plot) of the residuals. What assumption can be tested from this plot and what do you conclude? (Note: You can also use a formal test to reinforce your conclusion.)

3.) Information is given for each student on two variables not included in the model, namely, intelligence test score (ITS, x2) and high school class rank percentile (RP, x3). Plot the residuals from the fitted model against x2 and x3 on separate graphs to ascertain whether the model can be improved by including either of these variables. What do you conclude? (Hint: The residuals represent the variability that x1 could not explain. Therefore, if you see a pattern between the residuals and a predictor omitted from the model, that predictor is likely to be useful when added to the model.)

Solutions

Expert Solution

R commands along with outputs:

d=read.table("data.txt",header=TRUE) # data.txt is the file containing the table above

y=d$GPA
x1=d$ACT
fit=lm(y~x1)
fit

Call:
lm(formula = y ~ x1)
Coefficients:
(Intercept) x1
2.11405 0.03883
s=summary(fit)
s

Call:
lm(formula = y ~ x1)
Residuals:
Min 1Q Median 3Q Max
-2.74004 -0.33827 0.04062 0.44064 1.22737
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.11405 0.32089 6.588 1.3e-09 ***
x1 0.03883 0.01277 3.040 0.00292 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6231 on 118 degrees of freedom
Multiple R-squared: 0.07262, Adjusted R-squared: 0.06476
F-statistic: 9.24 on 1 and 118 DF, p-value: 0.002917

par(mfrow=c(2,2))
plot(fit)

# a)
ei=residuals(fit)
ei
yihat=fitted(fit)
yihat
plot(yihat,ei)


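# A patternless scatter of residuals around zero supports the linear model with constant error variance.
# This plot can reveal nonlinearity of the regression function, non-constant error variance, and outliers.
# Here the points fall in vertical columns. That is only because ACT takes a limited set of
# integer values, so the fitted values do too; it is not by itself a violation of the model.
# This can be seen by:
sort(unique(x1))
# [1] 14 15 16 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
# One residual lies far below the rest (Min = -2.74 in the summary above):
# it belongs to the student with GPA 0.500 and ACT 29, a possible outlier.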
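
The note in part 1 asks for a formal test when the plot is not conclusive. One option for the constant-variance assumption is a Breusch-Pagan-type test. The sketch below computes the studentized (Koenker) version by hand in base R. The helper name bp_test is hypothetical, and the inline data frame holds only the first 12 rows of the table above so that the block runs on its own; replace it with the full data.txt to get the actual result.

```r
# Illustration only: first 12 rows of the data table (assumption: swap in the full file).
d <- read.table(header = TRUE, text = "
GPA ACT ITS RP
3.897 21 122 99
3.885 14 132 71
3.778 28 119 95
2.540 22 99 75
3.028 21 131 46
3.865 31 139 77
2.962 32 113 85
3.961 27 136 99
0.500 29 75 13
3.178 26 106 97
3.310 24 125 69
3.538 30 142 99
")

fit <- lm(GPA ~ ACT, data = d)

# Hypothetical helper: studentized Breusch-Pagan test.
# Regress the squared residuals on the model's predictors; under constant
# variance, n * R^2 of that auxiliary regression is approximately chi-square.
bp_test <- function(fit) {
  e2   <- residuals(fit)^2
  X    <- model.matrix(fit)            # includes the intercept column
  aux  <- lm.fit(X, e2)                # auxiliary regression
  r2   <- 1 - sum(aux$residuals^2) / sum((e2 - mean(e2))^2)
  stat <- length(e2) * r2
  df   <- ncol(X) - 1                  # number of predictors
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

res <- bp_test(fit)
print(res)
```

A small p-value would indicate that the error variance changes with ACT. If the lmtest package is available, its bptest() function with the default studentized setting should report the same statistic.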

# b)
qqnorm(ei)
qqline(ei)

# Most of the points fall below the straight line, which suggests that the normality assumption is not satisfied.
# Because the plot alone is not conclusive, we perform a formal test.
shapiro.test(ei)
# Shapiro-Wilk normality test
# data: ei
# W = 0.95249, p-value = 0.0003304
# H0: The residuals are normally distributed.
# The p-value 0.0003304 is far below alpha = 0.05, so we reject H0.
# The residuals are not normally distributed.
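
To see which observations pull the residuals away from normality, the standardized residuals can be inspected directly. This is a minimal sketch using the base R functions rstandard() and which.min(); the cutoff of 2 is a common rule of thumb rather than part of the original solution, and the inline data frame again holds only the first 12 rows of the table as an assumption to keep the block self-contained.

```r
# Illustration only: first 12 rows of the data table (assumption: swap in the full file).
d <- read.table(header = TRUE, text = "
GPA ACT ITS RP
3.897 21 122 99
3.885 14 132 71
3.778 28 119 95
2.540 22 99 75
3.028 21 131 46
3.865 31 139 77
2.962 32 113 85
3.961 27 136 99
0.500 29 75 13
3.178 26 106 97
3.310 24 125 69
3.538 30 142 99
")

fit <- lm(GPA ~ ACT, data = d)

# Standardized residuals: each residual divided by its estimated standard deviation.
r_std <- rstandard(fit)

# Row with the most negative standardized residual.
worst <- which.min(r_std)
print(d[worst, ])

# Rows whose standardized residual exceeds 2 in absolute value (rule of thumb).
flagged <- which(abs(r_std) > 2)
print(d[flagged, ])
```

Refitting the model without any flagged row and repeating shapiro.test() is one way to check how much a single student influences the conclusion.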

# c)
x2=d$ITS
x3=d$RP
# ei holds the residuals of the ACT-only model computed above
plot(x2,ei)

# A clear pattern is visible: the points form a roughly rectangular band that tilts upward,
# so the residuals increase with x2.
# Therefore, x2 appears to be a useful addition to the regression model [as per the hint].

plot(x3,ei)
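# Here the scatter looks roughly random, so x3 does not appear to add much.

# Fitting the model with x2 (ITS) included raises adjusted R-squared from 0.06476 to 0.6313,
# and x2 is highly significant (p < 2e-16). Adjusted R-squared is the fairer comparison,
# because plain R-squared never decreases when a predictor is added.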
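
The visual impression from the two residual plots can be backed by a correlation test between the residuals and each omitted predictor. The sketch below uses cor.test() from base R. It runs on the first 12 rows of the table only, which is an assumption made to keep the block self-contained, so its numbers are not the full-data results.

```r
# Illustration only: first 12 rows of the data table (assumption: swap in the full file).
d <- read.table(header = TRUE, text = "
GPA ACT ITS RP
3.897 21 122 99
3.885 14 132 71
3.778 28 119 95
2.540 22 99 75
3.028 21 131 46
3.865 31 139 77
2.962 32 113 85
3.961 27 136 99
0.500 29 75 13
3.178 26 106 97
3.310 24 125 69
3.538 30 142 99
")

fit <- lm(GPA ~ ACT, data = d)
ei  <- residuals(fit)

# Pearson correlation between the residuals and each omitted predictor.
ct_its <- cor.test(ei, d$ITS)
ct_rp  <- cor.test(ei, d$RP)

print(c(ITS = unname(ct_its$estimate), RP = unname(ct_rp$estimate)))
print(c(ITS = ct_its$p.value, RP = ct_rp$p.value))

# By construction, least-squares residuals are uncorrelated with the
# predictor already in the model, so this value is zero up to rounding.
print(cor(ei, d$ACT))
```

A correlation clearly different from zero for an omitted predictor points the same way as a visible trend in its residual plot.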

fit1=lm(y~x1+x2)
s1=summary(fit1)
s1

Call:
lm(formula = y ~ x1 + x2)

Residuals:
Min 1Q Median 3Q Max
-1.14807 -0.23843 -0.02077 0.28775 1.03430

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.915124 0.360025 -5.319 5.07e-07 ***
x1 0.003609 0.008433 0.428 0.669
x2 0.041537 0.003076 13.504 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3912 on 117 degrees of freedom
Multiple R-squared: 0.6375, Adjusted R-squared: 0.6313
F-statistic: 102.9 on 2 and 117 DF, p-value: < 2.2e-16
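
The two models can also be compared formally with a partial F-test, which asks whether adding ITS reduces the residual sum of squares by more than chance would. This is a sketch with anova() from base R on the first 12 rows of the table only (an assumption to keep the block self-contained); the names fit0, fit1 and cmp are illustrative.

```r
# Illustration only: first 12 rows of the data table (assumption: swap in the full file).
d <- read.table(header = TRUE, text = "
GPA ACT ITS RP
3.897 21 122 99
3.885 14 132 71
3.778 28 119 95
2.540 22 99 75
3.028 21 131 46
3.865 31 139 77
2.962 32 113 85
3.961 27 136 99
0.500 29 75 13
3.178 26 106 97
3.310 24 125 69
3.538 30 142 99
")

fit0 <- lm(GPA ~ ACT, data = d)        # reduced model
fit1 <- lm(GPA ~ ACT + ITS, data = d)  # full model with ITS added

# Partial F-test for the added predictor.
cmp <- anova(fit0, fit1)
print(cmp)

# Adjusted R-squared of both models, side by side.
adj <- c(ACT_only = summary(fit0)$adj.r.squared,
         ACT_ITS  = summary(fit1)$adj.r.squared)
print(adj)
```

With a single added predictor, the F statistic equals the square of that predictor's t value in the full model, so this test agrees with the t-test for x2 in the summary output.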

