The data for a college bookstore for 12 semesters is below (X= number of students registered in a course, Y= number of books sold for that course):
X: 36,28,35,39,30,30,31,38,36,38,29,26
Y: 31,29,34,35,29,30,30,38,34,33,29,26
Answer the following questions BY HAND
a) Calculate the least squares regression line for these data.
b) Calculate and explain R^2. Calculate adjusted R^2.
c) Under the simple linear regression model assumptions, what is the unbiased estimator of variance?
d) Calculate the 95% confidence intervals of beta and alpha.
e) Conduct a hypothesis test to verify the number of students registered is a significant predictor for the number of books actually sold for a course at 0.05 significance level.
f) Find the 90% confidence interval for µY |x when x=31, and the prediction interval for a new observation of Y when x=31. Which interval is wider?
Obs | X | Y | X^2 | Y^2 | XY |
1 | 36 | 31 | 1296 | 961 | 1116 |
2 | 28 | 29 | 784 | 841 | 812 |
3 | 35 | 34 | 1225 | 1156 | 1190 |
4 | 39 | 35 | 1521 | 1225 | 1365 |
5 | 30 | 29 | 900 | 841 | 870 |
6 | 30 | 30 | 900 | 900 | 900 |
7 | 31 | 30 | 961 | 900 | 930 |
8 | 38 | 38 | 1444 | 1444 | 1444 |
9 | 36 | 34 | 1296 | 1156 | 1224 |
10 | 38 | 33 | 1444 | 1089 | 1254 |
11 | 29 | 29 | 841 | 841 | 841 |
12 | 26 | 26 | 676 | 676 | 676 |
SUM | 396 | 378 | 13288 | 12030 | 12622 |
Quantity | Value | Formula |
n | 12 | sample size |
Mean X | 33 | Sum(x)/n |
Mean Y | 31.5 | Sum(y)/n |
df1 | 1 | k-1, with k = 2 estimated parameters |
df2 | 10 | n-k |
SSxx | 220 | Sum(x^2) - ((Sum(x))^2 /n) |
SSyy | 123 | Sum(y^2) - ((Sum(y))^2 /n) |
SSxy | 148 | Sum(xy) - (Sum(x)*Sum(y)/n) |
slope | 0.672727 | SSxy/SSxx |
intercept | 9.3 | Mean Y - Mean X * slope |
SSR | 99.56364 | slope * SSxy |
SST | 123 | SSyy |
SSE | 23.43636 | SST-SSR |
MSR | 99.56364 | SSR/df1 |
MSE | 2.343636 | SSE/df2 |
F | 42.48254 | MSR/MSE |
Se | 1.530894 | SQRT(SSE/(n-2)) |
Sb1 | 0.103213 | Se/SQRT(SSxx) |
Sb0 | 3.4346 | Se*SQRT(1/n+(Xbar^2/SSxx)) |
r | 0.8997 | SSxy/SQRT(SSxx*SSyy) |
r^2 | 0.80946 | SSR/SST |
a)
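The least squares regression line for these data is
Y = 9.3+0.6727*X
where slope = SSxy/SSxx = 148/220 = 0.6727 and intercept = Mean Y - slope * Mean X = 31.5 - 0.6727*33 = 9.3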
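The hand calculation in a) can be cross-checked with a short script. This is only a sketch using the standard library; the variable names (`ss_xx`, `ss_xy`, `slope`, `intercept`) are illustrative choices, not part of the original worksheet.

```python
# Data from the problem statement.
x = [36, 28, 35, 39, 30, 30, 31, 38, 36, 38, 29, 26]
y = [31, 29, 34, 35, 29, 30, 30, 38, 34, 33, 29, 26]
n = len(x)

# Corrected sum of squares of X and corrected sum of cross-products.
ss_xx = sum(v * v for v in x) - sum(x) ** 2 / n
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

# Least squares estimates: slope first, then intercept from the means.
slope = ss_xy / ss_xx
intercept = sum(y) / n - slope * sum(x) / n

print(f"SSxx = {ss_xx:.0f}, SSxy = {ss_xy:.0f}")
print(f"Y = {intercept:.4f} + {slope:.4f}*X")
```

The printed coefficients should agree with the slope and intercept rows of the worksheet up to rounding.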
b)
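R^2 = SSR/SST = 99.5636/123 = 0.8095
About 80.95% of the variation in Y (books sold) is explained by the regression on X (students registered)
Adj R^2 = 1-[(1-R^2)(n-1)/(n-k)] = 1-[(0.1905)(11)/(10)] = 0.7904, with k = 2 estimated parameters
Adjusted R^2 stays close to R^2 and reasonably near 1, which indicates that the regression line fits the data well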
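As a rough check of b), R^2 and adjusted R^2 can be recomputed from the summary statistics in the worksheet (SSxx = 220, SSyy = 123, SSxy = 148). The sketch assumes k = 2 estimated parameters, which is what the worksheet's df2 = n-k = 10 implies; the variable names are illustrative.

```python
# Summary statistics taken from the worksheet.
n, k = 12, 2                      # k = 2: intercept and slope (assumed from df2 = 10)
ss_xx, ss_yy, ss_xy = 220.0, 123.0, 148.0

ssr = ss_xy ** 2 / ss_xx          # regression sum of squares, same as slope * SSxy
sst = ss_yy                       # total sum of squares
r2 = ssr / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)

print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")
```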
c)
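The unbiased estimator of the error variance σ^2 is s^2 = MSE = SSE/(n-2) = 23.4364/10 = 2.3436
Estimated variance of the slope: Var(b1) = s^2/SSxx = 2.3436/220 = 0.0107. This is small, so the slope is estimated fairly precisely (Sb1 = 0.1032)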
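The variance estimate in c) follows directly from the sums of squares. A minimal sketch, starting from the worksheet values and using illustrative variable names:

```python
# Summary statistics taken from the worksheet.
n = 12
ss_xx, ss_yy, ss_xy = 220.0, 123.0, 148.0

sse = ss_yy - ss_xy ** 2 / ss_xx   # error sum of squares: SST - SSR
mse = sse / (n - 2)                # unbiased estimate of the error variance
var_b1 = mse / ss_xx               # estimated variance of the slope

print(f"SSE = {sse:.4f}, MSE = {mse:.4f}, Var(b1) = {var_b1:.4f}")
```

Dividing SSE by n-2 rather than n accounts for the two parameters estimated from the data, which is what makes the estimator unbiased under the model assumptions.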
d)
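95% CI, with tc = 2.228 (t table, df = n-2 = 10, alpha/2 = 0.025)
Variables | Lower 95% | Upper 95% |
Intercept | 1.647290781 | 16.95270922 |
X | 0.44275471 | 0.902699836 |
95% CI for the intercept (alpha) = b0 +/- tc * Sb0 = 9.3 +/- 2.228*3.4346
95% CI for the slope (beta) = b1 +/- tc * Sb1 = 0.6727 +/- 2.228*0.1032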
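The two intervals in d) can be reproduced approximately as follows. The critical value `t_crit = 2.228` is an assumption read from a standard t table (df = 10, two-sided 95%); it is not stated in the original answer, and the table above appears to use a few more decimals, so the bounds may differ in the third or fourth decimal place.

```python
import math

# Summary statistics taken from the worksheet.
n, x_bar = 12, 33.0
ss_xx, mse = 220.0, 2.343636
b0, b1 = 9.3, 148 / 220

t_crit = 2.228  # assumed t-table value for df = 10, alpha/2 = 0.025

se = math.sqrt(mse)                                   # residual standard error
sb1 = se / math.sqrt(ss_xx)                           # standard error of the slope
sb0 = se * math.sqrt(1 / n + x_bar ** 2 / ss_xx)      # standard error of the intercept

ci_b0 = (b0 - t_crit * sb0, b0 + t_crit * sb0)
ci_b1 = (b1 - t_crit * sb1, b1 + t_crit * sb1)

print(f"Intercept: ({ci_b0[0]:.4f}, {ci_b0[1]:.4f})")
print(f"Slope:     ({ci_b1[0]:.4f}, {ci_b1[1]:.4f})")
```

Note that the intercept's standard error uses Xbar squared over SSxx, which is what reproduces Sb0 = 3.4346.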
e)
Hypothesis:
H0: β1 = 0
Ha: β1 ≠ 0
Test:
t stat = b1/Sb1 = 0.672727/0.103213 = 6.5179
P value < 0.001 < 0.05 (df = 10), so reject H0
The number of students registered is a significant predictor of the number of books actually sold for a course at the 0.05 significance level
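The test in e) can be sketched with a critical-value comparison, which avoids needing a t distribution function. The value `t_crit = 2.228` is again an assumed t-table entry (df = 10, two-sided 0.05), and the names are illustrative. The sketch also shows that t squared matches the F statistic in the worksheet, as it must in simple linear regression.

```python
import math

# Summary statistics taken from the worksheet.
ss_xx, mse = 220.0, 2.343636
b1 = 148 / 220

t_crit = 2.228  # assumed t-table value for df = 10, two-sided alpha = 0.05

sb1 = math.sqrt(mse / ss_xx)       # standard error of the slope
t_stat = b1 / sb1                  # test statistic for H0: beta1 = 0
reject_h0 = abs(t_stat) > t_crit   # two-sided decision rule
f_stat = t_stat ** 2               # equals MSR/MSE when there is one predictor

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}, t^2 = {f_stat:.4f}")
```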
f)
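If x=31
Y = 9.3+0.672727*X = 9.3+0.672727*31 = 30.1545
df = n-k = 10
alpha = 0.1
tc=1.813 (t table, df = 10, alpha/2 = 0.05)
90% CI for µY|x = Y +/- tc * Se*SQRT(1/n+((x-Xbar)^2/SSxx)) = 30.1545 +/- 1.813*1.5309*SQRT(1/12+(4/220)) = 30.1545 +/- 0.8843 = (29.2702, 31.0388)
90% PI for a new Y = Y +/- tc * Se*SQRT(1+1/n+((x-Xbar)^2/SSxx)) = 30.1545 +/- 1.813*1.5309*SQRT(1+1/12+(4/220)) = 30.1545 +/- 2.9130 = (27.2415, 33.0675)
The prediction interval is wider, because it includes the variability of a single new observation in addition to the uncertainty in the estimated mean.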
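The interval calculation for x = 31 can be sketched as below, using tc = 1.813 from the answer and the worksheet's summary statistics. The standard-error formulas are the usual ones for a mean response and for a new observation in simple linear regression; the variable names are illustrative.

```python
import math

# Summary statistics taken from the worksheet.
n, x_bar = 12, 33.0
ss_xx, mse = 220.0, 2.343636
b0, b1 = 9.3, 148 / 220

x0 = 31
t_crit = 1.813                     # t-table value used in the answer (df = 10, 90%)

y_hat = b0 + b1 * x0               # fitted value at x0
se = math.sqrt(mse)
leverage = 1 / n + (x0 - x_bar) ** 2 / ss_xx

se_mean = se * math.sqrt(leverage)        # std. error of the estimated mean response
se_pred = se * math.sqrt(1 + leverage)    # std. error for a single new observation

ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)

print(f"90% CI for mean response: ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"90% PI for new Y:         ({pi[0]:.4f}, {pi[1]:.4f})")
print("PI wider than CI:", (pi[1] - pi[0]) > (ci[1] - ci[0]))
```

Under these assumptions the confidence interval comes out near (29.27, 31.04) and the prediction interval near (27.24, 33.07).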