Use the least squares method to fit a simple linear model that relates salary (the dependent variable) to education (the independent variable).
(a) What is your model? State the hypothesis that is to be tested, the decision rule, the test statistic, and your decision, using a level of significance of 5%.
(b) What percentage of the variation in salary has been explained by the regression?
(c) Provide a 95% confidence interval estimate for the true slope value.
(d) Based on your model, what is the expected salary of a new hire with 12 years of education?
(e) What is the 95% prediction interval for the salary of a new hire with 12 years of education? Use the fact that the distance value = 0.011286.
Salary data (n=93):
39000,40200,42900,43800,43800,43800,43800,43800,44400,45000,45000,46200,48000,48000,48000,48000,48000,48000,48000,48000,48000,48000,49800,51000,51000,51000,51000,51000,51000,51600,52200,52200,52800,52800,52800,54000,54000,54000,54000,54000,54000,54000,54000,54000,54000,54000,54000,55200,55200,55800,56400,57000,57000,57000,57000,57000,60000,60000,61200,63000,63000,46200,50400,51000,51000,52200,54000,54000,54000,54000,54000,57000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,63000,66000,66000,66000,68400,69000,69000,81000
Education data (n=93):
12,10,12,8,8,12,12,12,15,8,12,12,8,12,12,12,12,12,12,12,12,16,8,8,12,12,15,15,16,12,8,12,8,8,12,8,8,12,12,12,12,12,12,15,15,15,15,12,12,12,12,12,12,15,15,15,12,15,12,12,15,12,15,12,12,12,12,12,12,15,15,15,8,12,12,12,12,12,12,15,15,15,15,15,16,15,15,15,15,15,12,15,16
(a) The fitted model, with coefficients rounded to whole numbers, is:
Salary = 38186 + 1281*Education
The hypothesis being tested is:
H0: β1 = 0
H1: β1 ≠ 0
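The test statistic is t = 4.313, with n - 2 = 91 degrees of freedom.
Decision rule: reject H0 if |t| exceeds the critical value t(0.025, 91), which is about 1.986, or equivalently if the p-value is below 0.05.
The p-value is less than 0.0001 (it prints as 0.0000 when rounded to four decimal places).
Since |t| = 4.313 is greater than 1.986 and the p-value is below the significance level (0.05), we reject the null hypothesis.
Therefore, we conclude that education has a statistically significant linear relationship with salary.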
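The decision rule for the slope test can be checked from the reported test statistic alone. The following is a minimal sketch that assumes a two-sided t test with n - 2 degrees of freedom; the variable names are illustrative and do not come from the original code.

```r
# Values reported in the answer (not recomputed from the data here)
t_stat <- 4.313   # test statistic for the slope
n      <- 93      # sample size
df     <- n - 2   # residual degrees of freedom in simple linear regression
alpha  <- 0.05    # significance level

# Two-sided critical value and p-value from the t distribution
t_crit  <- qt(1 - alpha / 2, df)
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# Decision rule: reject H0 when |t| exceeds the critical value
reject <- abs(t_stat) > t_crit

cat("critical value:", round(t_crit, 3), "\n")
cat("p-value:", signif(p_value, 2), "\n")
cat("reject H0:", reject, "\n")
```

Comparing |t| with the critical value and comparing the p-value with 0.05 always lead to the same decision, so either form of the rule can be reported.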
(b) R-squared = 0.1697, so the regression explains about 16.97% of the variation in salary.
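In a simple linear regression with one predictor, R-squared can be recovered from the slope's t statistic as t^2 / (t^2 + n - 2). The sketch below applies that identity to the reported t value as a consistency check; it is not how summary(mod) computes the figure internally.

```r
# Reported values from part (a)
t_stat <- 4.313
df     <- 91      # n - 2 with n = 93

# Identity for simple linear regression: R^2 = t^2 / (t^2 + df)
r_squared <- t_stat^2 / (t_stat^2 + df)

cat("R-squared (%):", round(100 * r_squared, 2), "\n")
```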
(c) The 95% confidence interval estimate for the true slope value is between 690.9706 and 1870.748.
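The interval has the form estimate ± critical value × standard error, so the unrounded slope and its standard error can be backed out of the reported endpoints. This is a rough sketch under that assumption; the derived standard error is not quoted in the original answer and is only as precise as the rounded endpoints.

```r
# Reported 95% confidence limits for the slope
ci_lower <- 690.9706
ci_upper <- 1870.748
df       <- 91                      # n - 2 with n = 93

t_crit   <- qt(0.975, df)           # two-sided 95% critical value

# The interval is symmetric around the point estimate
slope    <- (ci_lower + ci_upper) / 2
margin   <- (ci_upper - ci_lower) / 2

# Implied standard error and t statistic (derived, not from the source output)
se_slope <- margin / t_crit
t_stat   <- slope / se_slope

cat("slope estimate:", round(slope, 3), "\n")
cat("implied standard error:", round(se_slope, 2), "\n")
cat("implied t statistic:", round(t_stat, 3), "\n")
```

The implied t statistic should agree with the value reported in part (a), which is a quick way to confirm the two parts come from the same fit.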
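(d) The expected salary is 53555.91. This uses the unrounded coefficients; the rounded model in part (a) gives 38186 + 1281*12 = 53558.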
(e) The 95% prediction interval for the salary of a new hire with 12 years of education is between 40569.57 and 66542.25.
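The distance value enters the prediction interval through the formula y_hat ± t × s × sqrt(1 + distance value), where s is the residual standard error. Since s is not quoted in the answer, the sketch below backs it out of the reported interval and then rebuilds the interval from its parts. The implied s and the narrower interval for the mean salary are derived here for illustration and are not part of the original output.

```r
# Reported values
y_hat    <- 53555.91   # predicted salary at 12 years of education
pi_lower <- 40569.57   # reported 95% prediction limits
pi_upper <- 66542.25
distance <- 0.011286   # distance value given in the question
df       <- 91         # n - 2 with n = 93

t_crit <- qt(0.975, df)

# Half-width of the prediction interval
margin <- (pi_upper - pi_lower) / 2

# Implied residual standard error (derived, not from the source output)
s <- margin / (t_crit * sqrt(1 + distance))

# Prediction interval for one new hire: extra "1 +" for individual variation
pi_rebuilt <- y_hat + c(-1, 1) * t_crit * s * sqrt(1 + distance)

# Confidence interval for the MEAN salary at 12 years: no "1 +" term
ci_mean <- y_hat + c(-1, 1) * t_crit * s * sqrt(distance)

cat("implied residual standard error:", round(s, 1), "\n")
cat("prediction interval:", round(pi_rebuilt, 2), "\n")
cat("confidence interval for the mean:", round(ci_mean, 2), "\n")
```

The prediction interval is much wider than the interval for the mean because it covers the scatter of an individual salary around the regression line, not just the uncertainty in the line itself.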
The R code is:
Salary <-
c(39000,40200,42900,43800,43800,43800,43800,43800,44400,45000,45000,46200,48000,48000,48000,48000,48000,48000,48000,48000,48000,48000,49800,51000,51000,51000,51000,51000,51000,51600,52200,52200,52800,52800,52800,54000,54000,54000,54000,54000,54000,54000,54000,54000,54000,54000,54000,55200,55200,55800,56400,57000,57000,57000,57000,57000,60000,60000,61200,63000,63000,46200,50400,51000,51000,52200,54000,54000,54000,54000,54000,57000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,63000,66000,66000,66000,68400,69000,69000,81000)
Education <-
c(12,10,12,8,8,12,12,12,15,8,12,12,8,12,12,12,12,12,12,12,12,16,8,8,12,12,15,15,16,12,8,12,8,8,12,8,8,12,12,12,12,12,12,15,15,15,15,12,12,12,12,12,12,15,15,15,12,15,12,12,15,12,15,12,12,12,12,12,12,15,15,15,8,12,12,12,12,12,12,15,15,15,15,15,16,15,15,15,15,15,12,15,16)
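# Fit the simple linear regression of salary on education
mod <- lm(Salary ~ Education)
summary(mod)   # coefficients, t statistic, p-value, and R-squared
aov(mod)       # regression and residual sums of squares
confint(mod)   # 95% confidence intervals for the intercept and slope
# 95% prediction interval for a new hire with 12 years of education
predict(mod, data.frame(Education = 12), interval = "prediction")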