Question

The following data set is obtained by a randomly selected sample of 93 employees working at a bank.

SALARY EDUC EXPER TIME
39000 12 0 1
40200 10 44 7
42900 12 5 30
43800 8 6 7
43800 8 8 6
43800 12 0 7
43800 12 0 10
43800 12 5 6
44400 15 75 2
45000 8 52 3
45000 12 8 19
46200 12 52 3
48000 8 70 20
48000 12 6 23
48000 12 11 12
48000 12 11 17
48000 12 63 22
48000 12 144 24
48000 12 163 12
48000 12 228 26
48000 12 381 1
48000 16 214 15
49800 8 318 25
51000 8 96 33
51000 12 36 15
51000 12 59 14
51000 15 115 1
51000 15 165 4
51000 16 123 12
51600 12 18 12
52200 8 102 29
52200 12 127 29
52800 8 90 11
52800 8 190 1
52800 12 107 11
54000 8 173 34
54000 8 228 33
54000 12 26 11
54000 12 36 33
54000 12 38 22
54000 12 82 29
54000 12 169 27
54000 12 244 1
54000 15 24 13
54000 15 49 27
54000 15 51 21
54000 15 122 33
55200 12 97 17
55200 12 196 32
55800 12 133 30
56400 12 55 9
57000 12 90 23
57000 12 117 25
57000 15 51 17
57000 15 61 11
57000 15 241 34
60000 12 121 30
60000 15 79 13
61200 12 209 21
63000 12 87 33
63000 15 231 15
46200 12 12 22
50400 15 14 3
51000 12 180 15
51000 12 315 2
52200 12 29 14
54000 12 7 21
54000 12 38 11
54000 12 113 3
54000 15 18 8
54000 15 359 11
57000 15 36 5
60000 8 320 21
60000 12 24 2
60000 12 32 17
60000 12 49 8
60000 12 56 33
60000 12 252 11
60000 12 272 19
60000 15 25 13
60000 15 36 32
60000 15 56 12
60000 15 64 33
60000 15 108 16
60000 16 46 3
63000 15 72 17
66000 15 64 16
66000 15 84 33
66000 15 216 16
68400 15 42 7
69000 12 175 10
69000 15 132 24
81000 16 55 33

This data set was obtained by collecting information on a randomly selected sample of 93 employees working at a bank.

SALARY - starting annual salary at the time of hire

EDUC - number of years of schooling at the time of hire

EXPER - number of months of previous work experience at the time of hire

TIME - number of months that the employee has been working at the bank until now

2. Use the least squares method to fit a simple linear model that relates the salary (dependent variable) to education (independent variable).

a) What is your model? State the hypothesis that is to be tested, the decision rule, the test statistic, and your decision, using a level of significance of 5%.

b) What percentage of the variation in salary has been explained by the regression?

c) Provide a 95% confidence interval estimate for the true slope value.

d) Based on your model, what is the expected salary of a new hire with 12 years of education?

e) What is the 95% prediction interval for the salary of a new hire with 12 years of education? Use the fact that the distance value = 0.011286

Please explain clearly.

Solutions

Expert Solution

Sol:

The analysis is performed in RStudio.

Use the lm function in R to fit a linear model of SALARY on EDUC.

Use the summary function to get the coefficients and p-values.

Use the predict function to get the confidence and prediction intervals for newdata with EDUC = 12.

R code:

df1 <- read.table(header = TRUE, text = "
SALARY   EDUC   EXPER   TIME
39000   12   0   1
40200   10   44   7
42900   12   5   30
43800   8   6   7
43800   8   8   6
43800   12   0   7
43800   12   0   10
43800   12   5   6
44400   15   75   2
45000   8   52   3
45000   12   8   19
46200   12   52   3
48000   8   70   20
48000   12   6   23
48000   12   11   12
48000   12   11   17
48000   12   63   22
48000   12   144   24
48000   12   163   12
48000   12   228   26
48000   12   381   1
48000   16   214   15
49800   8   318   25
51000   8   96   33
51000   12   36   15
51000   12   59   14
51000   15   115   1
51000   15   165   4
51000   16   123   12
51600   12   18   12
52200   8   102   29
52200   12   127   29
52800   8   90   11
52800   8   190   1
52800   12   107   11
54000   8   173   34
54000   8   228   33
54000   12   26   11
54000   12   36   33
54000   12   38   22
54000   12   82   29
54000   12   169   27
54000   12   244   1
54000   15   24   13
54000   15   49   27
54000   15   51   21
54000   15   122   33
55200   12   97   17
55200   12   196   32
55800   12   133   30
56400   12   55   9
57000   12   90   23
57000   12   117   25
57000   15   51   17
57000   15   61   11
57000   15   241   34
60000   12   121   30
60000   15   79   13
61200   12   209   21
63000   12   87   33
63000   15   231   15
46200   12   12   22
50400   15   14   3
51000   12   180   15
51000   12   315   2
52200   12   29   14
54000   12   7   21
54000   12   38   11
54000   12   113   3
54000   15   18   8
54000   15   359   11
57000   15   36   5
60000   8   320   21
60000   12   24   2
60000   12   32   17
60000   12   49   8
60000   12   56   33
60000   12   252   11
60000   12   272   19
60000   15   25   13
60000   15   36   32
60000   15   56   12
60000   15   64   33
60000   15   108   16
60000   16   46   3
63000   15   72   17
66000   15   64   16
66000   15   84   33
66000   15   216   16
68400   15   42   7
69000   12   175   10
69000   15   132   24
81000   16   55   33

"
)
# Print the data to check that it was read correctly
df1
# Fit the simple linear regression of SALARY on EDUC
linmod <- lm(SALARY ~ EDUC, data = df1)
coefficients(linmod)
summary(linmod)
# New hire with 12 years of education
newdata <- data.frame(EDUC = 12)
# 95% confidence interval for the mean salary at EDUC = 12
predict(linmod, newdata, level = 0.95, interval = "confidence")
# 95% prediction interval for an individual salary at EDUC = 12
predict(linmod, newdata, level = 0.95, interval = "prediction")

Output:

> coefficients(linmod)
(Intercept)        EDUC
  38185.598    1280.859
> summary(linmod)

Call:
lm(formula = SALARY ~ EDUC, data = df1)

Residuals:
     Min       1Q   Median       3Q      Max
-14555.9  -4632.5    444.1   3767.5  22320.7

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    38186       3774  10.117  < 2e-16 ***
EDUC            1281        297   4.313 4.08e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6501 on 91 degrees of freedom
Multiple R-squared: 0.1697,   Adjusted R-squared: 0.1606
F-statistic: 18.6 on 1 and 91 DF, p-value: 4.077e-05

> newdata <- data.frame(EDUC = 12)
> predict(linmod, newdata, level = 0.95, interval = "confidence")
       fit      lwr      upr
1 53555.91 52184.04 54927.78
> predict(linmod, newdata, level = 0.95, interval = "prediction")
       fit      lwr      upr
1 53555.91 40569.57 66542.25

ANSWER (2a):

The fitted linear regression model is

SALARY = 38185.598 + 1280.859 * EDUC

slope = 1280.859

y-intercept = 38185.598

H0: slope = 0 (no linear relationship between SALARY and EDUC)

Ha: slope ≠ 0 (a linear relationship between SALARY and EDUC)

alpha = 0.05

Decision rule: reject H0 if the p-value is less than 0.05.

Test statistic: t = 4.313 on 91 degrees of freedom (equivalently F = 18.6 on 1 and 91 DF), p-value = 4.077e-05

p < 0.05

Reject H0 in favour of Ha.

Conclusion:

There is sufficient statistical evidence at the 5% level of significance to conclude that there is a linear relationship between salary and education.

The model is significant, so it can be used to predict SALARY from EDUC.
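
The decision can also be checked by hand from the coefficient table. The following is a minimal sketch that assumes the rounded values printed by summary(linmod) (slope 1280.859, standard error 297, 91 residual degrees of freedom); the variable names are chosen here for illustration and are not part of the original solution.

```r
# Values copied from the summary(linmod) output (rounded as printed)
b1       <- 1280.859   # estimated slope for EDUC
se_b1    <- 297        # standard error of the slope
df_resid <- 91         # residual degrees of freedom (n - 2 = 93 - 2)
alpha    <- 0.05

# Test statistic for H0: slope = 0
t_stat <- b1 / se_b1

# Two-sided critical value and p-value from the t distribution
t_crit  <- qt(1 - alpha / 2, df_resid)
p_value <- 2 * pt(-abs(t_stat), df_resid)

# Decision rule: reject H0 when |t| exceeds the critical value
reject_h0 <- abs(t_stat) > t_crit

cat("t =", round(t_stat, 3), " critical value =", round(t_crit, 3), "\n")
cat("p-value =", signif(p_value, 3), " reject H0:", reject_h0, "\n")
```

Because the model has a single predictor, the F statistic is the square of this t statistic (4.313^2 is about 18.6), so the t test and the F test lead to the same decision.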

Solution-b:

R-squared = 0.1697

= 0.1697 * 100

= 16.97% of the variation in salary is explained by education

Explained variation = 16.97%

Unexplained variation = 100 - 16.97 = 83.03%
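
As a cross-check, in a simple linear regression R-squared can be recovered from the F statistic and the residual degrees of freedom as F / (F + df). The sketch below assumes the rounded values from the summary output (F = 18.6, 91 degrees of freedom, n = 93), so it agrees with the printed figures only up to rounding; the variable names are illustrative.

```r
# Values copied from the summary(linmod) output (rounded as printed)
f_stat   <- 18.6   # F statistic on 1 and 91 DF
df_resid <- 91     # residual degrees of freedom
n        <- 93     # sample size

# R-squared from F: SSR/SSE = F/df, so R^2 = F / (F + df)
r_squared <- f_stat / (f_stat + df_resid)

# Adjusted R-squared penalises for the number of predictors (one here)
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_resid

cat("R-squared:", round(r_squared, 4), "\n")
cat("Adjusted R-squared:", round(adj_r_squared, 4), "\n")
cat("Unexplained share (%):", round(100 * (1 - r_squared), 2), "\n")
```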

c) Provide a 95% confidence interval estimate for the true slope value.

confint(linmod)
                 2.5 %    97.5 %
(Intercept) 30688.2625 45682.933
EDUC          690.9706  1870.748

The 95% confidence interval for the true slope is from 690.9706 to 1870.748.
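
The same interval can be rebuilt by hand as estimate ± t critical value × standard error. This is a minimal sketch that assumes the rounded slope and standard error from the summary output; because the standard error is rounded to 297, the limits match confint(linmod) only to within about one unit. The variable names are illustrative.

```r
# Values copied from the summary(linmod) output (rounded as printed)
b1       <- 1280.859   # estimated slope for EDUC
se_b1    <- 297        # standard error of the slope
df_resid <- 91         # residual degrees of freedom

# Two-sided 95% critical value of the t distribution
t_crit <- qt(0.975, df_resid)

# Confidence interval: estimate minus/plus the margin of error
slope_ci <- b1 + c(-1, 1) * t_crit * se_b1

print(round(slope_ci, 1))
```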

d) Based on your model, what is the expected salary of a new hire with 12 years of education?

SALARY = 38185.598 + 1280.859 * EDUC

Substitute EDUC = 12 into the regression equation:

SALARY = 38185.598 + 1280.859 * 12

= 53555.91

e) What is the 95% prediction interval for the salary of a new hire with 12 years of education? Use the fact that the distance value = 0.011286

The 95% prediction interval from the predict output is

40569.57 to 66542.25
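
The interval can also be computed from the distance value given in the question, using fit ± t critical value × s × sqrt(1 + distance value), where s is the residual standard error. The sketch below assumes the rounded values from the output above (s = 6501, 91 degrees of freedom), so it reproduces the predict results only up to rounding; the variable names are illustrative.

```r
# Values copied from the output above and from the question
b0       <- 38185.598  # estimated intercept
b1       <- 1280.859   # estimated slope for EDUC
s        <- 6501       # residual standard error (rounded as printed)
df_resid <- 91         # residual degrees of freedom
dist_val <- 0.011286   # distance value given in the question for EDUC = 12

# Point prediction at EDUC = 12 and the 95% t critical value
fit    <- b0 + b1 * 12
t_crit <- qt(0.975, df_resid)

# Prediction interval for one new salary: the "1 +" term adds the
# variability of an individual observation around the regression line
pred_int <- fit + c(-1, 1) * t_crit * s * sqrt(1 + dist_val)

# Confidence interval for the mean salary at EDUC = 12 (no "1 +" term)
conf_int <- fit + c(-1, 1) * t_crit * s * sqrt(dist_val)

print(round(pred_int, 2))
print(round(conf_int, 2))
```

The prediction interval is much wider than the confidence interval because it covers a single new employee's salary rather than the average salary of all new hires with 12 years of education.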

