Question

SALARY EDUC EXPER TIME
39000 12 0 1
40200 10 44 7
42900 12 5 30
43800 8 6 7
43800 8 8 6
43800 12 0 7
43800 12 0 10
43800 12 5 6
44400 15 75 2
45000 8 52 3
45000 12 8 19
46200 12 52 3
48000 8 70 20
48000 12 6 23
48000 12 11 12
48000 12 11 17
48000 12 63 22
48000 12 144 24
48000 12 163 12
48000 12 228 26
48000 12 381 1
48000 16 214 15
49800 8 318 25
51000 8 96 33
51000 12 36 15
51000 12 59 14
51000 15 115 1
51000 15 165 4
51000 16 123 12
51600 12 18 12
52200 8 102 29
52200 12 127 29
52800 8 90 11
52800 8 190 1
52800 12 107 11
54000 8 173 34
54000 8 228 33
54000 12 26 11
54000 12 36 33
54000 12 38 22
54000 12 82 29
54000 12 169 27
54000 12 244 1
54000 15 24 13
54000 15 49 27
54000 15 51 21
54000 15 122 33
55200 12 97 17
55200 12 196 32
55800 12 133 30
56400 12 55 9
57000 12 90 23
57000 12 117 25
57000 15 51 17
57000 15 61 11
57000 15 241 34
60000 12 121 30
60000 15 79 13
61200 12 209 21
63000 12 87 33
63000 15 231 15
46200 12 12 22
50400 15 14 3
51000 12 180 15
51000 12 315 2
52200 12 29 14
54000 12 7 21
54000 12 38 11
54000 12 113 3
54000 15 18 8
54000 15 359 11
57000 15 36 5
60000 8 320 21
60000 12 24 2
60000 12 32 17
60000 12 49 8
60000 12 56 33
60000 12 252 11
60000 12 272 19
60000 15 25 13
60000 15 36 32
60000 15 56 12
60000 15 64 33
60000 15 108 16
60000 16 46 3
63000 15 72 17
66000 15 64 16
66000 15 84 33
66000 15 216 16
68400 15 42 7
69000 12 175 10
69000 15 132 24
81000 16 55
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.41198516
R Square             0.16973178
Adjusted R Square    0.16060795
Standard Error       6501.12045
Observations         93

ANOVA
             df   SS          MS           F          Significance F
Regression   1    786253429   786253429    18.60313   4.08E-05
Residual     91   3.85E+09    42264567.1
Total        92   4.63E+09

               Coefficients   Standard Error   t Stat       P-value    Lower 95%     Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      38185.5979     3774.3766        10.117061    1.45E-16   30688.26252   45682.93    30688.26      45682.93
X Variable 1   1280.85932     296.96712        4.31313512   4.08E-05   690.9706164   1870.748    690.9706      1870.748

This data set was obtained by collecting information on a randomly selected sample of 93 employees working at a bank.

SALARY - starting annual salary at the time of hire

EDUC - number of years of schooling at the time of hire

EXPER - number of months of previous work experience at the time of hire

TIME - number of months the employee has worked at the bank to date

2. Use the least squares method to fit a simple linear model that relates the salary (dependent variable) to education (independent variable).

a) What is your model? State the hypothesis to be tested, the decision rule, the test statistic, and your decision, using a 5% level of significance.

b) What percentage of the variation in salary is explained by the regression?

c) Provide a 95% confidence interval estimate for the true slope.

d) Based on your model, what is the expected salary of a new hire with 12 years of education?

e) What is the 95% prediction interval for the salary of a new hire with 12 years of education? Use the fact that the distance value = 0.011286.

Solutions

Expert Solution

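a) The assumed underlying model is a simple linear model

$SALARY_i = \beta_0 + \beta_1 \, EDUC_i + \varepsilon_i$, where the errors $\varepsilon_i$ are assumed independent and normally distributed with mean 0 and constant variance.

Looking under the Coefficients column of the fitted model,
$b_0 = 38185.5979$ and $b_1 = 1280.85932$ are the estimated coefficients.

So the fitted model is

$\widehat{SALARY} = 38185.5979 + 1280.85932 \times EDUC$

Hypothesis testing for the intercept:

$H_0: \beta_0 = 0$ versus $H_1: \beta_0 \neq 0$

Under the null hypothesis, the test statistic is t-distributed with n - 2 = 93 - 2 = 91 degrees of freedom (the residual df in the ANOVA table; two parameters are estimated, so the df is n - 2, not n - 1):

$t = b_0 / SE(b_0)$

Here $b_0 = 38185.5979$ and $SE(b_0) = 3774.3766$.

Plugging in, t = 38185.5979 / 3774.3766 ≈ 10.117.
At the 5% level of significance, $\alpha = 0.05$ and $\alpha/2 = 0.025$.
The critical value is $t_{0.025, 91} \approx 1.986$.
Decision rule: reject $H_0$ if |t| > 1.986.

Since the obtained test statistic t = 10.117 is greater than the critical value 1.986,
we reject the null hypothesis.
The intercept term is significant at the 5% level of significance.

Hypothesis testing for the slope (this is the test of whether education is linearly related to salary):

$H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$

Under the null hypothesis, the test statistic is again t-distributed with n - 2 = 91 degrees of freedom:

$t = b_1 / SE(b_1)$

Here $b_1 = 1280.85932$ and $SE(b_1) = 296.96712$.

Plugging in, t = 1280.85932 / 296.96712 ≈ 4.313.
At the 5% level of significance, $\alpha = 0.05$ and $\alpha/2 = 0.025$.
The critical value is $t_{0.025, 91} \approx 1.986$.
Decision rule: reject $H_0$ if |t| > 1.986.

Since the obtained test statistic t = 4.313 is greater than the critical value 1.986,
we reject the null hypothesis.
The slope term is significant at the 5% level of significance. The P-value of 4.08E-05 in the output, which is far below 0.05, leads to the same decision.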
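
As a quick check of the two t tests, here is a minimal Python sketch. It only divides each coefficient by its standard error, both copied from the SUMMARY OUTPUT, and compares the result with the critical value 1.986 quoted in the solution. The names `coefficients`, `T_CRIT` and `t_test` are made up for this illustration, and the critical value is taken as given rather than computed.

```python
# Coefficient estimates and standard errors copied from the SUMMARY OUTPUT.
coefficients = {
    "Intercept": (38185.5979, 3774.3766),
    "EDUC slope": (1280.85932, 296.96712),
}

# Two-sided 5% critical value as quoted in the solution (assumed, not computed).
T_CRIT = 1.986


def t_test(estimate, std_error, t_crit=T_CRIT):
    """Return (t statistic, reject H0?) for H0: coefficient = 0."""
    t_stat = estimate / std_error
    return t_stat, abs(t_stat) > t_crit


results = {name: t_test(b, se) for name, (b, se) in coefficients.items()}
for name, (t_stat, reject) in results.items():
    print(f"{name}: t = {t_stat:.3f}, reject H0 at 5%: {reject}")
```

Both statistics should agree with the "t Stat" column of the regression output.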

b) The percentage of variation in salary explained by the regression is $R^2 \times 100\%$.

The value of $R^2$ for the regression = SSR / SST = 786253429 / 4.63E+09 ≈ 0.17

This value is also reported directly in the model output as R Square = 0.16973178.

So the percentage of variation in salary explained by the regression is about 16.97%.
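
The R Square figure can be recovered from the ANOVA table alone. The sketch below is illustrative only: because the output prints the residual and total sums of squares in rounded form (3.85E+09, 4.63E+09), it rebuilds the residual sum of squares as MS × df, which is an assumption about how best to avoid the rounding. Variable names are hypothetical.

```python
# Values copied from the ANOVA table of the SUMMARY OUTPUT.
ss_regression = 786253429.0
ms_residual = 42264567.1
df_residual = 91

# SS Residual is printed rounded (3.85E+09), so rebuild it as MS * df.
ss_residual = ms_residual * df_residual
ss_total = ss_regression + ss_residual

# Coefficient of determination: share of total variation explained.
r_squared = ss_regression / ss_total
print(f"R^2 = {r_squared:.5f} ({100 * r_squared:.2f}% of variation explained)")
```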

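c) 95% confidence interval for the true slope value

The $100(1-\alpha)\%$ confidence interval is given by

$b_1 \pm t_{\alpha/2, n-2} \times SE(b_1)$ = 1280.85932 ± 1.9864 × 296.96712 ≈ (690.97, 1870.75)

This interval is also reported directly in the model output under Lower 95% and Upper 95%
for X Variable 1: (690.9706, 1870.748).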
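
The interval can be reproduced by hand from the slope and its standard error. This is a rough sketch, assuming a table value of about 1.9864 for the two-sided 5% t critical value with 91 degrees of freedom; the variable names are illustrative.

```python
# Slope estimate and its standard error from the SUMMARY OUTPUT.
slope = 1280.85932
se_slope = 296.96712

# Assumed table value for t_{0.025, 91}; not computed in this sketch.
t_crit = 1.9864

margin = t_crit * se_slope
lower, upper = slope - margin, slope + margin
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
```

The endpoints should match the Lower 95% and Upper 95% columns for X Variable 1 up to rounding of the critical value.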

d) Expected salary of a new hire with 12 years of education

Putting EDUC = 12 in the fitted regression model, we get

$\widehat{SALARY} = 38185.5979 + 1280.85932 \times 12 \approx 53555.91$
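
The point prediction is a one-line substitution into the fitted line. A small sketch, using the coefficients from the SUMMARY OUTPUT and a hypothetical helper named `predict_salary`:

```python
# Fitted coefficients from the SUMMARY OUTPUT.
INTERCEPT = 38185.5979
SLOPE = 1280.85932


def predict_salary(educ_years):
    """Expected starting salary for a given number of years of schooling."""
    return INTERCEPT + SLOPE * educ_years


expected_salary = predict_salary(12)
print(f"Expected salary at EDUC = 12: {expected_salary:.2f}")
```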

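e) 95% prediction interval for the salary of a new hire with 12 years of education

Prediction Interval = Estimated value ± $t_{\alpha/2, n-2}$ × Prediction Error

Prediction Error = Standard Error of the Regression × SQRT(1 + distance value)

Standard Error of the Regression = 6501.12045 (from the Regression Statistics table)

So Prediction Error = 6501.12045 × SQRT(1 + 0.011286) ≈ 6537.70

With the estimated value 38185.5979 + 1280.85932 × 12 ≈ 53555.91 and $t_{0.025, 91} \approx 1.9864$:

Prediction Interval = 53555.91 ± 1.9864 × 6537.70 ≈ 53555.91 ± 12986.49 ≈ (40569.42, 66542.40)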
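
The prediction interval arithmetic can be sketched in a few lines of Python. The inputs are the regression standard error and coefficients from the SUMMARY OUTPUT and the distance value given in the question; the critical value 1.9864 for 91 degrees of freedom is an assumed table value, and the variable names are illustrative.

```python
import math

# Inputs from the SUMMARY OUTPUT and the question statement.
intercept = 38185.5979
slope = 1280.85932
std_error_regression = 6501.12045
distance_value = 0.011286
educ_years = 12

# Assumed table value for t_{0.025, 91}; not computed in this sketch.
t_crit = 1.9864

# Point prediction from the fitted line.
estimate = intercept + slope * educ_years

# Standard error for predicting one new observation.
prediction_error = std_error_regression * math.sqrt(1 + distance_value)

margin = t_crit * prediction_error
pi_lower, pi_upper = estimate - margin, estimate + margin
print(f"Prediction error: {prediction_error:.2f}")
print(f"95% prediction interval: ({pi_lower:.2f}, {pi_upper:.2f})")
```

The interval is wide relative to the point estimate because the regression standard error is large and the model explains only a small share of the variation in salary.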


