Please provide Stata commands and output for all relevant statistics. Thank you.
1. The following data present the experience and salary structure of economists at the University of XYZ in 2017-2018. The variables are y = salary (thousands of dollars) and x = years of experience (defined as years since receiving the Ph.D.). Estimate the regression of y on x. Present all relevant statistics, including the estimated coefficients, standard errors, t-statistics, R-squared, sum of squared residuals, and standard error of the regression. Give reasons why the regression does or does not make sense. Calculate the residuals to see whether there are any outliers. Would you discard these observations or look for other explanations?
Data Set Below...
Y | X |
77.7 | 3 |
67.7 | 6 |
100.4 | 7 |
93.2 | 9 |
99.3 | 10 |
112.4 | 10 |
100.3 | 12 |
101.3 | 12 |
90.6 | 13 |
101.5 | 15 |
90 | 16 |
98.1 | 16 |
111 | 16 |
37.5 | 17 |
100.7 | 17 |
95 | 18 |
105 | 18 |
103 | 19 |
105 | 19 |
106.5 | 19 |
92.4 | 20 |
96.8 | 20 |
93 | 21 |
94.41 | 22 |
97.7 | 22 |
91.2 | 23 |
105 | 25 |
102 | 26 |
89 | 30 |
91 | 32 |
104.3 | 32 |
106 | 43 |
Using Excel with the MegaStat add-in: Data > MegaStat > Correlation/Regression > Regression
The output is as follows:
Regression Analysis
r² | 0.042 |
r | 0.204 |
Std. Error | 13.829 |
n | 32 |
k | 1 |
Dep. Var. | Y |
ANOVA table
Source | SS | df | MS | F | p-value |
Regression | 248.9338 | 1 | 248.9338 | 1.30 | .2629 |
Residual | 5,737.5612 | 30 | 191.2520 | | |
Total | 5,986.4950 | 31 | | | |
Regression output (the last two columns are the 95% confidence interval)
variables | coefficients | std. error | t (df=30) | p-value | 95% lower | 95% upper |
Intercept | 89.3640 | 5.98 | 14.94 | 0.00 | 77.15 | 101.58 |
X | 0.3390 | 0.2972 | 1.141 | .2629 | -0.2679 | 0.9460 |
Observation | Y | Predicted | Residual |
1 | 77.70 | 90.38 | -12.68 |
2 | 67.70 | 91.40 | -23.70 |
3 | 100.40 | 91.74 | 8.66 |
4 | 93.20 | 92.42 | 0.78 |
5 | 99.30 | 92.75 | 6.55 |
6 | 112.40 | 92.75 | 19.65 |
7 | 100.30 | 93.43 | 6.87 |
8 | 101.30 | 93.43 | 7.87 |
9 | 90.60 | 93.77 | -3.17 |
10 | 101.50 | 94.45 | 7.05 |
11 | 90.00 | 94.79 | -4.79 |
12 | 98.10 | 94.79 | 3.31 |
13 | 111.00 | 94.79 | 16.21 |
14 | 37.50 | 95.13 | -57.63 |
15 | 100.70 | 95.13 | 5.57 |
16 | 95.00 | 95.47 | -0.47 |
17 | 105.00 | 95.47 | 9.53 |
18 | 103.00 | 95.81 | 7.19 |
19 | 105.00 | 95.81 | 9.19 |
20 | 106.50 | 95.81 | 10.69 |
21 | 92.40 | 96.15 | -3.75 |
22 | 96.80 | 96.15 | 0.65 |
23 | 93.00 | 96.48 | -3.48 |
24 | 94.41 | 96.82 | -2.41 |
25 | 97.70 | 96.82 | 0.88 |
26 | 91.20 | 97.16 | -5.96 |
27 | 105.00 | 97.84 | 7.16 |
28 | 102.00 | 98.18 | 3.82 |
29 | 89.00 | 99.54 | -10.54 |
30 | 91.00 | 100.21 | -9.21 |
31 | 104.30 | 100.21 | 4.09 |
32 | 106.00 | 103.94 | 2.06 |
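The MegaStat figures above can be cross-checked with the textbook OLS formulas. The question asks for Stata, where the same estimates would come from `regress y x` and the residuals from `predict e, residuals` run after it; that Stata output is not reproduced here. What follows is a minimal plain-Python sketch. The names `DATA` and `ols` are illustrative, and the data are typed in from the table above as (x, y) pairs in observation order.

```python
import math

# (x = years since Ph.D., y = salary in $000), in the observation order of the table
DATA = [
    (3, 77.7), (6, 67.7), (7, 100.4), (9, 93.2), (10, 99.3), (10, 112.4),
    (12, 100.3), (12, 101.3), (13, 90.6), (15, 101.5), (16, 90.0), (16, 98.1),
    (16, 111.0), (17, 37.5), (17, 100.7), (18, 95.0), (18, 105.0), (19, 103.0),
    (19, 105.0), (19, 106.5), (20, 92.4), (20, 96.8), (21, 93.0), (22, 94.41),
    (22, 97.7), (23, 91.2), (25, 105.0), (26, 102.0), (30, 89.0), (32, 91.0),
    (32, 104.3), (43, 106.0),
]


def ols(pairs):
    """Simple OLS of y on x with an intercept; returns a dict of statistics."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)   # total sum of squares
    b1 = sxy / sxx                                   # slope
    b0 = y_bar - b1 * x_bar                          # intercept
    residuals = [y - (b0 + b1 * x) for x, y in pairs]
    ssr = sum(e ** 2 for e in residuals)             # sum of squared residuals
    mse = ssr / (n - 2)                              # residual mean square, df = n - 2
    se_b1 = math.sqrt(mse / sxx)
    se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))
    return {
        "b0": b0, "b1": b1, "se_b0": se_b0, "se_b1": se_b1,
        "t_b0": b0 / se_b0, "t_b1": b1 / se_b1,
        "r2": 1 - ssr / syy, "ssr": ssr,
        "ser": math.sqrt(mse),                       # standard error of the regression
        "residuals": residuals,
    }


if __name__ == "__main__":
    fit = ols(DATA)
    for key in ("b0", "b1", "se_b0", "se_b1", "t_b0", "t_b1", "r2", "ssr", "ser"):
        print(f"{key:>6}: {fit[key]:.4f}")
    for i, e in enumerate(fit["residuals"], start=1):
        print(f"obs {i:2d}: residual {e:8.2f}")
```

Running it should print the same coefficients, standard errors, t-statistics, R², sum of squared residuals and standard error of the regression as the MegaStat table (up to rounding), followed by the 32 residuals.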
Answers:
Estimate the regression of y on x.
ŷ = 89.364 + 0.339x
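The coefficients follow from the usual least-squares formulas. The sums below were computed by hand from the 32 observations in the data table (Σx = 588, Σy = 3059.01), so the last digit may differ slightly from MegaStat because of rounding:

```latex
\begin{aligned}
\bar{x} &= \tfrac{588}{32} = 18.375, \qquad \bar{y} = \tfrac{3059.01}{32} \approx 95.594 \\
S_{xx} &= \sum_{i=1}^{32} (x_i - \bar{x})^2 = 2165.5 \\
S_{xy} &= \sum_{i=1}^{32} (x_i - \bar{x})(y_i - \bar{y}) \approx 734.21 \\
\hat{\beta}_1 &= \frac{S_{xy}}{S_{xx}} = \frac{734.21}{2165.5} \approx 0.3390 \\
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} \approx 95.594 - 0.3390 \times 18.375 \approx 89.36
\end{aligned}
```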
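Present all relevant statistics, including:
Estimated coefficients: intercept 89.364, slope on x 0.339
Standard errors: 5.98 (intercept), 0.2972 (slope)
t-statistics: 14.94 (intercept), 1.141 (slope); the 1.30 in the ANOVA table is the F-statistic, which equals the squared slope t-statistic (1.141² ≈ 1.30) when there is a single regressor
R-squared: 0.042
Sum of squared residuals: 5,737.5612
Standard error of the regression: 13.829 (the "Std. Error" in the MegaStat summary; 0.2972 is the standard error of the slope, not of the regression)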
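For reference, here is how these statistics relate to the ANOVA table. In the formulas, SSR is the sum of squared residuals, n = 32, and S_xx = Σ(x_i - x̄)² = 2165.5, the last figure being hand-computed from the data:

```latex
\begin{aligned}
\text{SER} &= \sqrt{\frac{\text{SSR}}{n-2}} = \sqrt{\frac{5737.5612}{30}} = \sqrt{191.2520} \approx 13.829 \\
R^2 &= \frac{\text{SS}_{\text{reg}}}{\text{SS}_{\text{tot}}} = \frac{248.9338}{5986.4950} \approx 0.042 \\
\operatorname{se}(\hat{\beta}_1) &= \frac{\text{SER}}{\sqrt{S_{xx}}} = \frac{13.829}{\sqrt{2165.5}} \approx 0.2972 \\
t_{\hat{\beta}_1} &= \frac{0.3390}{0.2972} \approx 1.141, \qquad t_{\hat{\beta}_0} = \frac{89.3640}{5.98} \approx 14.94 \\
F &= \frac{248.9338}{191.2520} \approx 1.30 \approx t_{\hat{\beta}_1}^2
\end{aligned}
```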
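Give reasons why the regression does or does not make sense.
The sign of the slope makes sense: salary rises with experience, by about $339 per additional year since the Ph.D. (0.339 thousand dollars). As a statistical model, however, the regression explains very little. R² = 0.042 means experience accounts for only about 4% of the variation in salary. The slope's t-statistic of 1.141 (p-value 0.263) is below the 5% critical value of about 2.04 for 30 degrees of freedom. Equivalently, the 95% confidence interval for the slope (-0.268 to 0.946) contains zero, so we cannot reject the hypothesis that experience has no linear effect on salary in this sample. The weak fit is partly driven by observation 14 (salary 37.5), which inflates the residual variance. A single linear term in years of experience may also be too simple, since salaries could depend on rank or level off for senior faculty.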
Calculate the residuals to see whether there are any outliers. Would you discard these observations or look for other explanations?
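Yes. Observation 14 (x = 17, y = 37.5) has a residual of -57.63, about 4.2 times the standard error of the regression (57.63 / 13.829 ≈ 4.17), so it is a clear outlier. No other residual exceeds about 1.7 standard errors; the next largest is -23.70 for observation 2 (23.70 / 13.829 ≈ 1.71). Because x = 17 is close to the mean experience of 18.375, this point has little leverage on the slope. Its main effect is to pull the fitted line down, inflate the residual variance (and therefore the standard errors), and lower R². I would not discard it automatically. A salary of $37,500 is far below every other salary in the sample, so I would first look for other explanations, such as a data-entry error, a part-time or partial-year appointment, or a position not comparable to the others. If one of these applies, it is reasonable to correct or exclude the observation and to report the regression both with and without it.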
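The outlier check and the with/without comparison can be sketched as follows. This sketch assumes a simple rule of thumb: any observation whose residual exceeds two standard errors of the regression is flagged. That cutoff and the helper names `fit_line` and `flag_outliers` are illustrative choices, not part of the original analysis. A more careful check would use studentized residuals, which also account for leverage.

```python
import math

# (x = years since Ph.D., y = salary in $000), in the observation order of the table
DATA = [
    (3, 77.7), (6, 67.7), (7, 100.4), (9, 93.2), (10, 99.3), (10, 112.4),
    (12, 100.3), (12, 101.3), (13, 90.6), (15, 101.5), (16, 90.0), (16, 98.1),
    (16, 111.0), (17, 37.5), (17, 100.7), (18, 95.0), (18, 105.0), (19, 103.0),
    (19, 105.0), (19, 106.5), (20, 92.4), (20, 96.8), (21, 93.0), (22, 94.41),
    (22, 97.7), (23, 91.2), (25, 105.0), (26, 102.0), (30, 89.0), (32, 91.0),
    (32, 104.3), (43, 106.0),
]


def fit_line(pairs):
    """OLS of y on x with intercept; returns (b0, b1, residuals, ser, r2, se_b1)."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [y - (b0 + b1 * x) for x, y in pairs]
    ssr = sum(e * e for e in resid)
    ser = math.sqrt(ssr / (n - 2))  # standard error of the regression
    return b0, b1, resid, ser, 1 - ssr / syy, ser / math.sqrt(sxx)


def flag_outliers(pairs, cutoff=2.0):
    """Return 1-based indices whose |residual| / SER exceeds `cutoff` (ignores leverage)."""
    _, _, resid, ser, _, _ = fit_line(pairs)
    return [i for i, e in enumerate(resid, start=1) if abs(e) / ser > cutoff]


if __name__ == "__main__":
    flagged = flag_outliers(DATA)
    print("flagged observations:", flagged)
    kept = [p for i, p in enumerate(DATA, start=1) if i not in flagged]
    # Compare the fit on all observations with the fit excluding flagged ones
    for label, sample in (("all 32", DATA), ("without flagged", kept)):
        b0, b1, _, ser, r2, se_b1 = fit_line(sample)
        print(f"{label:>16}: b0={b0:.3f} b1={b1:.4f} t={b1 / se_b1:.3f} "
              f"SER={ser:.3f} R2={r2:.4f}")
```

With these data only observation 14 is flagged. Refitting without it should lower the standard error of the regression to roughly 8.9 and leave the slope near 0.30. The slope is still not significant at the 5% level (t of roughly 1.6), which is why reporting both versions is more informative than silently dropping the point.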