In the file pubexp.dat there are data on public expenditure on education (EE), gross
domestic product (GDP), and population (P) for 34 countries in the year 1980. It is
hypothesized that per capita expenditure on education is linearly related to per capita
GDP. That is,
$$y_i = b_1 + b_2 x_i + e_i$$
where
$$y_i = \frac{EE_i}{P_i} \quad \text{and} \quad x_i = \frac{GDP_i}{P_i}$$
It is suspected that $e_i$ may be heteroskedastic with a variance related to $x_i$.
(a) Why might the suspicion about heteroskedasticity be reasonable?
(b) Estimate the equation using least squares; plot the least squares function and the
residuals. Is there any evidence of heteroskedasticity?
(c) Test for the existence of heteroskedasticity using a White test.
(d) Use White’s formula for least squares variance estimates to find some alternative
standard errors for the least squares estimates obtained in part (b). Use
these standard errors and those obtained in part (b) to construct two alternative
95% confidence intervals for b2. What can you say about the confidence interval
that ignores the heteroskedasticity?
(e) Reestimate the equation under the assumption that $\mathrm{var}(e_i) = \sigma^2 x_i$. Report the
results. Construct a 95% confidence interval for b2. Comment on its width
relative to that of the confidence intervals found in part (d).
Answer: I have answered the first four subparts.
(a)
The error variance may increase as per capita GDP increases. Countries with low per capita GDP will tend to spend less on education, so there is little room for variation in their spending. Countries with higher per capita GDP will tend to spend more, and they have more freedom in how much to spend, so the variability may be high.
(b)
Using R:
pubexp <- read.table("C:\\Users\\ANKITS~1\\AppData\\Local\\Temp\\Rtmp0Asy9y\\datad8456b5aab", quote="\"", comment.char="")
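# columns are assumed to be in the order EE, GDP, P, as listed in the question
y = pubexp[,1]/pubexp[,3]  # per capita expenditure on education
x = pubexp[,2]/pubexp[,3]  # per capita GDP
model = lm(y~x)            # least squares fit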
> coefficients(model)
(Intercept) x
-0.1245728 0.0731732
Hence the estimated equation is
y = -0.1245728 + 0.0731732x
Yes, there is evidence of heteroskedasticity. The residuals tend to increase in magnitude as x increases.
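The plots asked for in part (b) can be drawn along these lines. This is only a sketch: `x_sim` and `y_sim` are made-up values standing in for the pubexp data, so the picture will not match the real one.

```r
# Synthetic stand-in for the pubexp data (NOT the real values):
# the error spread grows with x to mimic heteroskedasticity.
set.seed(1)
x_sim <- runif(34, min = 1, max = 18)
y_sim <- -0.1 + 0.07 * x_sim + rnorm(34, sd = 0.02 * x_sim)

fit_sim <- lm(y_sim ~ x_sim)

# Left panel: data with the fitted least squares line.
# Right panel: residuals against x; a funnel shape suggests heteroskedasticity.
op <- par(mfrow = c(1, 2))
plot(x_sim, y_sim,
     xlab = "per capita GDP", ylab = "per capita education expenditure")
abline(fit_sim)
plot(x_sim, resid(fit_sim),
     xlab = "per capita GDP", ylab = "least squares residual")
abline(h = 0, lty = 2)
par(op)
```

With the real data, replace `x_sim` and `y_sim` by the `x` and `y` computed above.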
(c)
> summary(white.model)
Call:
lm(formula = error.squared ~ x + x.squared)
Residuals:
Min 1Q Median 3Q Max
-0.048773 -0.011263 -0.005239 0.005350 0.097664
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0176770 0.0161120 1.097 0.2810
x -0.0052062 0.0045479 -1.145 0.2611
x.squared 0.0004840 0.0002638 1.835 0.0762 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.02513 on 31 degrees of freedom
Multiple R-squared: 0.293, Adjusted R-squared: 0.2474
F-statistic: 6.423 on 2 and 31 DF, p-value: 0.004636
> 0.293*34
[1] 9.962
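The test statistic N x R-squared = 34 x 0.293 = 9.962 exceeds the 5% critical value of the chi-square distribution with 2 degrees of freedom (5.99).
Hence we can reject the null hypothesis of homoskedasticity: the errors are heteroskedastic.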
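The commands that built `white.model` are not shown above, so here is a sketch of how the auxiliary regression and the N x R-squared statistic could be computed. The function name `white_test` and the simulated `x_sim`, `y_sim` are my own illustration, not the pubexp data.

```r
# White test for a simple regression y ~ x:
# regress the squared residuals on x and x^2, then compare N * R^2
# with a chi-square distribution with 2 degrees of freedom.
white_test <- function(x, y) {
  fit <- lm(y ~ x)
  error.squared <- resid(fit)^2
  x.squared <- x^2
  aux <- lm(error.squared ~ x + x.squared)
  stat <- length(y) * summary(aux)$r.squared
  df <- 2  # regressors in the auxiliary regression, excluding the intercept
  list(statistic = stat, df = df,
       critical_5pct = qchisq(0.95, df),
       p_value = pchisq(stat, df, lower.tail = FALSE))
}

# Synthetic data for illustration only (NOT the real values).
set.seed(1)
x_sim <- runif(34, min = 1, max = 18)
y_sim <- -0.1 + 0.07 * x_sim + rnorm(34, sd = 0.02 * x_sim)
res <- white_test(x_sim, y_sim)
print(res)

# p-value for the statistic reported in the answer
p_reported <- pchisq(9.962, df = 2, lower.tail = FALSE)
print(p_reported)
```

Reject homoskedasticity when `statistic` is larger than `critical_5pct`, or equivalently when `p_value` is below 0.05.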
(d)
Take abs(ê_i), the absolute value of the least squares residual, as the standard error for observation i.
Divide each observation of the linear model by abs(ê_i)
and then rerun the regression:
> confint(model,'x',level = 0.95)
2.5 % 97.5 %
x 0.06262296 0.08372343
> confint(model_,'x_',level = 0.95)
2.5 % 97.5 %
x_ 0.05901165 0.05944973
The interval that ignores heteroskedasticity is wider.
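For comparison, White's formula can also be applied directly to the least squares residuals, without transforming the data. For the slope it is var(b2) = sum((x_i - xbar)^2 * ehat_i^2) / (sum((x_i - xbar)^2))^2. The sketch below is a different calculation from the divide-by-residual approach used above and is not guaranteed to reproduce that interval. The function name `white_slope_ci` and the simulated data are assumptions for illustration only.

```r
# White (HC0) standard error and confidence interval for the slope
# in y = b1 + b2*x + e, computed by hand from the least squares residuals.
white_slope_ci <- function(x, y, level = 0.95) {
  fit <- lm(y ~ x)
  ehat <- resid(fit)
  dx <- x - mean(x)
  # var(b2) = sum(dx^2 * ehat^2) / (sum(dx^2))^2
  se_white <- sqrt(sum(dx^2 * ehat^2)) / sum(dx^2)
  b2 <- unname(coef(fit)["x"])
  tcrit <- qt(1 - (1 - level) / 2, df = length(y) - 2)
  c(estimate = b2, se = se_white,
    lower = b2 - tcrit * se_white, upper = b2 + tcrit * se_white)
}

# Synthetic data for illustration only (NOT the real values).
set.seed(1)
x_sim <- runif(34, min = 1, max = 18)
y_sim <- -0.1 + 0.07 * x_sim + rnorm(34, sd = 0.02 * x_sim)

ci_white <- white_slope_ci(x_sim, y_sim)
print(ci_white)

# Usual least squares interval, for comparison
print(confint(lm(y_sim ~ x_sim), "x_sim", level = 0.95))
```

With the real data, call `white_slope_ci(x, y)` and compare the result with `confint(model, 'x', level = 0.95)`.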