$$y_i = a_0 + a_1 X_{1i} + a_2 X_{2i} + a_3 X_{3i} + u_i$$
What is the effect of measurement error in Y? How is this different from the effect of measurement error in X?
Classical measurement error in the dependent variable (Y) is simply absorbed into the regression's disturbance term, so the coefficient estimates remain unbiased and consistent; its only consequence is that the enlarged error variance inflates the standard errors of the coefficient estimates.
On the other hand, measurement errors in the regressors (X), so that we observe $x_i = x_i^* + \eta_i$ rather than the true $x_i^*$, lead to attenuation bias in a simple univariate regression model and, in general, to inconsistent coefficient estimates (meaning that the parameter estimates do not tend to the true values even in very large samples).
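To make the contrast concrete before the formal derivation below, here is a minimal simulation sketch in Python (the sample size, the slope $\beta = 2$, and the noise variances are illustrative choices, not values from the question):

```python
# Minimal sketch: classical measurement error in Y vs. in X.
import numpy as np

rng = np.random.default_rng(0)
n, beta = 100_000, 2.0

x_true = rng.normal(0.0, 1.0, n)               # true regressor, variance 1
y = beta * x_true + rng.normal(0.0, 1.0, n)    # true model

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Error in Y only: the slope estimate stays centred on beta = 2.
y_noisy = y + rng.normal(0.0, 0.5, n)
print(ols_slope(x_true, y_noisy))              # ~2.0, just noisier

# Error in X only: the slope shrinks by Var(x*) / (Var(x*) + Var(eta)).
x_noisy = x_true + rng.normal(0.0, 0.5, n)     # Var(eta) = 0.25
print(ols_slope(x_noisy, y))                   # ~2.0 / 1.25 = 1.6
```

With error only in Y the fitted slope is unchanged in expectation (its standard error simply grows), while with error only in X it is pulled toward zero, which is exactly the attenuation derived next.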
Consider a simple linear regression model of the form

$$y_t = \alpha + \beta x_t^* + \varepsilon_t, \qquad t = 1, \dots, T,$$

where $x_t^*$ denotes the true but unobserved regressor. Instead we observe this value with an error:

$$x_t = x_t^* + \eta_t,$$

where the measurement error $\eta_t$ is assumed to be independent of the true value $x_t^*$.

If the $y_t$'s are simply regressed on the $x_t$'s (see simple linear regression), then the estimator for the slope coefficient is

$$\hat\beta = \frac{\tfrac{1}{T}\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{\tfrac{1}{T}\sum_{t=1}^{T}(x_t - \bar{x})^2},$$

which converges in probability as the sample size $T$ increases without bound:

$$\hat\beta \;\xrightarrow{p}\; \frac{\operatorname{Cov}[x_t, y_t]}{\operatorname{Var}[x_t]} = \frac{\beta \sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_\eta^2} = \frac{\beta}{1 + \sigma_\eta^2 / \sigma_{x^*}^2}.$$
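For concreteness, a worked instance with the same illustrative values as in the simulation above (hypothetical numbers, not from the source): with $\beta = 2$, $\sigma_{x^*}^2 = 1$, and $\sigma_\eta^2 = 0.25$,

$$\hat\beta \;\xrightarrow{p}\; \frac{2 \cdot 1}{1 + 0.25} = 1.6,$$

so even an arbitrarily large sample recovers only 80% of the true slope.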
Variances are non-negative, so that in the limit the estimate is smaller in magnitude than the true value of $\beta$, an effect which statisticians call attenuation or regression dilution. Thus the 'naïve' least squares estimator is inconsistent in this setting. However, the estimator is a consistent estimator of the parameter required for a best linear predictor of $y$ given $x$: in some applications this may be what is required, rather than an estimate of the 'true' regression coefficient, although that would assume that the variance of the errors in observing $x^*$ remains fixed. This follows directly from the result quoted immediately above, and the fact that the regression coefficient relating the $y_t$'s to the actually observed $x_t$'s, in a simple linear regression, is given by

$$\beta_x = \frac{\operatorname{Cov}[x_t, y_t]}{\operatorname{Var}[x_t]}.$$

It is this coefficient, rather than $\beta$, that would be required for constructing a predictor of $y$ based on an observed $x$ which is subject to noise.
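A short sketch of this point, again under the illustrative parameter values used above (hypothetical, not from the source): for predicting $y$ from the noisy $x$, the attenuated coefficient $\beta_x$ beats the true $\beta$ in mean squared error.

```python
# Sketch: the attenuated slope, not the true beta, is best for prediction
# from a noisily observed x (all variables are mean zero, so no intercept).
import numpy as np

rng = np.random.default_rng(1)
n, beta = 100_000, 2.0
x_true = rng.normal(0.0, 1.0, n)
y = beta * x_true + rng.normal(0.0, 1.0, n)
x_obs = x_true + rng.normal(0.0, 0.5, n)       # observed with noise, Var(eta) = 0.25

# Slope of the best linear predictor of y given the observed x: Cov/Var, ~1.6.
beta_x = np.cov(x_obs, y, ddof=1)[0, 1] / np.var(x_obs, ddof=1)

mse_attenuated = np.mean((y - beta_x * x_obs) ** 2)
mse_true_beta = np.mean((y - beta * x_obs) ** 2)
print(mse_attenuated < mse_true_beta)          # True: beta_x predicts y better
```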
It can be argued that almost all existing data sets contain errors of differing nature and magnitude, so that attenuation bias is extremely frequent (although in multivariate regression the direction of the bias is ambiguous). Jerry Hausman sees this as an "iron law of econometrics": "The magnitude of the estimate is usually smaller than expected."