What does the Gauss-Markov theorem claim? Explain the assumptions needed for the Gauss-Markov theorem. Are all the assumptions necessary to construct the OLS estimates of the intercept and slope coefficients?
The Gauss-Markov theorem states that if your linear regression model satisfies the first five classical assumptions listed below (normality of the errors is not required), then ordinary least squares (OLS) regression produces unbiased estimates that have the smallest variance among all linear unbiased estimators.
Suppose we have, in matrix notation,

y = Xβ + ε,

expanding to,

y_i = β_1 X_i1 + β_2 X_i2 + ⋯ + β_K X_iK + ε_i   for i = 1, 2, …, n,

where the β_j are non-random but unobservable parameters, the X_ij are non-random and observable (called the "explanatory variables"), the ε_i are random, and so the y_i are random. The random variables ε_i are called the "disturbance", "noise" or simply "error" (not to be confused with the "residual"; see errors and residuals in statistics). Note that to include a constant in the model above, one can introduce the constant as a variable, with a newly introduced last column of X being unity, i.e., X_iK = 1 for all i.

The Gauss–Markov assumptions concern the set of error random variables ε_i:
• They have mean zero: E(ε_i) = 0.
• They are homoscedastic, that is, they all have the same finite variance: Var(ε_i) = σ² < ∞ for all i.
• Distinct error terms are uncorrelated: Cov(ε_i, ε_j) = 0 for i ≠ j.
A linear estimator of β_j is a linear combination

β̂_j = c_1j y_1 + c_2j y_2 + ⋯ + c_nj y_n

in which the coefficients c_ij are not allowed to depend on the underlying coefficients β_j, since those are not observable, but are allowed to depend on the values X_ij, since these data are observable. (The dependence of the coefficients c_ij on each X_ij is typically nonlinear; the estimator is linear in each y_i and hence in each random ε_i, which is why this is "linear" regression.) The estimator β̂_j is said to be unbiased if and only if

E(β̂_j) = β_j

regardless of the values of X_ij. Now, let λ_1 β_1 + ⋯ + λ_K β_K be some linear combination of the coefficients. Then the mean squared error of the corresponding estimation is

E[(λ_1(β̂_1 − β_1) + ⋯ + λ_K(β̂_K − β_K))²],

in other words, it is the expectation of the square of the weighted sum (across parameters) of the differences between the estimators and the corresponding parameters to be estimated. (Since we are considering the case in which all the parameter estimates are unbiased, this mean squared error is the same as the variance of the linear combination.) The best linear unbiased estimator (BLUE) of the vector β of parameters β_j is the one with the smallest mean squared error for every vector λ of linear combination parameters. This is equivalent to the condition that

Var(β̃) − Var(β̂)

is a positive semi-definite matrix for every other linear unbiased estimator β̃.
The ordinary least squares estimator (OLS) is the function

β̂ = (XᵀX)⁻¹ Xᵀ y

of y and X (where Xᵀ denotes the transpose of X) that minimizes the sum of squares of residuals (misprediction amounts):

(y_1 − ŷ_1)² + ⋯ + (y_n − ŷ_n)², where ŷ_i = β̂_1 X_i1 + ⋯ + β̂_K X_iK.

The theorem now states that the OLS estimator is a BLUE. The main idea of the proof is that the least-squares estimator is uncorrelated with every linear unbiased estimator of zero, i.e., with every linear combination a_1 y_1 + ⋯ + a_n y_n whose coefficients do not depend upon the unobservable β but whose expected value is always zero.
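As a concrete numerical sketch of the formula above (simulated data with arbitrarily chosen coefficients, using NumPy), the code below computes β̂ = (XᵀX)⁻¹Xᵀy directly, checks it against a standard least-squares routine, and makes explicit that β̂ is a linear function of y through the matrix C = (XᵀX)⁻¹Xᵀ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated design matrix with a constant column (intercept) and two regressors.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])              # arbitrary "unobservable" parameters
y = X @ beta_true + rng.normal(scale=1.0, size=n)   # spherical (i.i.d. normal) errors

# OLS via the normal equations: beta_hat = (X'X)^{-1} X'y
C = np.linalg.inv(X.T @ X) @ X.T   # beta_hat = C @ y, so the estimator is linear in y
beta_hat = C @ y

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("normal equations:", beta_hat)
print("np.linalg.lstsq :", beta_lstsq)
```

In practice one would rarely form (XᵀX)⁻¹ explicitly; QR- or SVD-based solvers such as np.linalg.lstsq are more numerically stable and return the same least-squares solution whenever X has full column rank.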
Assumptions of OLS Regression
The necessary OLS assumptions, which are used to derive the OLS estimators in linear regression models, are discussed below.
A1. The linear regression model is “linear in parameters.”
A2. There is a random sampling of observations.
A3. The conditional mean should be zero.
A4. There is no multi-collinearity (or perfect collinearity).
A5. Spherical errors: there is homoscedasticity and no autocorrelation.
A6. Optional assumption: error terms should be normally distributed.
OLS Assumption 1: The linear regression model is “linear in parameters.”
When the dependent variable (Y) is a linear function of the independent variables (X's) and the error term, the regression is linear in parameters, and not necessarily linear in the X's. For example, consider the following three models:
a) Y = β0 + β1X1 + β2X2 + ε
b) Y = β0 + β1X1² + β2X2 + ε
c) Y = β0 + β1²X1 + β2X2 + ε
In the above three examples, OLS assumption 1 is satisfied for a) and b): both are linear in the parameters β0, β1, and β2, even though b) is nonlinear in X1. For c), OLS assumption 1 is not satisfied, because the model is not linear in the parameters (β1 enters as a square).
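Because model b) is linear in the parameters, it can be estimated by ordinary OLS simply by treating X1² as its own regressor. A minimal sketch, using simulated data and arbitrarily chosen coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Model b): Y = b0 + b1*X1^2 + b2*X2 + e  -- nonlinear in X1 but linear in the parameters.
y = 1.0 + 2.0 * x1**2 - 0.5 * x2 + rng.normal(size=n)

# OLS still applies: just put X1^2 into the design matrix as its own column.
X = np.column_stack([np.ones(n), x1**2, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # should be close to [1.0, 2.0, -0.5]
```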
OLS Assumption 2: There is a random sampling of observations
This assumption of OLS regression says that:
• The sample taken for the linear regression model must be drawn randomly from the population. For example, if you have to run a regression model to study the factors that impact the scores of students in the final exam, then you must select students randomly from the university during your data collection process, rather than adopting a convenient sampling procedure.
• The number of observations taken in the sample for making the linear regression model should be greater than the number of parameters to be estimated. This makes sense mathematically too: if the number of parameters to be estimated (unknowns) is greater than the number of observations, then estimation is not possible; if the number of parameters equals the number of observations, then OLS is not required, and you can simply solve the system with algebra (see the sketch after this list).
• The X's should be fixed (i.e., the independent variables should impact the dependent variable, not the other way around). It should not be the case that the dependent variable impacts the independent variables. This is because regression models study a causal relationship running from the X's to Y, not merely a correlation between the two. For example, if you run a regression with inflation as the dependent variable and unemployment as the independent variable, the OLS estimators are likely to be unreliable, because between inflation and unemployment we expect mutual correlation rather than a one-way causal relationship.
• The error terms are random. This makes the dependent variable random.
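To illustrate the observation-count point from the list above, here is a small sketch (with arbitrary simulated numbers): when the number of observations equals the number of parameters, the system can be solved exactly with plain algebra and leaves zero residuals; with more observations than parameters, OLS finds the best-fitting compromise.

```python
import numpy as np

rng = np.random.default_rng(2)

# Case 1: as many observations as parameters (n = k = 3) -- solve exactly, no OLS needed.
X_square = rng.normal(size=(3, 3))
y_square = rng.normal(size=3)
beta_exact = np.linalg.solve(X_square, y_square)
print("residuals:", y_square - X_square @ beta_exact)   # essentially zero

# Case 2: more observations than parameters (n = 100 > k = 3) -- OLS is required.
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("OLS estimate:", beta_ols)   # close to [1.0, -2.0, 0.5]; residuals are not zero
```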
OLS Assumption 3: The conditional mean should be zero.
The expected value of the error terms of the OLS regression should be zero, given the values of the independent variables.
Mathematically, E(ε∣X)=0. This is sometimes just written as E(ε)=0.
In other words, the distribution of the error terms has zero mean and doesn't depend on the independent variables X's. Thus, there must be no relationship between the X's and the error term.
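One caveat worth keeping in mind: this assumption is about the unobserved errors, not the fitted residuals. Whenever the model includes an intercept, the OLS residuals have exactly zero sample mean and are exactly orthogonal to the X's by construction, whether or not E(ε∣X) = 0 actually holds for the true errors. A small sketch of that mechanical property, on simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# With an intercept, residuals sum to zero and are orthogonal to every column of X,
# by construction of the least-squares fit.
print("mean of residuals:", resid.mean())   # ~0 up to floating-point error
print("X' @ residuals   :", X.T @ resid)    # ~[0, 0, 0]
```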
OLS Assumption 4: There is no multi-collinearity (or perfect collinearity).
In a simple linear regression model, there is only one independent variable and hence, by default, this assumption holds true. However, in a multiple linear regression model, there is more than one independent variable. The OLS assumption of no multi-collinearity says that there should be no exact linear relationship between the independent variables. For example, suppose you spend your 24 hours in a day on three things – sleeping, studying, or playing. Now, if you run a regression with exam score/performance as the dependent variable and time spent sleeping, time spent studying, and time spent playing as the independent variables, then this assumption will not hold.
This is because there is perfect collinearity between the three independent variables.
Time spent sleeping = 24 – Time spent studying – Time spent playing.
In such a situation, it is better to drop one of the three independent variables from the linear regression model. If the relationship (correlation) between the independent variables is strong (but not exactly perfect), the OLS estimators can still be computed, but their variances are inflated, so the individual coefficients are estimated imprecisely. Hence, this OLS assumption says that you should select independent variables that are not highly correlated with each other.
An important implication of this assumption of OLS regression is that there should be sufficient variation in the X's: the more variability there is in the X's, the better the OLS estimates are at determining the impact of the X's on Y.
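A minimal sketch of the 24-hour example above (with arbitrary simulated hours): once an intercept is included, the "time spent sleeping" column is an exact linear combination of the other columns, so X has deficient rank, XᵀX is singular, and the individual coefficients are not identified.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
study = rng.uniform(0, 10, size=n)
play = rng.uniform(0, 8, size=n)
sleep = 24.0 - study - play          # exact linear combination of intercept, study, play

# Design matrix: intercept plus all three activities.
X = np.column_stack([np.ones(n), sleep, study, play])

# Perfect collinearity: the rank is 3 even though X has 4 columns,
# so X'X is singular and the OLS coefficients are not uniquely determined.
print("columns of X   :", X.shape[1])                 # 4
print("rank of X      :", np.linalg.matrix_rank(X))   # 3

# np.linalg.lstsq still returns *a* solution, but it reports the rank deficiency;
# dropping one of the three activity columns restores full rank.
_, _, rank, _ = np.linalg.lstsq(X, rng.normal(size=n), rcond=None)
print("rank from lstsq:", rank)                        # 3
```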
OLS Assumption 5: Spherical errors: There is homoscedasticity and no autocorrelation.
According to this OLS assumption, the error terms in the regression should all have the same variance.
Mathematically, Var(ε∣X) = σ².
If this variance is not constant (i.e., it depends on the X's), then the linear regression model has heteroscedastic errors. The OLS estimates are still unbiased, but they are no longer the most efficient, and the usual standard errors (and therefore hypothesis tests and confidence intervals) are incorrect.
This OLS assumption of no autocorrelation says that the error terms of different observations should not be correlated with each other.
Mathematically, Cov(ε_i, ε_j ∣ X) = 0 for i ≠ j.
For example, when we have time series data (e.g. yearly data on unemployment), the regression is likely to suffer from autocorrelation, because unemployment next year tends to depend on unemployment this year; hence, the error terms in different observations are likely to be correlated with each other.
In simple terms, this OLS assumption means that the error terms should be uncorrelated and have a common variance – conditions that hold, for example, when the errors are IID (independent and identically distributed).
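A rough simulation sketch of why heteroscedasticity matters (arbitrary simulated data): the OLS slope estimate stays unbiased, but the classical variance formula σ²(XᵀX)⁻¹, which assumes one common error variance, no longer describes its actual sampling spread.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 200, 2000
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

slopes = np.empty(reps)
for r in range(reps):
    # Heteroscedastic errors: the spread grows with x, violating Var(e|X) = sigma^2.
    e = rng.normal(scale=0.2 * x**2)
    y = 2.0 + 1.0 * x + e
    slopes[r] = (XtX_inv @ X.T @ y)[1]

print("mean of slope estimates   :", slopes.mean())   # ~1.0: OLS is still unbiased
print("Monte Carlo slope std dev :", slopes.std())
# The classical formula assumes one common error variance; plugging in the average
# error variance here understates the actual sampling spread seen above.
avg_var = np.mean((0.2 * x**2) ** 2)
print("classical-formula std dev :", np.sqrt(avg_var * XtX_inv[1, 1]))
```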
OLS Assumption 6: Error terms should be normally distributed.
This assumption states that the errors are normally distributed, conditional upon the independent variables. This OLS assumption is not required for the validity of the OLS method; however, it becomes important when one needs exact finite-sample inference, such as the usual t and F tests and confidence intervals. Note that only the error terms need to be normally distributed; the dependent variable Y need not be (marginally) normally distributed.
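A quick simulation sketch of that last point (arbitrary simulated data): even with clearly non-normal, skewed errors, the OLS estimates still average out to the true coefficients, so unbiasedness does not rely on assumption 6.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 100, 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])

estimates = np.empty((reps, 2))
for r in range(reps):
    # Skewed, non-normal errors: centered exponential (mean zero, far from Gaussian).
    e = rng.exponential(scale=1.0, size=n) - 1.0
    y = X @ beta_true + e
    estimates[r], *_ = np.linalg.lstsq(X, y, rcond=None)

# Unbiasedness does not require normal errors.
print("average estimate:", estimates.mean(axis=0))   # ~[1.0, 2.0]
```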