In: Economics
What are the OLS assumptions in time series analysis? How are they different from, or similar to, the "cross-sectional" OLS assumptions?
The time has come to introduce the OLS assumptions. In this tutorial, we divide them into 5 assumptions. You should know all of them and consider them before you perform regression analysis.
*The first one is linearity. The model is called linear regression for a reason: it assumes a linear relationship between the dependent variable and the regressors. As you may know, there are other types of regression with more sophisticated models; linear regression is the simplest one. Each independent variable is multiplied by a coefficient, and the terms are summed up to predict the value of the dependent variable.
*The second one is no endogeneity of regressors (exogeneity). Mathematically, this is expressed as: the covariance between the error term and each regressor is 0, for any error term and any x.
*The third OLS assumption is normality and homoscedasticity of the error term. Normality means the error term is normally distributed. The expected value of the error is 0, as we expect to have no errors on average. Homoscedasticity, in plain English, means constant variance.
*The fourth one is no autocorrelation. Mathematically, the covariance between any two different error terms is 0. In time-series data this is the assumption that is most often violated, and it is usually what stops you from using plain OLS in your analysis.
*And the last OLS assumption is no multicollinearity. Multicollinearity is observed when two or more regressors are highly correlated with each other. A short code sketch checking several of these assumptions in practice follows after this list.
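To make these assumptions concrete, here is a minimal Python sketch (assuming numpy and statsmodels are available) that fits OLS on simulated data and runs a few standard diagnostics. The data-generating process, variable names, and test choices are illustrative assumptions, not part of the question.

```python
# Minimal sketch: fit OLS on simulated data and check some of the assumptions
# listed above. All numbers below are illustrative assumptions.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.3 * x1 + rng.normal(size=n)           # mildly correlated regressors
eps = rng.normal(size=n)                      # i.i.d. errors by construction
y = 1.0 + 2.0 * x1 - 0.5 * x2 + eps

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

print(res.params)                                        # estimated coefficients
print("mean residual:", res.resid.mean())                # should be ~0
print("Durbin-Watson:", durbin_watson(res.resid))        # ~2 means no autocorrelation
print("Breusch-Pagan p-value:", het_breuschpagan(res.resid, X)[1])  # homoscedasticity
print("corr(x1, x2):", np.corrcoef(x1, x2)[0, 1])        # multicollinearity check
```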
Concepts of linear regression will be explored in the context of a cross-section regression of returns on a set of factors thought to capture systematic risk. Cross-sectional regressions in financial econometrics date back at least to the Capital Asset Pricing Model.
The basic model postulates that excess returns are linearly related to a set of systematic risk factors. The factors can be returns on other assets, such as the market portfolio, or any other variable related to intertemporal hedging demands, such as interest rates, shocks to inflation or consumption growth.
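As a rough illustration of the kind of factor regression described above, the sketch below regresses a simulated portfolio's excess returns on a simulated market factor and one other factor return; all series, sample sizes, and coefficient values are assumptions made purely for the example.

```python
# Minimal sketch of an excess-return factor regression on simulated data.
# Factor names and values are illustrative assumptions only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_obs = 600
mkt = rng.normal(0.005, 0.04, n_obs)          # market excess return
other = rng.normal(0.0, 0.02, n_obs)          # a second risk factor
excess_ret = (0.001 + 1.2 * mkt + 0.3 * other
              + rng.normal(0, 0.01, n_obs))

X = sm.add_constant(np.column_stack([mkt, other]))
res = sm.OLS(excess_ret, X).fit()
print(res.params)   # intercept ("alpha") and factor loadings ("betas")
```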
A linear relationship is fairly specific and, in some cases, restrictive. It is important to distinguish specifications which can be examined in the framework of a linear regression from those which cannot. Linear regressions require two key features of any model: each term on the right-hand side must have only one coefficient that enters multiplicatively, and the error must enter additively. Most specifications satisfying these two requirements can be treated using the tools of linear regression. Other forms of “nonlinearities” are permissible: any regressor, or the regressand, can be a nonlinear transformation of the original observed data.
Double-log (also known as log-log) specifications, where both the regressors and the regressands are log transformations of the original (positive) data, are common.
ln yi = β1 + β2 ln xi + εi .
In the parlance of a linear regression, the model is specified as y˜i = β1 + β2 x˜i + εi, where y˜i = ln(yi) and x˜i = ln(xi). The usefulness of the double-log specification can be illustrated by a Cobb-Douglas production function subject to a multiplicative shock, Yi = β1 Ki^β2 Li^β3 εi.
Using the production function directly, it is not obvious that, given values for output (Yi), capital (Ki) and labor (Li) of firm i, the model is consistent with linear regression. However, taking logs, ln Yi = ln β1 + β2 ln Ki + β3 ln Li + ln εi, the model can be reformulated as a linear regression on the transformed data. Other forms, such as semi-log (either log-lin, where the regressand is logged but the regressors are unchanged, or lin-log, the opposite) are often useful to describe certain relationships. Linear regression does, however, rule out specifications which may be of interest. Linear regression is not an appropriate framework to examine a model of the form yi = β1 x1,i^β2 + β3 x2,i^β4 + εi.
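A minimal sketch of this log-transformation idea, using simulated firm data: the Cobb-Douglas relationship with a multiplicative shock is generated first, and the parameters are then recovered by running OLS on the logged variables. All numerical values are illustrative assumptions.

```python
# Minimal sketch: a Cobb-Douglas production function estimated as a linear
# regression after taking logs. Simulated data; parameter values are
# illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
K = rng.lognormal(mean=2.0, sigma=0.5, size=n)      # capital of firm i
L = rng.lognormal(mean=1.5, sigma=0.5, size=n)      # labor of firm i
eps = rng.lognormal(mean=0.0, sigma=0.1, size=n)    # multiplicative shock
Y = 3.0 * K**0.4 * L**0.6 * eps                     # Yi = b1 * Ki^b2 * Li^b3 * eps_i

# Taking logs: ln Y = ln b1 + b2 ln K + b3 ln L + ln eps  ->  linear in parameters
X = sm.add_constant(np.column_stack([np.log(K), np.log(L)]))
res = sm.OLS(np.log(Y), X).fit()
print(res.params)   # roughly [ln 3.0, 0.4, 0.6]
```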
Fortunately, more general frameworks, such as the generalized method of moments (GMM) or maximum likelihood estimation (MLE), topics of subsequent chapters, can be applied. Two other transformations of the original data, dummy variables and interactions, can be used to generate nonlinear (in regressors) specifications. A dummy variable is a special class of regressor that takes the value 0 or 1. In finance, dummy variables (or dummies) are used to model calendar effects, leverage (where the magnitude of a coefficient depends on the sign of the regressor), or group-specific effects. Variable interactions parameterize nonlinearities into a model through products of regressors. Common interactions include powers of regressors (x1,i^2, x1,i^3, ...), cross-products of regressors (x1,i x2,i) and interactions between regressors and dummy variables.
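The sketch below shows one way such dummies and interactions can be built by hand, here with a hypothetical January calendar dummy and a single regressor; the data and effect sizes are assumptions for illustration only.

```python
# Minimal sketch: constructing dummy and interaction regressors.
# The 'January' dummy and the return data are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 240
month = np.tile(np.arange(1, 13), n // 12)
january = (month == 1).astype(float)         # dummy: 1 in January, 0 otherwise
x = rng.normal(size=n)                       # some regressor
ret = 0.5 + 0.3 * x + 0.8 * january + 0.4 * january * x + rng.normal(0, 0.5, n)

# The design matrix includes the dummy, the regressor, and their interaction,
# so the slope on x is allowed to differ between January and other months.
X = sm.add_constant(np.column_stack([x, january, january * x]))
res = sm.OLS(ret, X).fit()
print(res.params)
```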
Considering the range of nonlinear transformations, linear regression is surprisingly general despite the restriction of parameter linearity. The use of nonlinear transformations also changes the interpretation of the regression coefficients. If only unmodified regressors are included, yi = xi β + εi and ∂yi/∂xk,i = βk. Suppose a specification includes both xk and xk^2 as regressors, yi = β1 xi + β2 xi^2 + εi. In this specification, ∂yi/∂xi = β1 + 2 β2 xi, so the level of the variable enters its partial effect. Similarly, in a simple double-log model ln yi = β1 ln xi + εi, β1 = ∂ ln yi / ∂ ln xi = (∂y/y) / (∂x/x) = %∆y / %∆x. Thus, β1 corresponds to the elasticity of yi with respect to xi. In general, the coefficient on a variable in levels corresponds to the effect of a one-unit change in that variable, while the coefficient on a variable in logs corresponds to the effect of a one percent change. For example, in a semi-log model where the regressor is in logs but the regressand is in levels, yi = β1 ln xi + εi, β1 corresponds to the change in yi for a one percent change in xi (approximately β1/100). Finally, in the case of discrete regressors, where there is no differential interpretation of coefficients, β represents the effect of a whole unit change, such as a dummy going from 0 to 1.
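Finally, a small sketch of how the partial effect depends on the level of the regressor when a squared term is included; the simulated data and coefficient values are illustrative assumptions.

```python
# Minimal sketch: partial effects with a squared regressor.
# For y = b0 + b1*x + b2*x^2 + e, the partial effect is d y / d x = b1 + 2*b2*x.
# Simulated data; all values are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x, x**2]))
b = sm.OLS(y, X).fit().params                # [b0, b1, b2]

# The partial effect changes with the level of x at which it is evaluated.
for x0 in (-1.0, 0.0, 1.0):
    print(f"d y / d x at x = {x0:+.1f}:", b[1] + 2 * b[2] * x0)
```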