(a) Discuss the differences between panel data, time series data and cross sectional data.
(b) Argue with the aid of examples the advantages of panel data over cross sectional data.
(c) The Hausman specification test is used to choose between two different panel data models. Identify the two types of model and explain their differences.
(a) Cross-sectional data refers to data collected across multiple units, or cross-sections, at the same point in time. A typical example is a census, in which data are collected on every unit of a country's population for a given year. In general, the problems associated with regressions on a typical cross-sectional dataset are heteroscedasticity and clustered correlations. Heteroscedasticity refers to a specification in which the error variances depend on the values of the regressors, in contrast to homoscedasticity, where the error variance is constant. For a regression error $u_i$ and regressors $X_i$:
Homoscedasticity: $\mathrm{Var}(u_i \mid X_i) = \sigma^2$
Heteroscedasticity: $\mathrm{Var}(u_i \mid X_i) = \sigma_i^2 = h(X_i)$
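Clustered correlations refer to a situation in which errors are uncorrelated across spatial units of data (or clusters), but correlated within clusters. Formally, if m and n index clusters in the data and $u_{im}$ denotes the error for unit $i$ in cluster $m$, then for two distinct units $i \neq j$:
$E[u_{im} u_{jn}] = 0$ if $m \neq n$;
$E[u_{im} u_{jn}] \neq 0$ if $m = n$.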
___
Time series data refer to data collected for a single unit over several time periods. It is essential to work with time series data in the right temporal order. This is one key difference between cross-sectional and time series data: in time series data, the order of observations matters. A typical example of time series data would be the weekly consumer price index values for a particular country between 2010 and 2016. Typically, the problem associated with using time series datasets in regressions is serial correlation in the error term. This often occurs because shocks in one period carry over into later periods, or because the errors are non-stationary or follow a time trend. For instance, the following equation describes a first-order auto-regressive error term, which is frequently found in time series data:
$u_t = \rho u_{t-1} + \varepsilon_t$,
where $\rho$ is the autocorrelation coefficient and $\varepsilon_t$ is white noise.
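To make the idea of serial correlation concrete, here is a minimal Python sketch that simulates a first-order auto-regressive error, $u_t = \rho u_{t-1} + \varepsilon_t$, and measures how strongly each error is correlated with the one before it. The function names, the value $\rho = 0.8$, the sample size and the seed are all illustrative assumptions, not part of the question.

```python
import random

def simulate_ar1_errors(rho, n, seed=0):
    """Simulate u_t = rho * u_{t-1} + e_t, with e_t drawn from N(0, 1)."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0)]  # starting value (assumed, not the stationary draw)
    for _ in range(n - 1):
        u.append(rho * u[-1] + rng.gauss(0.0, 1.0))
    return u

def lag1_autocorrelation(series):
    """Sample correlation between the series and its own first lag."""
    mean = sum(series) / len(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series[1:], series[:-1]))
    den = sum((a - mean) ** 2 for a in series)
    return num / den

# Hypothetical settings: rho = 0.8 for persistent errors, rho = 0 for white noise.
errors = simulate_ar1_errors(rho=0.8, n=5000)
white_noise = simulate_ar1_errors(rho=0.0, n=5000)

print("lag-1 autocorrelation, AR(1) errors:", round(lag1_autocorrelation(errors), 2))
print("lag-1 autocorrelation, white noise: ", round(lag1_autocorrelation(white_noise), 2))
```

With persistent errors the lag-1 autocorrelation should come out close to the chosen $\rho$, while for white noise it should be close to zero. This is why reordering the observations, harmless in a cross-section, destroys information in a time series.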
___
Panel data can be considered a combination of cross-sectional and time series data, in which data are collected for multiple units over several time periods. Conversely, both cross-sectional and time series data can be considered special cases of panel data: the former is the case when the number of time periods in a panel equals one, and the latter when the number of units or cross-sections surveyed in a panel equals one.
(b) There are broadly two advantages of using panel data over cross-sectional data. The first is that panel data lets us use a larger number of observations. For example, a cross-sectional study done using Census data on Indian states for a single year (say 1991) will let us use 28 data points. However, if we construct a panel using data from three Census years (1991, 2001, and 2011), we get 28 × 3 = 84 observations. All else equal, a higher number of observations yields more precise estimates.
Secondly, panel data allow us to account for unobserved, time-invariant sources of endogeneity. To understand this, consider the following example of a panel regression for unit $i$ in period $t$:
$y_{it} = X_{it}\beta + \alpha_i + \varepsilon_{it}$.
Here, the error term has an unobserved, time-invariant component $\alpha_i$ and a white noise component $\varepsilon_{it}$. If $\alpha_i$ is correlated with one of the regressors in the vector $X_{it}$, then standard OLS suffers from omitted variable bias, and this cannot be avoided with a single cross-section (t = 1). However, with multiple time periods the data form a panel, in which case one can use the first-differences estimator, obtained by subtracting the equation for period $t-1$ from the equation for period $t$:
$y_{it} - y_{i,t-1} = (X_{it} - X_{i,t-1})\beta + (\varepsilon_{it} - \varepsilon_{i,t-1})$.
Hence the time-invariant endogenous term $\alpha_i$ is differenced out.
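The argument can be checked with a small simulation. The Python sketch below generates a panel from a model with a single regressor, $y_{it} = \beta x_{it} + \alpha_i + \varepsilon_{it}$, in which the regressor is built to be correlated with the unit effect $\alpha_i$. It then compares the pooled OLS slope with the first-differences slope. The data-generating process, the true value $\beta = 2$, the panel dimensions and all function names are assumptions chosen purely for illustration.

```python
import random

def simulate_panel(n_units, n_periods, beta, seed=0):
    """Return a list of units, each a list of (x, y) pairs ordered by period."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        alpha = rng.gauss(0.0, 1.0)              # unobserved, time-invariant effect
        rows = []
        for _ in range(n_periods):
            x = alpha + rng.gauss(0.0, 1.0)      # regressor correlated with alpha
            y = beta * x + alpha + rng.gauss(0.0, 1.0)
            rows.append((x, y))
        panel.append(rows)
    return panel

def pooled_ols_slope(panel):
    """OLS slope (with intercept) on the stacked data, ignoring alpha."""
    xs = [x for rows in panel for x, _ in rows]
    ys = [y for rows in panel for _, y in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def first_difference_slope(panel):
    """OLS slope of within-unit changes in y on changes in x.

    No intercept is included because the simulated model has no time trend.
    """
    dx, dy = [], []
    for rows in panel:
        for (x0, y0), (x1, y1) in zip(rows[:-1], rows[1:]):
            dx.append(x1 - x0)
            dy.append(y1 - y0)
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

# Hypothetical panel: 2000 units observed over 3 periods, true beta = 2.
panel = simulate_panel(n_units=2000, n_periods=3, beta=2.0)
pooled = pooled_ols_slope(panel)
fd = first_difference_slope(panel)

print("pooled OLS slope:       ", round(pooled, 2))
print("first-differences slope:", round(fd, 2))
```

Under these assumptions the pooled OLS slope should be biased upwards, because it attributes part of the effect of $\alpha_i$ to the regressor. The first-differences slope should be close to the true value, because $\alpha_i$ cancels when consecutive periods are subtracted.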
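(c) In part (b) we discussed a time-invariant, individual-specific variable, denoted by $\alpha_i$. Since it was unobserved, we performed the first-difference operation to difference it out of the model. Alternatively, we can create a catch-all variable for individual-specific, time-invariant effects, by creating a dummy variable for each cross-sectional unit in the panel.
The Hausman test helps to decide the most appropriate estimation technique for this individual-specific effects variable. There are two possibilities: fixed-effects estimation and random-effects estimation. Fixed-effects estimation treats $\alpha_i$ as a fixed, unit-specific parameter, which may be correlated with the regressors. Random-effects estimation treats $\alpha_i$ as a random draw from a parametric distribution and assumes that it is uncorrelated with the regressors. The Hausman test compares the two sets of estimates. Under the null hypothesis that $\alpha_i$ is uncorrelated with the regressors, both estimators are consistent and random effects is the more efficient, so the estimates should be close. A significant difference between them leads to rejecting the null in favour of fixed effects.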
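As a rough illustration of how the test statistic is formed, the Python sketch below computes the Hausman statistic for the simplest case of a single slope coefficient, $H = (b_{FE} - b_{RE})^2 / (\mathrm{Var}(b_{FE}) - \mathrm{Var}(b_{RE}))$, which is compared with a chi-square distribution with one degree of freedom. The function name and the input values are hypothetical and are not taken from any real estimation. With several coefficients the same idea applies using vectors and covariance matrices.

```python
import math

def hausman_single_coefficient(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for one coefficient.

    b_fe, b_re     : fixed-effects and random-effects estimates
    var_fe, var_re : their estimated variances
    Under the null, H follows a chi-square distribution with 1 degree of freedom.
    """
    var_diff = var_fe - var_re
    if var_diff <= 0:
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    h = (b_fe - b_re) ** 2 / var_diff
    # Upper-tail probability of a chi-square(1) variable: P(Z^2 > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value

# Hypothetical estimates, for illustration only.
h, p = hausman_single_coefficient(b_fe=2.01, b_re=2.35, var_fe=0.010, var_re=0.004)
print("Hausman statistic:", round(h, 2))
print("reject random effects at the 5% level:", p < 0.05)
```

A small p-value suggests that the two estimators differ systematically, which points to correlation between $\alpha_i$ and the regressors and hence to the fixed-effects model.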