(Econometrics) Imagine you have a large dataset containing 12 variables on 5000 individuals over a 12-year period. Individuals are interviewed annually, and their income, assets, age, race, marital status, sex, education level, mother's education, father's education, food expenditure, religion, and annual vacation spending are recorded.
A) If you are trying to build a model predicting vacation spending, what model would you build?
B) Explain when you would prefer a fixed effects model versus a random effects model.
C) Should you worry about endogeneity? If you do, what should you do?
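A) Since the data follow the same 5000 individuals over 12 years, they form a panel dataset. To predict vacation spending, we regress annual vacation spending on income, education, assets and other variables. The regression model is:

$$VS_{it} = \beta_0 + \beta_1 \, Income_{it} + \beta_2 \, Educ_{it} + \beta_3 \, Assets_{it} + \alpha_i + \varepsilon_{it}$$

where $VS_{it}$ is the vacation spending of individual $i$ in year $t$, $\alpha_i$ is an unobserved individual-specific effect, and $\varepsilon_{it}$ is the idiosyncratic error.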
According to this model, vacation spending of individual i in year t is determined by income, education and assets. We can add other variables such as age, race or marital status if we think they help predict vacation spending. However, we must choose the regressors carefully: omitting a relevant variable that is correlated with the included ones biases the estimates (omitted-variable bias), while including irrelevant variables does not bias the estimates but makes them less precise.
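To illustrate the omitted-variable point, here is a minimal sketch on simulated data. All variable names, coefficients and distributions below are hypothetical assumptions, not estimates from the dataset in the question. Education raises both income and vacation spending, so dropping education from a pooled OLS regression pushes the income coefficient away from its true value of 0.03.

```python
import numpy as np


def simulate_panel(n=5000, t=12, seed=0):
    """Simulate a hypothetical balanced panel; every coefficient here is made up."""
    rng = np.random.default_rng(seed)
    educ = np.repeat(rng.normal(14, 2, n), t)        # years of schooling, fixed over time
    income = 2.0 * educ + rng.normal(50, 10, n * t)  # income rises with education
    assets = 0.5 * income + rng.normal(0, 5, n * t)
    eps = rng.normal(0, 1, n * t)
    # True model: education affects vacation spending directly (coefficient 0.2).
    vac = 1.0 + 0.03 * income + 0.2 * educ + 0.01 * assets + eps
    return vac, income, educ, assets


def ols(y, *regressors):
    """Pooled OLS with an intercept; returns [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


if __name__ == "__main__":
    vac, income, educ, assets = simulate_panel()
    full = ols(vac, income, educ, assets)   # correctly specified
    short = ols(vac, income, assets)        # education omitted
    print("income coef, full model:   ", round(full[1], 4))
    print("income coef, educ omitted: ", round(short[1], 4))
```

Because education is positively correlated with income and has a positive effect of its own, the short regression attributes part of the education effect to income and overstates its coefficient. Pooled OLS also ignores the panel structure, which is what part B addresses.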
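B) A fixed effects model is preferable when the unobserved individual-specific effects (for example, a personal taste for travel or unmeasured ability) are likely to be correlated with regressors such as income and education. The within transformation removes these time-invariant effects, so the estimates stay consistent. The cost is that time-invariant variables such as race, sex and parents' education are absorbed by the fixed effects, so their coefficients cannot be estimated. A random effects model is preferable when the individual effects can be treated as random draws uncorrelated with the regressors: it is then consistent, more efficient, and able to estimate coefficients on time-invariant variables. Because the number of individuals (5000) is large relative to the number of periods (12), the two estimators can differ substantially, so the choice matters. Here the individual effects are plausibly correlated with income and education, so the fixed effects model is the safer choice. This can be checked with the Hausman test, whose null hypothesis is that the individual effects are uncorrelated with the regressors (both estimators consistent, random effects efficient); rejecting the null favors fixed effects.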
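The comparison can be sketched with numpy on simulated data, assuming a single regressor and made-up parameters. Fixed effects uses the within (demeaned) estimator; random effects uses feasible GLS on quasi-demeaned data, with the variance components estimated in the Swamy-Arora style from the within and between regressions. The Hausman statistic is then compared with the 5% critical value of a chi-squared distribution with one degree of freedom (about 3.841).

```python
import numpy as np


def demean_by_group(v, ids):
    """Within transformation: subtract each individual's time average."""
    means = np.bincount(ids, weights=v) / np.bincount(ids)
    return v - means[ids], means


def fe_re_hausman(y, x, ids, t):
    """One-regressor FE and RE slope estimates plus the Hausman statistic."""
    n = ids.max() + 1
    y_w, y_bar = demean_by_group(y, ids)
    x_w, x_bar = demean_by_group(x, ids)

    # Fixed effects (within) estimator and its variance.
    b_fe = (x_w @ y_w) / (x_w @ x_w)
    resid_w = y_w - b_fe * x_w
    s2_e = (resid_w @ resid_w) / (n * (t - 1) - 1)   # idiosyncratic error variance
    var_fe = s2_e / (x_w @ x_w)

    # Between regression on individual means -> variance of the individual effect.
    Xb = np.column_stack([np.ones(n), x_bar])
    coef_b, *_ = np.linalg.lstsq(Xb, y_bar, rcond=None)
    resid_b = y_bar - Xb @ coef_b
    s2_b = (resid_b @ resid_b) / (n - 2)
    s2_a = max(s2_b - s2_e / t, 0.0)                  # clip at zero if negative

    # Random effects: OLS on quasi-demeaned data (feasible GLS).
    theta = 1.0 - np.sqrt(s2_e / (s2_e + t * s2_a))
    Xr = np.column_stack([np.full(y.size, 1.0 - theta), x - theta * x_bar[ids]])
    yr = y - theta * y_bar[ids]
    coef_r, *_ = np.linalg.lstsq(Xr, yr, rcond=None)
    b_re = coef_r[1]
    var_re = s2_e * np.linalg.inv(Xr.T @ Xr)[1, 1]    # same s2_e keeps var_fe >= var_re

    hausman = (b_fe - b_re) ** 2 / (var_fe - var_re)
    return b_fe, b_re, hausman


def simulate(n=5000, t=12, correlated=True, seed=1):
    """Hypothetical panel: y = 0.5*x + alpha_i + e, alpha_i optionally correlated with x."""
    rng = np.random.default_rng(seed)
    ids = np.repeat(np.arange(n), t)
    alpha = rng.normal(0.0, 1.0, n)        # unobserved individual effect
    x = rng.normal(0.0, 1.0, n * t)
    if correlated:
        x = x + 0.8 * alpha[ids]           # effect correlated with the regressor
    y = 0.5 * x + alpha[ids] + rng.normal(0.0, 1.0, n * t)
    return y, x, ids


if __name__ == "__main__":
    for corr in (True, False):
        y, x, ids = simulate(correlated=corr)
        b_fe, b_re, h = fe_re_hausman(y, x, ids, t=12)
        print(f"correlated={corr}: FE={b_fe:.3f} RE={b_re:.3f} Hausman={h:.1f}")
```

When the individual effect is correlated with the regressor, the FE estimate stays near the true 0.5 while RE drifts away from it and the Hausman test rejects; when it is uncorrelated, both estimators are close to 0.5. In practice a dedicated package would be used rather than this hand-rolled version.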
C) Yes. Endogeneity can arise from omitted relevant variables or from reverse causality between some of the variables. Although it is unlikely, it may be that vacation spending affects income or education, not only the other way round. Reverse causality can be addressed with an instrumental variable in a two-stage least squares (2SLS) panel regression, using an instrument that is correlated with income (or education) but affects vacation spending only through that variable. If the endogeneity comes from omitted variables, it can be addressed by including them in the model where possible, or by using a suitable proxy when they are unobserved; omitted factors that do not change over time are also removed by the fixed effects discussed in part B.
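A minimal sketch of fixed effects 2SLS under reverse causality, using simulated data with hypothetical parameters: vacation spending depends on income, income in turn depends on vacation spending, and `z` is an assumed instrument that shifts income but does not enter the vacation-spending equation. Both equations contain individual effects, which the within transformation removes before the two stages are run.

```python
import numpy as np


def within(v, ids):
    """Subtract each individual's time average (fixed effects transformation)."""
    means = np.bincount(ids, weights=v) / np.bincount(ids)
    return v - means[ids]


def fe_ols(y, x, ids):
    """Within (FE) OLS slope with one regressor; biased if x is endogenous."""
    y_w, x_w = within(y, ids), within(x, ids)
    return (x_w @ y_w) / (x_w @ x_w)


def fe_2sls(y, x, z, ids):
    """Just-identified FE-2SLS with one endogenous regressor x and one instrument z."""
    y_w, x_w, z_w = within(y, ids), within(x, ids), within(z, ids)
    pi = (z_w @ x_w) / (z_w @ z_w)       # first stage: demeaned x on demeaned z
    x_hat = pi * z_w                      # fitted (exogenous) part of x
    return (x_hat @ y_w) / (x_hat @ x_hat)  # second stage: demeaned y on fitted x


def simulate_reverse_causality(n=5000, t=12, b=0.3, g=0.5, d=1.0, seed=2):
    """Hypothetical simultaneous system:
    vac = b*inc + a_i + e   and   inc = g*vac + d*z + c_i + v."""
    rng = np.random.default_rng(seed)
    ids = np.repeat(np.arange(n), t)
    a = rng.normal(0, 1, n)[ids]          # individual effect in vacation spending
    c = rng.normal(0, 1, n)[ids]          # individual effect in income
    e = rng.normal(0, 1, n * t)
    v = rng.normal(0, 1, n * t)
    z = rng.normal(0, 1, n * t)           # hypothetical instrument
    # Reduced form for income after substituting the vacation equation.
    inc = (g * a + g * e + d * z + c + v) / (1 - g * b)
    vac = b * inc + a + e
    return vac, inc, z, ids


if __name__ == "__main__":
    vac, inc, z, ids = simulate_reverse_causality()
    print("FE-OLS  :", round(fe_ols(vac, inc, ids), 3))       # pulled away from 0.3
    print("FE-2SLS :", round(fe_2sls(vac, inc, z, ids), 3))   # close to the true 0.3
```

Because income responds to the same shock `e` that drives vacation spending, FE-OLS overstates the effect of income even after the individual effects are removed, while the instrument recovers the true coefficient of 0.3. The sketch only works because `z` is valid by construction; with real data, instrument relevance and exclusion have to be argued and tested.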