Give an example of omitted variable bias in a multiple linear regression model. Explain how you would figure out the probable direction of the bias even without collecting data on this omitted variable. [3 marks]
Suppose you want to find out what factors determine the price of homes in your area, and you want to account for every relevant variable. You decide to run a multiple regression to estimate house prices, and you list the factors to include: the number of rooms in the house, the number of bathrooms, whether the house is furnished, and how old the house is. However, you forget to include a very important variable: the size of the house in square feet. Your regression is likely to give you biased results, and the reason is simple. Two houses with identical values of the included variables can have drastically different prices if the size of the house (or, say, the size of the rooms) is different. By leaving out this important variable, your regression suffers from omitted variable bias.
The omitted variable problem is a form of misspecification of a linear regression model. It typically arises either because the effect of the omitted variable on the dependent variable is unknown or because data on it are not available. Leaving the variable out of the regression then results in over-estimating (upward bias) or under-estimating (downward bias) the effect of one or more of the other explanatory variables.
Two conditions must both hold for omitted variable bias to exist.
a) The omitted variable must be correlated with the dependent variable (that is, it must be a determinant of it).
b) The omitted variable must be correlated with one or more of the other explanatory (independent) variables.
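The two conditions can be read off the standard bias formula for the simplest case of one included and one omitted regressor. The symbols below (x_1 for the included variable, x_2 for the omitted one, and the Greek coefficients) are notation assumed here for illustration, not taken from the question.

```latex
\begin{align*}
\text{True model:}      \quad & y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \\
\text{Estimated model:} \quad & y = \alpha_0 + \alpha_1 x_1 + e \\
\text{Large-sample slope:} \quad & \operatorname{plim}\, \hat{\alpha}_1 = \beta_1 + \beta_2\,\delta_1,
  \qquad \delta_1 = \frac{\operatorname{Cov}(x_1, x_2)}{\operatorname{Var}(x_1)}
\end{align*}
```

The bias term is \beta_2 \delta_1, so it vanishes if either condition fails, and its sign is the sign of \beta_2 times the sign of the correlation between x_1 and x_2. This is how the probable direction of the bias can be judged without data on the omitted variable: reason about whether the omitted variable raises or lowers the dependent variable, and whether it moves with or against the included variable. With several included regressors, \delta_1 becomes a partial slope, so the sign rule is a guide rather than a guarantee.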
In the example above, the size of the house in square feet is correlated with both the price of the house and the number of rooms. Hence, omitting the house-size variable results in omitted variable bias.
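The direction of the bias in this example can be reasoned out without any data on size: larger houses tend to sell for more (positive effect on price) and tend to have more rooms (positive correlation with an included variable), so the coefficient on rooms is probably biased upward. The sketch below checks that reasoning on simulated data. Every number in it (coefficients, means, noise levels) and the function name simulate_ovb are made up for illustration, and it keeps only one included regressor, rooms, for simplicity.

```python
import numpy as np


def simulate_ovb(n=5000, beta_rooms=10.0, beta_sqft=0.1, seed=0):
    """Compare the rooms coefficient with and without the size variable.

    All parameter values are hypothetical; they only need to make size
    positively related to both price and the number of rooms.
    """
    rng = np.random.default_rng(seed)

    # Size in square feet, and a rooms variable that rises with size.
    sqft = rng.normal(1500.0, 400.0, size=n)
    rooms = 1.0 + sqft / 300.0 + rng.normal(0.0, 1.0, size=n)

    # Assumed true model: price depends on both rooms and size.
    price = 50.0 + beta_rooms * rooms + beta_sqft * sqft \
        + rng.normal(0.0, 20.0, size=n)

    ones = np.ones(n)
    x_long = np.column_stack([ones, rooms, sqft])   # size included
    x_short = np.column_stack([ones, rooms])        # size omitted

    coef_long, *_ = np.linalg.lstsq(x_long, price, rcond=None)
    coef_short, *_ = np.linalg.lstsq(x_short, price, rcond=None)

    # Auxiliary regression of the omitted variable on the included one.
    coef_aux, *_ = np.linalg.lstsq(x_short, sqft, rcond=None)

    return {
        "rooms_long": coef_long[1],    # rooms coefficient, size included
        "sqft_long": coef_long[2],     # size coefficient
        "rooms_short": coef_short[1],  # rooms coefficient, size omitted
        "delta": coef_aux[1],          # slope of size on rooms
    }


if __name__ == "__main__":
    res = simulate_ovb()
    print("rooms coefficient, size included:", round(res["rooms_long"], 2))
    print("rooms coefficient, size omitted: ", round(res["rooms_short"], 2))
    print("implied bias (size coef x delta):",
          round(res["sqft_long"] * res["delta"], 2))
```

Under these assumptions the rooms coefficient from the regression that omits size comes out larger than the one from the regression that includes it, and the gap equals the size coefficient times the slope of size on rooms. Flipping the sign of either relationship in the simulation would flip the direction of the bias.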