For this assignment, we will replicate the difference-in-differences (DD) study by Gruber & Poterba (1994) [à la Donald & Lang, 2007]. In their study, they estimate the impact of the Tax Reform Act of 1986 (TRA86) on health insurance take-up among self-employed workers. The policy intervention is a tax subsidy on health insurance purchases for the self-employed, introduced in TRA86. Their treatment group is the self-employed workers, and their control group is the other workers, who are employed. In particular, they compare the aggregate insurance rates of these two groups.
Table 1: Aggregate Insurance Rates
Year | Self-Employed | Employed |
1982 | 68.9 | 88.6 |
1983 | 72.0 | 88.9 |
1984 | 68.9 | 88.1 |
1985 | 68.6 | 88.0 |
1986 | 70.1 | 88.0 |
1987 | 76.1 | 86.8 |
1988 | 73.2 | 86.1 |
1989 | 73.5 | 84.5 |
Source: Table IV of GP1994
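As a rough illustration of what the DD comparison measures, the sketch below computes the change in each group's mean rate from the pre-period to the post-period and then takes the difference of those changes. It assumes 1987 to 1989 are the post-TRA86 years, which matches the Time_Dummy coding in the data for analysis; the names `FIRST_POST_YEAR` and `pre_post_change` are illustrative, not from the source.

```python
# Aggregate insurance rates (percent) from Table 1, keyed by year.
self_employed = {1982: 68.9, 1983: 72.0, 1984: 68.9, 1985: 68.6,
                 1986: 70.1, 1987: 76.1, 1988: 73.2, 1989: 73.5}
employed = {1982: 88.6, 1983: 88.9, 1984: 88.1, 1985: 88.0,
            1986: 88.0, 1987: 86.8, 1988: 86.1, 1989: 84.5}

FIRST_POST_YEAR = 1987  # assumption: 1987-1989 are treated as post-TRA86


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pre_post_change(rates):
    """Return (pre-period mean, post-period mean, post - pre) for one group."""
    pre = mean(v for year, v in rates.items() if year < FIRST_POST_YEAR)
    post = mean(v for year, v in rates.items() if year >= FIRST_POST_YEAR)
    return pre, post, post - pre


se_pre, se_post, se_change = pre_post_change(self_employed)
em_pre, em_post, em_change = pre_post_change(employed)

# Difference-in-differences: treated change minus control change.
dd_estimate = se_change - em_change

print(f"Self-employed: pre {se_pre:.2f}, post {se_post:.2f}, change {se_change:+.2f}")
print(f"Employed:      pre {em_pre:.2f}, post {em_post:.2f}, change {em_change:+.2f}")
print(f"DD estimate:   {dd_estimate:+.4f}")
```

The resulting DD estimate is the same quantity that the coefficient on the group-by-period interaction captures in a regression with group and period dummies.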
Question:
Write out a DD regression equation which would include all the years. Run this regression and explain the difference from Question 1. [Hint: create a dummy variable for pre- and post-TRA86]
Data for Analysis is below:
Group | Year | Insurance_Rate | Group_Dummy | Time_Dummy | Group_Time_Dummy |
Self_emp | 1982 | 68.9 | 1 | 0 | 0 |
Self_emp | 1983 | 72 | 1 | 0 | 0 |
Self_emp | 1984 | 68.9 | 1 | 0 | 0 |
Self_emp | 1985 | 68.6 | 1 | 0 | 0 |
Self_emp | 1986 | 70.1 | 1 | 0 | 0 |
Self_emp | 1987 | 76.1 | 1 | 1 | 1 |
Self_emp | 1988 | 73.2 | 1 | 1 | 1 |
Self_emp | 1989 | 73.5 | 1 | 1 | 1 |
Employed | 1982 | 88.6 | 0 | 0 | 0 |
Employed | 1983 | 88.9 | 0 | 0 | 0 |
Employed | 1984 | 88.1 | 0 | 0 | 0 |
Employed | 1985 | 88 | 0 | 0 | 0 |
Employed | 1986 | 88 | 0 | 0 | 0 |
Employed | 1987 | 86.8 | 0 | 1 | 0 |
Employed | 1988 | 86.1 | 0 | 1 | 0 |
Employed | 1989 | 84.5 | 0 | 1 | 0 |
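The dummy columns in the table above follow mechanically from the group label and the year. A minimal sketch of that construction, assuming the self-employed are the treated group and 1987 is the first post-TRA86 year (both read off the table); the function name `build_rows` is hypothetical:

```python
YEARS = list(range(1982, 1990))
RATES = {
    "Self_emp": [68.9, 72.0, 68.9, 68.6, 70.1, 76.1, 73.2, 73.5],
    "Employed": [88.6, 88.9, 88.1, 88.0, 88.0, 86.8, 86.1, 84.5],
}


def build_rows(rates, treated_group="Self_emp", first_post_year=1987):
    """Build one record per group-year with the three DD dummies."""
    rows = []
    for group, values in rates.items():
        group_dummy = int(group == treated_group)      # 1 = treated group
        for year, rate in zip(YEARS, values):
            time_dummy = int(year >= first_post_year)  # 1 = post-TRA86
            rows.append({
                "Group": group,
                "Year": year,
                "Insurance_Rate": rate,
                "Group_Dummy": group_dummy,
                "Time_Dummy": time_dummy,
                # The interaction is 1 only for treated rows after the reform.
                "Group_Time_Dummy": group_dummy * time_dummy,
            })
    return rows


rows = build_rows(RATES)
for row in rows:
    print(" | ".join(str(value) for value in row.values()))
```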
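The DD regression equation, using all eight years for both groups, is:
Insurance_Rate = b0 + b1*Group_Dummy + b2*Time_Dummy + b3*Group_Time_Dummy + b4*t + e
where Group_Dummy = 1 for the self-employed, Time_Dummy = 1 for the post-TRA86 years (1987-1989), Group_Time_Dummy is the product of the two, and t is a time variable. The coefficient b3 on Group_Time_Dummy is the DD estimate. The fitted equation is:
y = 91.9867 - 21.2867*x1 - 1.1867*x2 + 7.0867*x3 - 0.3333*t
where y = Insurance_Rate, x1 = Group_Dummy, x2 = Time_Dummy, and x3 = Group_Time_Dummy.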
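The fitted equation can be checked with a small ordinary least squares sketch in plain Python. The source does not define t; the reported intercept and Group_Dummy coefficient are reproduced when t is taken to be the row number (1 to 16) of the data table, so that is the assumption made here. The helpers `solve` and `ols` are illustrative, not part of the original analysis.

```python
YEARS = list(range(1982, 1990))
SELF_EMP = [68.9, 72.0, 68.9, 68.6, 70.1, 76.1, 73.2, 73.5]
EMPLOYED = [88.6, 88.9, 88.1, 88.0, 88.0, 86.8, 86.1, 84.5]

# (Group_Dummy, Year, Insurance_Rate) in the same row order as the data table.
records = [(1, yr, r) for yr, r in zip(YEARS, SELF_EMP)]
records += [(0, yr, r) for yr, r in zip(YEARS, EMPLOYED)]

X, y = [], []
for t, (group, year, rate) in enumerate(records, start=1):
    post = 1.0 if year >= 1987 else 0.0
    # Columns: intercept, Group_Dummy, Time_Dummy, Group_Time_Dummy, t.
    # ASSUMPTION: t is the row number 1..16, not the calendar year.
    X.append([1.0, float(group), post, group * post, float(t)])
    y.append(rate)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def ols(X, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * y_i for row, y_i in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


beta = ols(X, y)
fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
rss = sum((y_i - f) ** 2 for y_i, f in zip(y, fitted))

names = ["Intercept", "Group_Dummy", "Time_Dummy", "Group_Time_Dummy", "t"]
for name, b in zip(names, beta):
    print(f"{name:>16}: {b:9.4f}")
print(f"Residual SS: {rss:.4f}")
```

Solving the normal equations directly is adequate for a 16-row, 5-column problem like this one; a statistics package would normally be used instead.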
The time variable t is included in the model alongside the three dummies. The regression output is:
R² | 0.987 |
Adjusted R² | 0.983 |
R | 0.994 |
Std. Error | 1.120 |
n | 16 |
k | 4 |
Dep. Var. | Insurance_Rate |
ANOVA table
Source | SS | df | MS | F | p-value |
Regression | 1,072.8814 | 4 | 268.2203 | 213.67 | 2.40E-10 |
Residual | 13.8080 | 11 | 1.2553 | | |
Total | 1,086.6894 | 15 | | | |
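The summary statistics follow arithmetically from the ANOVA sums of squares. The sketch below recomputes them from the reported SS values, n, and k, using the usual definitions; the variable names are illustrative.

```python
# Values copied from the ANOVA table and summary statistics.
ss_regression = 1072.8814
ss_residual = 13.8080
n = 16  # observations
k = 4   # predictors (excluding the intercept)

ss_total = ss_regression + ss_residual
df_regression = k
df_residual = n - k - 1
df_total = n - 1

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_statistic = ms_regression / ms_residual
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (ss_residual / df_residual) / (ss_total / df_total)
std_error = ms_residual ** 0.5  # standard error of the regression

print(f"SS total     = {ss_total:.4f}")
print(f"F            = {f_statistic:.2f}")
print(f"R squared    = {r_squared:.3f}")
print(f"Adj. R sq.   = {adj_r_squared:.3f}")
print(f"R            = {r_squared ** 0.5:.3f}")
print(f"Std. Error   = {std_error:.3f}")
```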
Regression output
Variable | Coefficient | Std. Error | t (df=11) | p-value | 95% CI lower | 95% CI upper |
Intercept | 91.9867 | | | | | |
Group_Dummy | -21.2867 | 1.9620 | -10.849 | 3.25E-07 | -25.6050 | -16.9683 |
Time_Dummy | -1.1867 | 1.2273 | -0.967 | .3544 | -3.8880 | 1.5147 |
Group_Time_Dummy | 7.0867 | 1.1571 | 6.124 | .0001 | 4.5398 | 9.6335 |
t | -0.3333 | 0.2287 | -1.458 | .1729 | -0.8367 | 0.1700 |
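The t statistics and confidence limits in the table can be recovered from each coefficient and its standard error. A minimal sketch, assuming the standard two-sided 95% critical value of about 2.201 for 11 degrees of freedom (hard-coded here rather than computed); the intercept is omitted because its standard error is not reported.

```python
# ASSUMPTION: two-sided 95% critical value of Student's t with 11 df.
T_CRIT_95_DF11 = 2.201

# (coefficient, standard error) copied from the regression output.
estimates = {
    "Group_Dummy": (-21.2867, 1.9620),
    "Time_Dummy": (-1.1867, 1.2273),
    "Group_Time_Dummy": (7.0867, 1.1571),
    "t": (-0.3333, 0.2287),
}

results = {}
for name, (coef, se) in estimates.items():
    t_stat = coef / se
    margin = T_CRIT_95_DF11 * se
    lower, upper = coef - margin, coef + margin
    # A coefficient is significant at the 5% level when |t| exceeds the
    # critical value, i.e. when the interval excludes zero.
    significant = abs(t_stat) > T_CRIT_95_DF11
    results[name] = (t_stat, lower, upper, significant)
    print(f"{name:>16}: t = {t_stat:8.3f}, "
          f"95% CI = [{lower:9.4f}, {upper:9.4f}], significant = {significant}")
```

On these numbers, the intervals for Group_Dummy and Group_Time_Dummy exclude zero, while those for Time_Dummy and t do not, in line with the reported p-values.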