There are two OLS regression specifications:
$Y_i = a\,Sex_i + u_i \qquad (1)$
$Y_i = b\,Male_i + c\,Female_i + u_i \qquad (2)$
Here $a$, $b$, and $c$ are constants.
$Sex_i = 1$ if person $i$ is female and $0$ otherwise; $Male_i = 1$ if $Sex_i = 0$ and $Female_i = 1$ if $Sex_i = 1$, with each dummy equal to $0$ otherwise.
Why does neither regression (1) nor regression (2) directly tell you whether the difference in $Y_i$ between males and females is statistically significant?
What would be an alternative, better regression specification if $Sex_i$ is the only variable? Please explain in detail.
As model (1) is formulated, we can only test whether the coefficient on the female indicator is significant, i.e. whether the mean of $Y_i$ for females differs from zero. A comparison with males is not possible: for a male observation $Sex_i = 0$, so the only regressor vanishes and the model reduces to $Y_i = u_i$. In other words, the model treats a male's outcome as pure random error and implicitly assumes the male mean is zero. Hence there is no way to test the equality of the two group means.
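Writing out the OLS estimator for model (1) makes this concrete. In the derivation below, $n_F$ (the number of females) and $\bar Y_F$ (their mean outcome) are notation introduced here for illustration. Because $Sex_i$ is either 0 or 1, $Sex_i^2 = Sex_i$:

$$\hat a = \frac{\sum_i Sex_i\,Y_i}{\sum_i Sex_i^2} = \frac{\sum_{i:\,Sex_i = 1} Y_i}{n_F} = \bar Y_F, \qquad \hat Y_i = \hat a \cdot 0 = 0 \ \text{for every male.}$$

So the t-test on $\hat a$ tests $H_0: E[Y_i \mid \text{female}] = 0$, not whether females differ from males; the male observations enter only through the residual variance.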
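We cannot directly test the significance of the difference in model (2) either. Here males and females each get their own coefficient: $b$ is the mean of $Y_i$ for males and $c$ is the mean for females. Model (2) is essentially a combination of two models like (1): one with an indicator for whether the person is female, as in (1), and another with an indicator for whether the person is male. It therefore suffers from the same kind of disadvantage: the reported t-statistics test $b = 0$ and $c = 0$ separately, each against zero, and neither tests $b = c$. Testing the difference requires an extra step (a test of the linear restriction $b = c$) that the standard regression output does not provide directly.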
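The same algebra for model (2) gives one group mean per coefficient. In this sketch $n_M$, $n_F$, $\bar Y_M$, $\bar Y_F$ denote group sizes and means, $s^2 = \mathrm{SSR}/(n-2)$ is the residual variance, and the usual homoskedastic OLS variance formula is assumed. Since the two dummies are never 1 for the same person, $X'X$ is diagonal:

$$X'X = \begin{pmatrix} n_M & 0 \\ 0 & n_F \end{pmatrix}, \qquad \hat b = \bar Y_M, \quad \hat c = \bar Y_F, \qquad \widehat{\mathrm{Var}}\begin{pmatrix} \hat b \\ \hat c \end{pmatrix} = s^2 \begin{pmatrix} 1/n_M & 0 \\ 0 & 1/n_F \end{pmatrix}$$

The reported t-statistics are $\hat b / (s/\sqrt{n_M})$ and $\hat c / (s/\sqrt{n_F})$, each against zero. Testing $b = c$ needs the additional computation

$$t = \frac{\hat c - \hat b}{s\sqrt{1/n_M + 1/n_F}},$$

which relies on the estimated covariance of $\hat b$ and $\hat c$ being zero here and is not part of the standard coefficient table.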
We can modify model (1) so that it tests the significance of the difference directly, by adding a mean effect (an intercept):
$$Y_{ij} = \mu + a\,Sex_{ij} + u_{ij}$$
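Note that we have added a second subscript $j$ to account for replication (just to be formal). What we have added to the model is a mean effect $\mu$. With this coding it is the intercept, i.e. the expected value of $Y$ for males (the group with $Sex = 0$). The coefficient $a$ now represents the ADDITIONAL effect of being female over this baseline, i.e. the female mean minus the male mean. Thus the effects for males and females are the same exactly when this additional effect is zero, so we can test
$H_0: a = 0$ against $H_1: a \neq 0$
with the ordinary t-test on $\hat a$ that the regression output reports. This directly tests whether the male-female difference in $Y$ is statistically significant.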
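A minimal, dependency-free Python sketch can check all of this numerically. It fits the three specifications on a small made-up dataset (the values in `y` are hypothetical, not from the question) by solving the OLS normal equations, assuming the usual homoskedastic standard errors. The helper names `ols` and `invert` are illustrative, not from any library. Model (1) returns only the female mean, model (2) returns the two group means, and the intercept model returns the difference directly, with a t-statistic equal to the extra $b = c$ test from model (2) (and to the pooled two-sample t-test).

```python
import math


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    # Augment each row with the matching row of the identity matrix.
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    # The right half of the augmented matrix is now the inverse.
    return [row[k:] for row in a]


def ols(X, y):
    """OLS via the normal equations; returns (coefficients, standard errors).

    Standard errors use the homoskedastic formula s^2 (X'X)^{-1}, s^2 = SSR / (n - k).
    """
    n, k = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(X[r][i] * beta[i] for i in range(k)) for r in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return beta, se


# Hypothetical data: Sex = 1 for females, 0 for males, as defined in the question.
sex = [0, 0, 0, 0, 1, 1, 1, 1]
y = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0, 13.0, 16.0]

# (1) Y = a*Sex + u, no intercept: a_hat is the female mean.
b1, se1 = ols([[s] for s in sex], y)

# (2) Y = b*Male + c*Female + u: b_hat and c_hat are the male and female means.
b2, se2 = ols([[1 - s, s] for s in sex], y)
# Extra step needed to test b = c; valid because Cov(b_hat, c_hat) = 0 here.
t_bc = (b2[1] - b2[0]) / math.sqrt(se2[0] ** 2 + se2[1] ** 2)

# Intercept model: Y = mu + a*Sex + u. a_hat is the female-minus-male difference.
b3, se3 = ols([[1, s] for s in sex], y)
t_diff = b3[1] / se3[1]

print(f"(1) a_hat = {b1[0]:.3f}  t = {b1[0] / se1[0]:.3f}  (tests female mean = 0)")
print(f"(2) b_hat = {b2[0]:.3f}  c_hat = {b2[1]:.3f}  t for b = c: {t_bc:.3f}")
print(f"(3) mu_hat = {b3[0]:.3f}  a_hat = {b3[1]:.3f}  t = {t_diff:.3f}")
```

On the design choice: the intercept is added to model (1) rather than to model (2) because $Male_i + Female_i = 1$ for every person, so an intercept column in model (2) would be an exact linear combination of the two dummies and $X'X$ would be singular (the dummy variable trap).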