A school district undertakes an experiment to estimate the effect of class size on test scores in second-grade classes. The district assigns 20% of its previous year’s first graders to small second-grade classes and 80% to regular-size classes. Students new to the district are handled the same way: 20% are randomly assigned to small classes and 80% to regular-size classes. At the end of the second-grade school year, each student is given a standardized test. Let Y denote the test score, X denote a binary variable that equals 1 if a student is assigned to a small class, and W denote a binary variable that equals 1 if a student is newly enrolled. Let β1 denote the causal effect on test scores of reducing class size from regular to small.
1) Are X and W correlated?
2) Suppose a researcher estimates β1 by regressing Y on X only. Do you expect her estimate to be biased/inconsistent as a result of not including W in the regression?
1) No. Regardless of whether W equals 1 or 0, the conditional distribution of X is the same: X takes the value 1 with probability 0.2 and the value 0 with probability 0.8. In other words, whether or not a student is newly enrolled, her/his chance of being assigned to a small class does not change. The distribution of X conditional on W is therefore the same as the unconditional distribution of X, so X and W are independent, and independent variables are uncorrelated. Hence, we conclude that W and X are not correlated.
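The same conclusion can be checked directly from the covariance. The short derivation below is a sketch that uses the 20% assignment rate from the problem statement; the symbol p for the share of newly enrolled students, P(W = 1), is an assumption introduced here, since the problem does not give that share.

```latex
\begin{aligned}
E[X] &= P(X = 1) = 0.2, \qquad E[W] = P(W = 1) = p, \\
E[XW] &= P(X = 1,\, W = 1) = P(X = 1 \mid W = 1)\, P(W = 1) = 0.2\,p, \\
\operatorname{Cov}(X, W) &= E[XW] - E[X]\,E[W] = 0.2\,p - 0.2\,p = 0 .
\end{aligned}
```

The covariance is zero for any value of p, so the result does not depend on how many students are new to the district.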
2) No. Omitting W makes the estimate of β1 biased/inconsistent only if two conditions hold together: W affects Y, and W is correlated with X. The first condition may well hold: students who are new to the district may come from a different education process, and that may affect their test scores. The second condition does not hold: by part 1, X and W are uncorrelated, because small-class assignment is done the same way for new and returning students. When W is left out, its effect is absorbed into the error term, but that error term is still uncorrelated with X, so regressing Y on X alone gives an unbiased and consistent estimate of β1. Including W can still be worthwhile if being 'newly enrolled' affects performance, because it reduces the error variance and so can make the estimate of β1 more precise.
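A quick simulation is one way to check whether leaving W out of the regression shifts the estimate of β1. The sketch below is illustrative only: the true effect (beta1 = 5), the effect of being newly enrolled (gamma = -10), the 30% share of new students, the baseline score, and the noise level are all made-up values, and the second scenario, in which new students are more likely to get a small class, is a hypothetical contrast that is not part of the district's experiment. It relies on the fact that the OLS slope from regressing Y on a binary X with an intercept equals the difference in mean Y between the two groups.

```python
import random
from statistics import mean


def short_regression_slope(xs, ys):
    """OLS slope of Y on a binary X alone: difference in group means."""
    small = [y for x, y in zip(xs, ys) if x == 1]
    regular = [y for x, y in zip(xs, ys) if x == 0]
    return mean(small) - mean(regular)


def simulate(p_small_if_new, p_small_if_returning,
             n=200_000, beta1=5.0, gamma=-10.0, seed=0):
    """Simulate test scores and return the slope from regressing Y on X only.

    All numeric defaults are assumed values chosen for illustration.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        w = 1 if rng.random() < 0.3 else 0          # assumed 30% new students
        p_small = p_small_if_new if w else p_small_if_returning
        x = 1 if rng.random() < p_small else 0      # class assignment
        y = 60.0 + beta1 * x + gamma * w + rng.gauss(0.0, 8.0)
        xs.append(x)
        ys.append(y)
    return short_regression_slope(xs, ys)


# The experiment as described: 20% small-class chance for everyone.
randomized = simulate(0.2, 0.2)

# Hypothetical contrast: new students get a small class half the time.
confounded = simulate(0.5, 0.2)

print(f"true beta1 = 5.0")
print(f"Y on X only, assignment unrelated to W: {randomized:.2f}")
print(f"Y on X only, assignment depends on W:   {confounded:.2f}")
```

Under these assumptions the slope in the first scenario lands close to the true value even though W has a large effect on Y, because random assignment keeps X unrelated to W. In the contrast scenario X and W are correlated, and the slope from the regression on X alone is pulled away from β1.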