600,000 students in 79 countries take a PISA test every three years to evaluate their learning in reading, math, and science. Students in the Beijing-Shanghai-Jiangsu-Zhejiang regions in China achieved the highest average scores in all three subjects in 2018. Canada ranked around 12th overall but has had steadily declining scores in math and science. We have data on average math scores and characteristics of 500 school districts across Canada. This problem asks you to build up a regression model to look for factors that affect average math test scores in various school districts in Canada.
I run a basic regression equation that explains math scores (SCORE) as a function of:
1. STR = the student-teacher ratio (number of students per teacher).
2. SPEND = government spending per student in the school district, in dollars per student.
3. HIESL = a dummy variable = 1 if the percentage of students learning English as a new language is above 20% and = 0 if it is below 20%.
The regression also includes the usual constant term.
There is a categorical variable called PROVINCE for the province of the school district. This variable =1 for Ontario, =2 for Quebec, =3 for Alberta, and =4 for British Columbia.
There is also a categorical variable called INCOME, which is based on the average per capita income of households in the school district. INCOME = 1 if the district has average per capita income below $40000, = 2 if the district has per capita income between $40000 and $60000, and = 3 if the district has per capita income >= $60000.
How do I add PROVINCE and INCOME into my regression model? Which specification is correct?
SCORE = b2STR + b3SPEND + b4HIESL + b5ON + b6QC + b7BC + b8AB + b9INC1 + b10INC2 + b11INC3
where
ON = 1 if the district is in Ontario and 0 otherwise
QC = 1 if the district is in Quebec and 0 otherwise
BC = 1 if the district is in BC and 0 otherwise
AB = 1 if the district is in Alberta and 0 otherwise
INC1 = 1 if the district is in the low income range and 0 otherwise
INC2 = 1 if the district is in the middle income range and 0 otherwise
INC3 = 1 if the district is in the high income range and 0 otherwise
SCORE = b1 + b2STR + b3SPEND + b4HIESL + b5PROVINCE + b6INCOME
where PROVINCE takes values 1 to 4
and INCOME takes values 1 to 3.
SCORE = b1 + b2STR + b3SPEND + b4HIESL + b5ON + b6QC + b7BC + b8AB + b9INC1 + b10INC2 + b11INC3
where
ON = 1 if the district is in Ontario and 0 otherwise
QC = 1 if the district is in Quebec and 0 otherwise
BC = 1 if the district is in BC and 0 otherwise
AB = 1 if the district is in Alberta and 0 otherwise
INC1 = 1 if the district is in the low income range and 0 otherwise
INC2 = 1 if the district is in the middle income range and 0 otherwise
INC3 = 1 if the district is in the high income range and 0 otherwise
SCORE = b1 + b2STR + b3SPEND + b4HIESL + b5ON + b6QC + b7BC + b8INC2 + b9INC3
where
ON = 1 if the district is in Ontario and 0 otherwise
QC = 1 if the district is in Quebec and 0 otherwise
BC = 1 if the district is in BC and 0 otherwise
INC2 = 1 if the district is in the middle income range and 0 otherwise
INC3 = 1 if the district is in the high income range and 0 otherwise
SCORE = b1 + b2STR + b3SPEND + b4HIESL + b5ON + b6QC + b7BC + b8AB + b9INC2 + b10INC3
where
ON = 1 if the district is in Ontario and 0 otherwise
QC = 1 if the district is in Quebec and 0 otherwise
BC = 1 if the district is in BC and 0 otherwise
AB = 1 if the district is in Alberta and 0 otherwise
INC2 = 1 if the district is in the middle income range and 0 otherwise
INC3 = 1 if the district is in the high income range and 0 otherwise
Note that the categorical variable PROVINCE takes 4 values:
1 for Ontario
2 for Quebec
3 for Alberta
4 for British Columbia
The categorical variable INCOME takes 3 values:
INCOME = 1 if the district has average per capita income below $40000
INCOME = 2 if the district has per capita income between $40000 and $60000
INCOME = 3 if the district has per capita income >= $60000
Note that the regression model, in line with the classical linear regression setup and the problem statement, includes an intercept (the constant term b1).
Since there are two categorical variables, PROVINCE with 4 values and INCOME with 3 values, the regression model requires a dummy variable to be created for each value. The PROVINCE dummies are ON, QC, BC and AB, and the three INCOME dummies are INC1, INC2 and INC3.
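As a minimal sketch of how the dummies could be built from the integer codes: the helper name make_dummies and the five example districts are made up for illustration, while the code-to-label mapping follows the problem statement.

```python
# Coding taken from the problem statement.
PROVINCE_LABELS = {1: "ON", 2: "QC", 3: "AB", 4: "BC"}
INCOME_LABELS = {1: "INC1", 2: "INC2", 3: "INC3"}


def make_dummies(codes, labels):
    """Hypothetical helper: return {dummy_name: [0/1, ...]}, one column per category."""
    return {
        name: [1 if c == code else 0 for c in codes]
        for code, name in labels.items()
    }


# Made-up category codes for five districts (illustration only, not real data).
province = [1, 2, 3, 4, 1]
income = [1, 3, 2, 2, 3]

province_dummies = make_dummies(province, PROVINCE_LABELS)
income_dummies = make_dummies(income, INCOME_LABELS)

print(province_dummies["ON"])   # [1, 0, 0, 0, 1]
print(income_dummies["INC2"])   # [0, 0, 1, 1, 0]
```

Each district has exactly one province dummy and one income dummy equal to 1, which is the property that matters when deciding how many dummies can enter the regression.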
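For the regression equation, keep one value of each of the two categorical variables as the base category to avoid the dummy variable trap. With an intercept in the model, including every dummy for a categorical variable makes those dummies sum to the constant term, which is perfect multicollinearity. The regression equation is therefore: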
SCORE = b1 + b2STR + b3SPEND + b4HIESL + b5ON + b6QC + b7BC + b8INC2 + b9INC3
where
ON = 1 if the district is in Ontario and 0 otherwise
QC = 1 if the district is in Quebec and 0 otherwise
BC = 1 if the district is in BC and 0 otherwise
INC2 = 1 if the district is in the middle income range and 0 otherwise
INC3 = 1 if the district is in the high income range and 0 otherwise
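Note that in the above equation the dummy variables AB and INC1 are omitted and serve as the base categories, so the province coefficients are measured relative to Alberta and the income coefficients relative to the low income range.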
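The dummy variable trap can be checked numerically by comparing the rank of the design matrix with its number of columns. The sketch below assumes NumPy is available and uses synthetic category codes (not the real district data). It leaves out STR, SPEND and HIESL, since they play no part in the collinearity among the dummies.

```python
import numpy as np

n = 500  # number of districts in the problem; the codes below are synthetic

# Synthetic codes that cycle through every province/income combination.
province = np.arange(n) % 4 + 1   # 1=ON, 2=QC, 3=AB, 4=BC
income = np.arange(n) % 3 + 1     # 1=low, 2=middle, 3=high


def dummy(codes, value):
    """Return a 0/1 column that equals 1 where codes == value."""
    return (codes == value).astype(float)


const = np.ones(n)
ON, QC, AB, BC = (dummy(province, k) for k in (1, 2, 3, 4))
INC1, INC2, INC3 = (dummy(income, k) for k in (1, 2, 3))

# Intercept plus every dummy: ON+QC+BC+AB and INC1+INC2+INC3 both equal const.
X_all = np.column_stack([const, ON, QC, BC, AB, INC1, INC2, INC3])
# No intercept but every dummy: the two dummy sets still sum to the same vector.
X_noconst = np.column_stack([ON, QC, BC, AB, INC1, INC2, INC3])
# Intercept with AB and INC1 dropped as base categories.
X_base = np.column_stack([const, ON, QC, BC, INC2, INC3])

for name, X in [("all", X_all), ("noconst", X_noconst), ("base", X_base)]:
    rank = int(np.linalg.matrix_rank(X))
    print(name, "columns:", X.shape[1], "rank:", rank)
# all columns: 8 rank: 6
# noconst columns: 7 rank: 6
# base columns: 6 rank: 6
```

Only the specification with one base category per categorical variable has full column rank, so only there are all coefficients separately identified by least squares.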
Thus, the correct answer is D: the fourth specification listed, with an intercept, three province dummies and two income dummies.