Suppose we have data from a health survey conducted in the year 2000. Data were obtained from a random sample of 1000 persons.
An OLS linear regression analysis was carried out in the following way:
Dependent Variable: Systolic blood pressure (SBP, in mmHg)
Independent Variables: Gender (1 if female, 0 if male)
Age (in years)
Education (binary variables for “Not graduated from high school” and “Graduated from high school (but not from college)”; the reference category is “Graduated from college”)
Part of the results is shown below. The column labeled “Beta” shows the estimated values of the partial regression coefficients. (The betas for the reference categories, “Male” and “Graduated from college”, can be interpreted as fixed at zero.) The p-values are for two-sided tests.
| Variables | Beta | p-value |
|---|---|---|
| (Constant) | 100.00 | <0.01 |
| Gender (Female) | -3.00 | 0.04 |
| Age (in years) | 0.50 | <0.01 |
| Education | | |
| Not graduated from high school | 5.00 | <0.01 |
| Graduated from high school | 2.00 | 0.08 |
1. According to the results of this regression analysis, how much expected difference in systolic blood pressure (in mmHg) is estimated:
1-1. between the two education categories, “Not graduated from high school” and “Graduated from college”, controlling for gender and age (i.e., among those who have the same gender and are the same age)?
1-2. between males and females, controlling for age and education?
2. Suppose we change the reference category of education from “Graduated from college” to “Graduated from high school” and do the same regression analysis again.
What will be the value of the partial regression coefficient (beta) for “Not graduated from high school”?
(Hint: The expected SBP differences among the education categories do not change.)
The estimated regression equation is
$$\widehat{\text{SBP}} = 100 - 3\,\text{Gender} + 0.5\,\text{Age} + 5\,\text{NGHS} + 2\,\text{GHS}$$
where Gender=1 is female and Gender=0 is male,
NGHS=1 when Education = "Not graduated from high school", NGHS=0 otherwise,
GHS=1 when Education = "Graduated from high school", GHS=0 otherwise.
When Education = "Graduated from college", NGHS=0 and GHS=0 (the reference category).
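The dummy coding and the fitted equation can be sketched in a few lines of Python. This is only an illustration: the coefficients are copied from the results table, while the function names (`education_dummies`, `predict_sbp`) and the category labels are hypothetical and not part of the original analysis.

```python
# Coefficients copied from the results table (units: mmHg).
INTERCEPT = 100.0
B_FEMALE = -3.0
B_AGE = 0.5
B_NGHS = 5.0  # "Not graduated from high school"
B_GHS = 2.0   # "Graduated from high school"


def education_dummies(education):
    """Return (NGHS, GHS); "college" is the reference category, coded (0, 0)."""
    codes = {
        "no_high_school": (1, 0),
        "high_school": (0, 1),
        "college": (0, 0),
    }
    return codes[education]


def predict_sbp(female, age, education):
    """Fitted SBP for female (1 = female, 0 = male), age in years, education label."""
    nghs, ghs = education_dummies(education)
    return INTERCEPT + B_FEMALE * female + B_AGE * age + B_NGHS * nghs + B_GHS * ghs


print(predict_sbp(female=0, age=40, education="college"))  # 120.0
```

For a 40-year-old male college graduate every dummy is zero, so the prediction reduces to the constant plus the age term.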
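1-1. The expected systolic blood pressure when the education category is “Not graduated from high school”, setting NGHS=1, GHS=0, is
$$\widehat{\text{SBP}} = 100 - 3\,\text{Gender} + 0.5\,\text{Age} + 5 = 105 - 3\,\text{Gender} + 0.5\,\text{Age}$$
The expected systolic blood pressure when the education category is “Graduated from college”, setting NGHS=0, GHS=0, is
$$\widehat{\text{SBP}} = 100 - 3\,\text{Gender} + 0.5\,\text{Age}$$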
Hence the coefficient of “Not graduated from high school”, 5, indicates that the predicted systolic blood pressure for those who did not graduate from high school is 5 mmHg higher than for those who graduated from college (the reference category), at the same gender and age.
ans: the expected difference in systolic blood pressure between the two education categories, “Not graduated from high school” and “Graduated from college”, controlling for gender and age, is 5 mmHg.
1-2. The coefficient of Gender is -3. Following the same logic as in 1-1, since male is the reference category, the coefficient of Gender (-3) indicates that the predicted systolic blood pressure for females is 3 mmHg lower than that for males.
ans: the expected difference in systolic blood pressure between males and females, controlling for age and education, is 3 mmHg (females lower).
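As a quick check of answers 1-1 and 1-2, the sketch below evaluates the fitted equation at several combinations of the other covariates and confirms that the differences do not depend on them. The function name `fitted_sbp` and the chosen ages are illustrative assumptions; only the coefficients come from the results table.

```python
def fitted_sbp(female, age, nghs, ghs):
    """Fitted SBP (mmHg) using the coefficients from the results table."""
    return 100.0 - 3.0 * female + 0.5 * age + 5.0 * nghs + 2.0 * ghs


ages = (30, 50, 70)  # arbitrary illustrative ages

# 1-1: "Not graduated from high school" minus "Graduated from college",
# holding gender and age fixed.
edu_diffs = {
    fitted_sbp(f, a, 1, 0) - fitted_sbp(f, a, 0, 0)
    for f in (0, 1)
    for a in ages
}

# 1-2: female minus male, holding age and education fixed.
gender_diffs = {
    fitted_sbp(1, a, n, g) - fitted_sbp(0, a, n, g)
    for a in ages
    for (n, g) in ((0, 0), (1, 0), (0, 1))
}

print(edu_diffs)     # {5.0}
print(gender_diffs)  # {-3.0}
```

Each set collapses to a single value because the model has no interaction terms, so every coefficient is a constant shift.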
2. Suppose we change the reference category of education from “Graduated from college” to “Graduated from high school” and do the same regression analysis again.
Let b3 be the partial regression coefficient for "Not graduated from high school" and b4 be the partial regression coefficient for “Graduated from college” and b0 the new intercept.
The new codes would be
NGHS=1, when Education = "Not graduated from high school", NGHS=0 otherwise
GC=1, when Education = "Graduated from college", GC=0 otherwise
NGHS=0, GC=0 when Education = "Graduated from high school" (the reference category)
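The coefficients of Gender and Age (-3 and 0.5) are unaffected by the choice of reference category, because only the education dummies are recoded.
The expected systolic blood pressure for the education category “Not graduated from high school”, setting NGHS=1, GC=0, is
$$\widehat{\text{SBP}} = b_0 - 3\,\text{Gender} + 0.5\,\text{Age} + b_3$$
where b0 is the new intercept, b3 is the estimated coefficient of NGHS, and b4 is the estimated coefficient of GC.
Similarly, the expected systolic blood pressure for the education category “Graduated from college”, setting NGHS=0, GC=1, is
$$\widehat{\text{SBP}} = b_0 - 3\,\text{Gender} + 0.5\,\text{Age} + b_4$$
The expected systolic blood pressure for the education category “Graduated from high school” (the new reference category), setting NGHS=0, GC=0, is
$$\widehat{\text{SBP}} = b_0 - 3\,\text{Gender} + 0.5\,\text{Age}$$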
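However, from 1-1, the estimated equation for the education category “Not graduated from high school” is
$$\widehat{\text{SBP}} = 105 - 3\,\text{Gender} + 0.5\,\text{Age}$$
the estimated equation for the education category “Graduated from high school” (NGHS=0, GHS=1 in the original coding) is
$$\widehat{\text{SBP}} = 102 - 3\,\text{Gender} + 0.5\,\text{Age}$$
and the estimated equation for the education category “Graduated from college” is
$$\widehat{\text{SBP}} = 100 - 3\,\text{Gender} + 0.5\,\text{Age}$$
Since these three equations must agree with the corresponding equations under the new coding (the estimated SBP remains the same irrespective of the reference category), we have
$$b_0 + b_3 = 105, \qquad b_0 = 102, \qquad b_0 + b_4 = 100$$
Solving these we get
$$b_0 = 102, \qquad b_3 = 105 - 102 = 3, \qquad b_4 = 100 - 102 = -2$$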
ans: The value of partial regression coefficient (beta) for “Not graduated from high school” would be 3
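The re-referencing argument can also be written as a small Python sketch. The helper `change_reference` and the category labels are hypothetical names introduced for illustration. It relies only on the fact used above, that fitted values for each category must not change: the new intercept absorbs the old coefficient of the new reference category, and every education coefficient is shifted by the same amount.

```python
def change_reference(intercept, edu_coefs, new_ref):
    """Re-express education coefficients relative to a new reference category.

    edu_coefs maps every category to its coefficient under the old coding,
    with the old reference category included at 0.0.
    """
    shift = edu_coefs[new_ref]
    new_intercept = intercept + shift
    new_coefs = {cat: b - shift for cat, b in edu_coefs.items()}
    return new_intercept, new_coefs


# Old coding: "college" is the reference category (coefficient 0).
old_coefs = {"no_high_school": 5.0, "high_school": 2.0, "college": 0.0}

b0, new_coefs = change_reference(100.0, old_coefs, "high_school")
print(b0)         # 102.0
print(new_coefs)  # {'no_high_school': 3.0, 'high_school': 0.0, 'college': -2.0}
```

The result matches the hand calculation: b0 = 102, b3 = 3 for “Not graduated from high school”, and b4 = -2 for “Graduated from college”.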