From the 2016 General Social Survey, when we cross-classify political ideology (with 1 being most liberal and 7 being most conservative) by political party affiliation for subjects of ages 18–27, we get:
-------------------------------------------------------------
Ideology      1   2   3   4   5   6   7
Democrat      5  18  19  25   7   7   2
Republican    1   3   1  11  10  11   1
-------------------------------------------------------------
When we use R to model the effect of political ideology on the probability of being a Democrat, we get the results:
-------------------------------------------------------------
> y <- c(5,18,19,25,7,7,2); n <- c(6,21,20,36,17,18,3)
> x <- c(1,2,3,4,5,6,7)
> fit <- glm(y/n ~ x, family=binomial(link=logit), weights=n)
> summary(fit)
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.1870     0.7002   4.552 5.33e-06
x            -0.5901     0.1564  -3.772 0.000162
---
Null deviance: 24.7983 on 6 degrees of freedom
Residual deviance: 7.7894 on 5 degrees of freedom
Number of Fisher Scoring iterations: 4
> confint(fit)
               2.5 %   97.5 %
(Intercept)  1.90180  4.66484
x           -0.91587 -0.29832
-------------------------------------------------------------
a. Report the prediction equation and interpret the direction of the estimated effect.
b. Construct the 95% Wald confidence interval for the effect of political ideology. Interpret and compare to the profile likelihood interval shown.
c. Conduct the Wald test for the effect of x. Report the test statistic, P-value, and interpret.
d. Conduct the likelihood-ratio test for the effect of x. Report the test statistic, find the P-value, and interpret.
e. Explain the output about the number of Fisher scoring iterations.
a) Prediction equation:
logit(p) = log(p / (1 - p)) = 3.1870 - 0.5901 x
where p = estimated probability of being a Democrat at ideology level x
The fitted count of Democrats at that level is y = np.
The coefficient of x is negative, so the estimated probability of being a Democrat decreases as ideology moves from liberal toward conservative. Each one-category increase in conservatism multiplies the estimated odds of being a Democrat by exp(-0.5901), which is about 0.55.
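To see the direction of the effect on the probability scale, the prediction equation can be evaluated at each ideology level. This is a minimal sketch that uses the rounded coefficients from the summary output; the names b0, b1, and p_hat are illustrative and do not appear in the original output.

```r
# Rounded estimates copied from summary(fit)
b0 <- 3.1870
b1 <- -0.5901
x <- 1:7

# Inverse logit of the linear predictor: estimated P(Democrat) at each level
p_hat <- plogis(b0 + b1 * x)
round(p_hat, 3)

# Multiplicative effect on the odds for a one-category increase in x
odds_ratio <- exp(b1)
odds_ratio
```

With the fitted object available, fitted(fit) should give essentially the same probabilities, up to rounding of the coefficients.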
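b) 95% Wald confidence interval for the effect of x (the slope):
estimate ± 1.96 × SE = -0.5901 ± 1.96(0.1564) = -0.5901 ± 0.3065, or (-0.897, -0.284)
The interval lies entirely below 0, so we can be 95% confident that the probability of being a Democrat decreases as ideology becomes more conservative. On the odds scale, each one-category increase in x multiplies the odds of being a Democrat by between exp(-0.897) = 0.41 and exp(-0.284) = 0.75.
The profile likelihood interval shown by confint(fit) is (-0.916, -0.298). It is shifted slightly farther from 0 than the Wald interval but leads to the same conclusion.
N = sum of n values = total number of subjects = (6 + 21 + 20 + 36 + 17 + 18 + 3) = 121
The Wald interval is symmetric about the estimate and relies on a normal approximation for the estimator. With a sample of this size the profile likelihood interval, which does not force symmetry, is usually the more trustworthy of the two.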
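The Wald interval for the slope is the estimate plus or minus a normal critical value times the standard error. The sketch below reproduces that arithmetic from the rounded summary output; est, se, and wald_ci are illustrative names, not objects from the original session.

```r
# Rounded slope estimate and standard error copied from summary(fit)
est <- -0.5901
se <- 0.1564

# 95% Wald interval: estimate +/- z_{0.975} * SE
z_crit <- qnorm(0.975)
wald_ci <- est + c(-1, 1) * z_crit * se
round(wald_ci, 3)

# The same interval on the odds-ratio scale
round(exp(wald_ci), 2)
```

With the fitted object available, confint.default(fit) should return the same Wald limits, whereas confint(fit) gives the profile likelihood limits shown in the output.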
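c) Wald test of H0: no effect of x (slope = 0):
z = estimate / SE = -0.5901 / 0.1564 = -3.772, as reported in the summary output.
The same test in R, using wald.test() from the aod package, reports the squared statistic:
wald.test(b = coef(fit), Sigma = vcov(fit), Terms = 2)
Test statistic value: 14.2 (z squared, chi-squared with df = 1)
p-value: 0.00016
This is very strong evidence that political ideology has an effect, with the probability of being a Democrat decreasing as conservatism increases.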
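The Wald statistic can also be checked by hand from the summary output, without any extra package. This is a small sketch using the rounded estimate and standard error, so the last digit may differ slightly from the R output; the variable names are illustrative.

```r
# Rounded slope estimate and standard error copied from summary(fit)
est <- -0.5901
se <- 0.1564

# z statistic and its square (chi-squared with 1 df under H0)
z <- est / se
chisq <- z^2

# Two-sided P-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z))

c(z = z, chisq = chisq, p_value = p_value)
```

The P-value computed from z and the one computed from the chi-squared form, pchisq(chisq, df = 1, lower.tail = FALSE), are the same quantity.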
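d) Likelihood-ratio test of H0: no effect of x (slope = 0):
The test statistic is the difference between the null deviance and the residual deviance: 24.7983 - 7.7894 = 17.01, with df = 6 - 5 = 1.
LR test in R, using lrtest() from the lmtest package:
lrtest(fit, "x")
Test statistic value: 17.01 (chi-squared with df = 1)
p-value: 3.72e-05
This is again very strong evidence of an ideology effect. The likelihood-ratio statistic is somewhat larger than the Wald statistic of 14.2, but both tests lead to the same conclusion.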
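The likelihood-ratio statistic can be recovered directly from the two deviances printed in the summary output. The sketch below does that arithmetic; null_dev, resid_dev, and lr_stat are illustrative names.

```r
# Deviances and their degrees of freedom copied from summary(fit)
null_dev <- 24.7983   # intercept-only model, 6 df
resid_dev <- 7.7894   # model with x, 5 df

# LR statistic: drop in deviance when x is added to the model
lr_stat <- null_dev - resid_dev
lr_df <- 6 - 5

# Upper-tail chi-squared probability
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(lr_stat = lr_stat, df = lr_df, p_value = p_value)
```

With the fitted object available, anova(fit, test = "Chisq") in base R should report the same drop in deviance and P-value.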
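e) Fisher scoring is the iterative algorithm R uses to maximize the likelihood when fitting the model. For a generalized linear model it is equivalent to iteratively reweighted least squares. Starting from initial values, each iteration updates the coefficient estimates, and the process stops when successive fits are essentially identical. Here 4 iterations were needed to reach the maximum likelihood estimates of the logistic regression coefficients, which indicates that the algorithm converged quickly and without difficulty.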
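To make the idea of Fisher scoring concrete, here is a minimal hand-rolled sketch for the grouped data in this problem. It is not R's internal glm() code: the starting values (both coefficients at 0) and the stopping rule (largest coefficient change below 1e-8) are assumptions chosen for simplicity, so the iteration count need not match the 4 reported by glm(). All object names are illustrative.

```r
# Grouped data from the problem statement
y <- c(5, 18, 19, 25, 7, 7, 2)       # Democrats at each ideology level
n <- c(6, 21, 20, 36, 17, 18, 3)     # subjects at each ideology level
X <- cbind(intercept = 1, x = 1:7)   # design matrix

beta <- c(0, 0)                      # assumed starting values (glm() starts differently)
for (iter in 1:25) {
  mu <- plogis(drop(X %*% beta))               # current fitted probabilities
  score <- crossprod(X, y - n * mu)            # gradient of the log-likelihood
  info <- crossprod(X, n * mu * (1 - mu) * X)  # Fisher information matrix
  step <- drop(solve(info, score))             # scoring update: info^{-1} %*% score
  beta <- beta + step
  if (max(abs(step)) < 1e-8) break             # assumed convergence rule
}

# Standard errors from the inverse information at the final iterate
se <- sqrt(diag(solve(info)))

iter
round(beta, 4)
round(se, 4)
```

The estimates and standard errors should agree with the summary(fit) output to the digits shown there. For the logit link, which is the canonical link for the binomial, Fisher scoring coincides with Newton-Raphson.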