Let’s verify what we’re seeing with logistic regression. We’ll start by making a relevant data frame in R. Type the following into the R terminal (or, easier, make an R script that contains the following code):
data(UCBAdmissions)
Adm <- as.integer(UCBAdmissions)[(1:(6*2))*2-1]
Rej <- as.integer(UCBAdmissions)[(1:(6*2))*2]
Dept <- gl(6,2,6*2,labels=c("A","B","C","D","E","F"))
Sex <- gl(2,1,6*2,labels=c("Male","Female"))
Ratio <- Adm/(Rej+Adm)
berk <- data.frame(Adm,Rej,Sex,Dept,Ratio)
You can see the first several rows of the data frame by typing head(berk). Now let’s do logistic regression to try and predict probability of admission using only gender as a predictor with the following code:
LogReg.gender <- glm(cbind(Adm,Rej)~Sex,data=berk,family=binomial("logit"))
summary(LogReg.gender)
Based on this output, what can we say about the admission probabilities between Males and Females?
(f) Now re-fit the logistic regression model taking both gender and department into account. What happens to the coefficient for female? Give a brief summary of what we’ve shown throughout this problem.
LogReg.gender <- glm(cbind(Adm,Rej)~Sex,data=berk,family=binomial("logit"))
summary(LogReg.gender)
Output:
Deviance Residuals:
Min 1Q Median 3Q Max
-16.7915 -4.7613 -0.4365 5.1025 11.2022
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.22013 0.03879 -5.675 1.38e-08 ***
SexFemale -0.61035 0.06389 -9.553 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
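The SexFemale coefficient is -0.61035 with a p-value below 2e-16, so sex is a statistically significant predictor in this model. The coefficient is on the log-odds scale: the odds of admission for females are about exp(-0.61035) ≈ 0.54 times the odds for males. When department is ignored, females therefore appear less likely to be admitted.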
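To make the log-odds interpretation concrete, the sketch below converts the fitted coefficients into an odds ratio and into admission probabilities for each sex. It rebuilds the data frame from the question so that it runs on its own; the names `b`, `odds_ratio`, `p_male` and `p_female` are illustrative and do not appear in the original.

```r
# Rebuild the data frame from the question so this block runs on its own.
data(UCBAdmissions)
Adm <- as.integer(UCBAdmissions)[(1:(6*2))*2-1]
Rej <- as.integer(UCBAdmissions)[(1:(6*2))*2]
Dept <- gl(6,2,6*2,labels=c("A","B","C","D","E","F"))
Sex <- gl(2,1,6*2,labels=c("Male","Female"))
berk <- data.frame(Adm,Rej,Sex,Dept)

# Gender-only model, as in the question.
LogReg.gender <- glm(cbind(Adm,Rej)~Sex,data=berk,family=binomial("logit"))

# Illustrative names (not from the original).
b <- coef(LogReg.gender)

# Odds of admission for females relative to males.
odds_ratio <- exp(b[["SexFemale"]])

# Inverse logit turns log-odds into probabilities.
p_male <- plogis(b[["(Intercept)"]])
p_female <- plogis(b[["(Intercept)"]] + b[["SexFemale"]])

print(c(odds_ratio = odds_ratio, p_male = p_male, p_female = p_female))
```

Because Sex is the only predictor, the two fitted probabilities should match the pooled admission proportions for each sex, which gives a quick check on the model.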
(f) Now re-fit the logistic regression model, taking both gender and department into account.
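The answer does not show the call used for this re-fit. A plausible sketch, assuming Dept is simply added as a second predictor to the same formula, is given here; the object name `LogReg.gender.dept` is hypothetical.

```r
# Rebuild the data frame from the question so this block runs on its own.
data(UCBAdmissions)
Adm <- as.integer(UCBAdmissions)[(1:(6*2))*2-1]
Rej <- as.integer(UCBAdmissions)[(1:(6*2))*2]
Dept <- gl(6,2,6*2,labels=c("A","B","C","D","E","F"))
Sex <- gl(2,1,6*2,labels=c("Male","Female"))
berk <- data.frame(Adm,Rej,Sex,Dept)

# Assumed re-fit: same binomial model, with Dept added as a predictor.
# Dept A and Male are the baseline levels.
LogReg.gender.dept <- glm(cbind(Adm,Rej)~Sex+Dept,data=berk,family=binomial("logit"))
print(summary(LogReg.gender.dept))
```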
Output:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.58205 0.06899 8.436 <2e-16 ***
SexFemale 0.09987 0.08085 1.235 0.217
DeptB -0.04340 0.10984 -0.395 0.693
DeptC -1.26260 0.10663 -11.841 <2e-16 ***
DeptD -1.29461 0.10582 -12.234 <2e-16 ***
DeptE -1.73931 0.12611 -13.792 <2e-16 ***
DeptF -3.30648 0.16998 -19.452 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
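After adjusting for department, the SexFemale coefficient changes sign and shrinks to 0.09987, and it is no longer statistically significant (p = 0.217). DeptB is also not significantly different from the baseline Dept A (p = 0.693), while Depts C through F have significantly lower admission odds than Dept A. In summary, the apparent disadvantage for females in the gender-only model disappears once department is taken into account. The difference in overall admission rates is explained by department rather than by sex, a pattern known as Simpson's paradox.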
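One way to see why the SexFemale coefficient changes once department enters the model is to tabulate admission rates and applicant counts by sex and department directly from the `UCBAdmissions` table. This is only a sketch; the names `rates` and `applicants` are illustrative.

```r
data(UCBAdmissions)

# Admission rate within each Gender x Dept cell.
# UCBAdmissions has dimensions Admit x Gender x Dept.
rates <- prop.table(UCBAdmissions, margin = c(2, 3))["Admitted", , ]

# Number of applicants in each Gender x Dept cell.
applicants <- margin.table(UCBAdmissions, margin = c(2, 3))

print(round(rates, 3))
print(applicants)
```

Within a department, the rows of `rates` can be compared directly, while `applicants` shows how differently the two groups are spread across departments. In this dataset, most male applicants applied to Depts A and B, which have the highest admission rates.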