Questions to be completed in RStudio.
A study involved 51 untreated adult patients with acute myeloblastic leukemia who were given a course of treatment, after which their response was assessed. The variables recorded in the dataset Leukemia are:
```{r}
library(Stat2Data)
data(Leukemia)
summary(Leukemia)
```
- Age: Age at diagnosis (in years)
- Smear: Differential percentage of blasts
- Infil: Percentage of absolute marrow leukemia infiltrate
- Index: Percentage labeling index of the bone marrow leukemia cells
- Blasts: Absolute number of blasts, in thousands
- Temp: Highest temperature of the patient prior to treatment, in degrees Fahrenheit
- Resp: 1 = responded to treatment or 0 = failed to respond
- Time: Survival time from diagnosis (in months)
- Status: 0 = dead or 1 = alive
a) Fit a logistic model using Resp as the response
and Age as the predictor variable. Interpret the results and state
whether the relationship is statistically significant.
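One way to approach part a) is sketched below. This is a minimal sketch, not a model answer: the object name `fit_age` is an illustrative choice, and the odds-ratio interval uses the Wald approximation via `confint.default()`.

```{r}
library(Stat2Data)
data(Leukemia)

# Simple logistic regression of response (0/1) on age at diagnosis
fit_age <- glm(Resp ~ Age, data = Leukemia, family = binomial)

# Coefficient table: the "z value" and "Pr(>|z|)" columns give the Wald test
summary(fit_age)

# Odds ratio per additional year of age, with a Wald 95% interval
# (fit_age is a hypothetical name chosen for this sketch)
exp(cbind(OR = coef(fit_age), confint.default(fit_age)))
```

The sign of the Age coefficient gives the direction of the relationship on the log-odds scale, and its exponent is the multiplicative change in the odds of responding per extra year of age.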
b) Set up a new variable by categorising Age into 3
groups (<30; 30-60; >60) and form a two-way table that
exhibits the nature of the relationship found in a).
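The grouping in part b) could be set up along the following lines. The variable name `AgeGroup` and the table name `tab_age` are assumptions made for this sketch; the boundaries follow the question, with ages of exactly 30 and 60 placed in the middle group.

```{r}
library(Stat2Data)
data(Leukemia)

# Three age groups as specified: <30, 30-60 (inclusive), >60
# (AgeGroup is a hypothetical variable name)
Leukemia$AgeGroup <- factor(
  ifelse(Leukemia$Age < 30, "<30",
         ifelse(Leukemia$Age <= 60, "30-60", ">60")),
  levels = c("<30", "30-60", ">60")
)

# Two-way table of age group by response
tab_age <- table(AgeGroup = Leukemia$AgeGroup, Resp = Leukemia$Resp)
tab_age

# Row proportions: the share of responders within each age group
prop.table(tab_age, margin = 1)
```

Reading down the `Resp = 1` column of the row proportions shows how the response rate changes across age groups, which can then be compared with the sign of the slope from a).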
c) Redo parts a) and b) using the Temp variable as the
single predictor, taking suitable cutpoints to categorise
Temp.
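For part c), the same steps can be repeated with Temp. The question leaves the cutpoints open, so the sketch below assumes tertiles computed from the data, which avoids assuming any particular scale for Temp; other cutpoints may be equally suitable. The names `fit_temp`, `TempGroup` and `tab_temp` are illustrative.

```{r}
library(Stat2Data)
data(Leukemia)

# Simple logistic regression of response on pre-treatment temperature
fit_temp <- glm(Resp ~ Temp, data = Leukemia, family = binomial)
summary(fit_temp)

# Assumed cutpoints: tertiles of the observed Temp values
# (unique() guards against duplicated breaks if there are many ties)
temp_breaks <- unique(quantile(Leukemia$Temp, probs = c(0, 1/3, 2/3, 1)))
Leukemia$TempGroup <- cut(Leukemia$Temp, breaks = temp_breaks,
                          include.lowest = TRUE)

# Two-way table and response rate within each temperature group
tab_temp <- table(TempGroup = Leukemia$TempGroup, Resp = Leukemia$Resp)
tab_temp
prop.table(tab_temp, margin = 1)
```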
The first six variables (Age, Smear, Infil, Index, Blasts, Temp)
were all measured pre-treatment. Fit a multiple logistic regression
model using all six of these variables to predict Resp.
d) Based on the output from the model, which of the six
pretreatment variables appear to add to the predictive power of the
model, given that the other variables are included? (Use Wald tests
of the individual coefficients.)
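A possible way to fit the six-predictor model and extract the Wald tests is sketched here. The 5% threshold used to flag terms is a conventional choice assumed for illustration, and `fit_full` and `wald` are hypothetical names.

```{r}
library(Stat2Data)
data(Leukemia)

# Multiple logistic regression on all six pre-treatment variables
fit_full <- glm(Resp ~ Age + Smear + Infil + Index + Blasts + Temp,
                data = Leukemia, family = binomial)

# Coefficient matrix: Estimate, Std. Error, z value, Pr(>|z|)
wald <- coef(summary(fit_full))
wald

# Terms whose Wald p-value falls below the assumed 5% level
rownames(wald)[wald[, "Pr(>|z|)"] < 0.05]
```

Each Wald test here is conditional on the other five predictors being in the model, so a variable can be flagged in this fit and not in a single-predictor fit, or the reverse.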
e) Interpret the relationship (if any) between Age and
Resp and also between Temp and Resp indicated in this multiple
model.
f) If a predictor variable is not statistically significant in the
fitted model here, might it still be possible that it should be
included in a final model? Explain why or why not.
It's problematic to argue that the coefficient doesn't matter. It may
simply be imprecisely estimated.
g) Use the change in Deviance values to perform a
test to see if a model that excludes all of the nonsignificant
variables from the model in d) is a reasonable choice for the final
model. Also comment on the stability of the estimated coefficients
between the full model in d) and the reduced model without the
“nonsignificant” terms.
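The drop-in-deviance comparison in part g) might look like the sketch below. It refits the full model so that it runs on its own, and it builds the reduced model from whichever terms have Wald p-values below an assumed 5% level rather than hard-coding a set of variables. All object names are illustrative.

```{r}
library(Stat2Data)
data(Leukemia)

# Full model with all six pre-treatment predictors
fit_full <- glm(Resp ~ Age + Smear + Infil + Index + Blasts + Temp,
                data = Leukemia, family = binomial)

# Keep the predictors whose Wald p-value is below the assumed 5% level
p_vals <- coef(summary(fit_full))[, "Pr(>|z|)"]
keep <- setdiff(names(p_vals)[p_vals < 0.05], "(Intercept)")
if (length(keep) == 0) keep <- "1"   # fall back to an intercept-only model

# Reduced model without the "nonsignificant" terms
fit_reduced <- glm(reformulate(keep, response = "Resp"),
                   data = Leukemia, family = binomial)

# Drop-in-deviance (likelihood ratio) test, computed by hand and via anova()
dev_diff <- deviance(fit_reduced) - deviance(fit_full)
df_diff  <- df.residual(fit_reduced) - df.residual(fit_full)
pchisq(dev_diff, df = df_diff, lower.tail = FALSE)
anova(fit_reduced, fit_full, test = "Chisq")

# Side-by-side coefficients to judge stability between the two models
shared <- names(coef(fit_reduced))
cbind(full = coef(fit_full)[shared], reduced = coef(fit_reduced))
```

A large p-value from the deviance test suggests the dropped terms add little as a group. Estimates that stay similar between the `full` and `reduced` columns indicate that the retained coefficients do not depend heavily on the dropped variables.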
h) Are the estimated coefficients for Age and Temp in
your chosen model consistent with those found in a) and c)?