A study was conducted in a particular place to determine the effects of vitamin A deficiency in preschool children. The investigators were particularly interested in whether children with vitamin A deficiency were at increased risk of developing respiratory infection, which normally causes death.
250 children were recruited into the study, and their age in years, gender (0 = male, 1 = female), and whether they suffered from vitamin A deficiency (0 = no, 1 = yes) were recorded at an initial clinic visit (time 0). Also recorded was the response: whether the child was suffering from a respiratory infection (0 = no, 1 = yes). The children were then examined again at 3-month intervals for a year (at 3, 6, 12 and 15 months after the initial visit), and the presence or absence of respiratory infection was recorded at each of these visits.
Column description
1. Child id
2. Response (0 or 1 as above)
3. Time (in months, as above)
4. Gender (male = 0, female = 1)
5. Vitamin A (not deficient = 0, deficient = 1)
6. Age (in years)
a) Let Yi be the vector of responses for the ith child, consisting of elements Yij, the observations on whether the child has a respiratory infection at time tij (recorded in months).
Write down a model for E(Yij) in terms of an appropriate link function that is linear in an intercept and includes additive terms for time, age, gender and vitamin A status. Also, write down var(Yij) given the nature of the response.
b) Under your model for E(Yij) in (a):
i) What is the probability that a female child of age 4 who does not have vitamin A deficiency will not have a respiratory infection at the final visit? (Hint: give answers in terms of model parameters.)
ii) What are the odds that a male child of age 3 with vitamin A deficiency will have a respiratory infection at the initial visit? (Hint: give answers in terms of model parameters.)
iii) What must be true if the probability of having a respiratory infection is greater for children with vitamin A deficiency than for children without, for any given age, gender and time? (Hint: give answers in terms of model parameters.)
c) The investigators had not taken a course in longitudinal analysis, so they were unaware that measurements on the same child might be correlated. They fit the model in (a) without taking correlation into account, treating all the observations from all children as if they were unrelated. (Use Table 1.)
Table 1: Results from logistic regression (assuming independent responses)
(STATA: logit RI time age female vitA)
RI | Coef | Std.Err | z | P>|z| | 95% Conf.Interval |
time | 0.0168947 | 0.0111999 | 1.51 | 0.131 | -0.0050567 0.0388462 |
age | -0.071598 | 0.0297275 | -2.41 | 0.016 | -0.1298636 -0.0133338 |
female | -0.5576829 | 0.114908 | -4.85 | 0.000 | -0.7828984 -0.3324674 |
vit A | 0.2869989 | 0.1175161 | 2.44 | 0.015 | 0.0566716 0.5173263 |
cons | -0.5575417 | 0.162073 | -3.44 | 0.001 | -0.8751989 -0.2398845 |
Based on this fit, is there sufficient evidence to suggest that the mean pattern of respiratory response is associated with the presence or absence of vitamin A deficiency?
State the null hypothesis corresponding to this issue in terms of your model in (a), cite the test statistic and p-value on which you base this conclusion, and state your conclusion as a meaningful sentence.
d) One of the investigators then talked to a friend who knew something about repeated measurements, who suggested that the analysis in (c) may be unreliable because possible correlation had not been taken into account. Give a brief explanation of why failure to take correlation into account might be expected to lead to unreliable hypothesis tests.
e) Because you have taken a course in longitudinal data analysis, the investigators called you in for help with an improved analysis. Extend the model in (a) to take into account correlation among repeated measurements on the same subject. (Use Table 2.)
Table 2: Comparison of regression coefficients for vitamin A deficiency obtained under different assumptions about the within-subject correlation structure (using the GEE method).
Vitamin A deficiency (yes versus no)
Correlation structure | Coefficient | SE | P-value | Robust SE | P-value (robust) |
Unstructured | 0.284 | 0.2189 | 0.195 | 0.2206008 | 0.198 |
Exchangeable | 0.276 | 0.2207 | 0.211 | 0.2232926 | 0.217 |
AR(1) | 0.286 | 0.1804 | 0.113 | 0.2242829 | 0.202 |
f) Fit your model in (e) to the data, making as few assumptions as you can about the possible structure of correlation among the elements of a data vector.
Assuming that your assumed model for correlation is correct, conduct a test of the null hypothesis in part (c), citing an appropriate test statistic and p-value. State your conclusion as a meaningful sentence.
Do the results agree with those in part (c)? Give a possible explanation for this, citing results from your output to support your explanation. (Use Table 3.)
Table 3. GEE results with unstructured correlation matrix (with robust variance estimation)
RI | Coef. | Std.Err | Z | P>|z| | 95 % Conf.Interval |
time | 0.017177 | 0.0082279 | 2.09 | 0.037 | 0.0010506 0.0333034 |
age | -0.0771982 | 0.0542186 | -1.42 | 0.154 | -0.1834647 0.0290683 |
female | -0.5339991 | 0.2151933 | -2.48 | 0.013 | -0.9557703 -0.112228 |
vitA | 0.2840136 | 0.2206008 | 1.29 | 0.198 | -0.148356 0.7163833 |
cons | -0.5469366 | 0.2742036 | -1.99 | 0.046 | -1.084366 -0.0095073 |
g) From inspection of your fit in (f), do you think a simpler model for correlation may be plausible? Select the correlation model you feel is most plausible based on your inspection, explaining why you chose this model, and fit this model to the data. (Use Tables 4-6.)
i) Is there sufficient evidence to suggest that the probability of respiratory infection changed over the 15-month study period?
ii) Is there sufficient evidence to suggest that it is worthwhile to take gender into account in understanding the risk of respiratory infection in this population of children?
Table 4: Estimated within-subject correlation matrix (Unstructured)
| c1 | c2 | c3 | c4 | c5 | c6 |
r1 | 1.0000 | | | | | |
r2 | 0.5623 | 1.0000 | | | | |
r3 | 0.4606 | 0.5757 | 1.0000 | | | |
r4 | 0.4240 | 0.5629 | 0.5587 | 1.0000 | | |
r5 | 0.5035 | 0.5251 | 0.4250 | 0.4342 | 1.0000 | |
r6 | 0.4480 | 0.5636 | 0.5148 | 0.5189 | 0.5097 | 1.0000 |
Table 5: Estimated within-subject correlation matrix (Exchangeable model)
| c1 | c2 | c3 | c4 | c5 | c6 |
r1 | 1.0000 | | | | | |
r2 | 0.5060 | 1.0000 | | | | |
r3 | 0.5060 | 0.5060 | 1.0000 | | | |
r4 | 0.5060 | 0.5060 | 0.5060 | 1.0000 | | |
r5 | 0.5060 | 0.5060 | 0.5060 | 0.5060 | 1.0000 | |
r6 | 0.5060 | 0.5060 | 0.5060 | 0.5060 | 0.5060 | 1.0000 |
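One way to inspect the unstructured estimates in Table 4 is to average the off-diagonal correlations overall and by lag (number of visits apart). The sketch below does only that; the variable names are illustrative and the numbers are copied from Table 4. Roughly constant correlations across lags would point toward an exchangeable structure, while a geometric decay with lag would point toward AR(1).

```python
from collections import defaultdict

# Lower triangle of the unstructured working correlation matrix (Table 4).
LOWER = [
    [1.0000],
    [0.5623, 1.0000],
    [0.4606, 0.5757, 1.0000],
    [0.4240, 0.5629, 0.5587, 1.0000],
    [0.5035, 0.5251, 0.4250, 0.4342, 1.0000],
    [0.4480, 0.5636, 0.5148, 0.5189, 0.5097, 1.0000],
]

# Group the off-diagonal entries by lag = row index - column index.
by_lag = defaultdict(list)
for row, values in enumerate(LOWER):
    for col, r in enumerate(values):
        if row != col:
            by_lag[row - col].append(r)

all_off_diag = [r for rs in by_lag.values() for r in rs]
overall_mean = sum(all_off_diag) / len(all_off_diag)
lag_means = {lag: sum(rs) / len(rs) for lag, rs in sorted(by_lag.items())}

print(f"overall mean correlation: {overall_mean:.4f}")
for lag, mean_r in lag_means.items():
    print(f"lag {lag}: mean correlation {mean_r:.4f}")
```

The overall mean can be compared with the single exchangeable correlation reported in Table 5, and the lag-5 value with the fifth power of the lag-1 mean, which is what an AR(1) structure would imply.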
The coefficient estimates based on the exchangeable (uniform) correlation model are given in Table 6.
Table 6: GEE results with exchangeable correlation matrix (with robust variance estimation)
RI | Coef | Std Err | Z | P>|z| | 95 % Conf.Interval |
time | 0.0168792 | 0.0082807 | 2.04 | 0.042 | 0.0006493 0.033109 |
age | -0.0744012 | 0.0546842 | -1.36 | 0.174 | -0.1815803 0.032778 |
female | -0.5550523 | 0.2168697 | -2.56 | 0.010 | -0.9801092 -0.1299955 |
vitA | 0.2757902 | 0.2232926 | 1.24 | 0.217 | -0.1618552 0.7134357 |
h) From your fit in (g), provide an estimate of the probability that a female child of age 7 with vitamin A deficiency has a respiratory infection at the initial visit. Given these considerations, conduct an analysis of these data. Write a brief report summarizing:
i) The statistical model you assumed, and why you chose it.
ii) The analysis you conducted, the assumptions you made and why you made them.
iii) The results, addressing the interest of the investigators as described above.
Answering first 4 parts of the question:
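a) The multiple logistic regression model is given by
$$\operatorname{logit}(P_{ij}) = \log\frac{P_{ij}}{1 - P_{ij}} = \beta_0 + \beta_1 t_{ij} + \beta_2\,\text{age}_i + \beta_3\,\text{gender}_i + \beta_4\,\text{vitA}_i,$$
where $P_{ij} = E(Y_{ij})$ is interpreted as the probability of the dependent variable equaling a "success" (here, child i having a respiratory infection at time $t_{ij}$).
Also, because $Y_{ij}$ is binary,
$$\operatorname{var}(Y_{ij}) = P_{ij}(1 - P_{ij}).$$
The logistic function (the inverse of the logit link) is a sigmoid function.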
b)
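(i) From the above model, P can be written as
$$P_{ij} = \frac{\exp(\beta_0 + \beta_1 t_{ij} + \beta_2\,\text{age}_i + \beta_3\,\text{gender}_i + \beta_4\,\text{vitA}_i)}{1 + \exp(\beta_0 + \beta_1 t_{ij} + \beta_2\,\text{age}_i + \beta_3\,\text{gender}_i + \beta_4\,\text{vitA}_i)},$$
where $\beta_0$ is the intercept and $\beta_1, \beta_2, \beta_3, \beta_4$ are the coefficients of time, age, gender and vitamin A status.
We have
Age = 4, Vitamin A = 0, Time = 15 (the final visit), Gender = 1 (female).
Therefore, the probability of not having a respiratory infection is
$$1 - P_{ij} = \frac{1}{1 + \exp(\beta_0 + 15\beta_1 + 4\beta_2 + \beta_3)}.$$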
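The question asks for the answer in terms of the model parameters, so the following is only a numerical illustration. It is a minimal sketch that plugs in the GEE estimates from Table 3 as stand-in parameter values (an assumption made for illustration, not the fit from part (a)); the function names are hypothetical.

```python
import math

# Illustrative parameter values only: the GEE estimates reported in Table 3.
BETA = {
    "cons": -0.5469366,
    "time": 0.017177,
    "age": -0.0771982,
    "female": -0.5339991,
    "vitA": 0.2840136,
}


def linear_predictor(time, age, female, vit_a, beta=BETA):
    """Logit of the infection probability for one child at one visit."""
    return (beta["cons"] + beta["time"] * time + beta["age"] * age
            + beta["female"] * female + beta["vitA"] * vit_a)


def prob_infection(time, age, female, vit_a, beta=BETA):
    """Inverse logit of the linear predictor."""
    eta = linear_predictor(time, age, female, vit_a, beta)
    return 1.0 / (1.0 + math.exp(-eta))


# Part b(i): female, age 4, not deficient, final visit (15 months).
p_infected = prob_infection(time=15, age=4, female=1, vit_a=0)
p_not_infected = 1.0 - p_infected
print(f"P(no infection at final visit) = {p_not_infected:.3f}")
```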
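(ii)
The odds of an event are P/(1 - P). Under the logistic model they equal the exponential of the linear predictor:
$$\frac{P_{ij}}{1 - P_{ij}} = \exp(\beta_0 + \beta_1 t_{ij} + \beta_2\,\text{age}_i + \beta_3\,\text{gender}_i + \beta_4\,\text{vitA}_i).$$
For a continuous independent variable $x_k$, the odds ratio for a one-unit increase is $\exp(\beta_k)$.
With Gender = 0 (male), Age = 3, Vitamin A = 1 and Time = 0, the odds that a male child of age 3 with vitamin A deficiency will have a respiratory infection at the initial visit are
$$\exp(\beta_0 + 3\beta_2 + \beta_4).$$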
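(iii) If the probability of having a respiratory infection is greater for children with vitamin A deficiency than for children without, for any given age, gender and time, then a child with vitamin A deficiency is more likely to have a respiratory infection. In terms of the model parameters this is the condition
$$\beta_4 > 0,$$
where $\beta_4$ is the coefficient of vitamin A status (equivalently, the odds ratio $\exp(\beta_4)$ is greater than 1).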
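As a numerical illustration of parts (ii) and (iii), the sketch below evaluates the odds and the vitamin A odds ratio. It assumes the GEE estimates from Table 3 as stand-in parameter values; the variable names are hypothetical.

```python
import math

# Illustrative values only: GEE estimates from Table 3
# (intercept, time, age, female, vitA).
b0, b_time, b_age, b_female, b_vita = (
    -0.5469366, 0.017177, -0.0771982, -0.5339991, 0.2840136
)

# Part b(ii): male (female = 0), age 3, deficient, initial visit (time 0).
eta = b0 + b_time * 0 + b_age * 3 + b_female * 0 + b_vita * 1
odds = math.exp(eta)

# Part b(iii): odds ratio, deficient vs. not deficient, with time, age and
# gender held fixed. It exceeds 1 exactly when the vitA coefficient is > 0.
odds_ratio_vita = math.exp(b_vita)

print(f"odds of infection: {odds:.4f}")
print(f"odds ratio for vitamin A deficiency: {odds_ratio_vita:.4f}")
```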
c) The hypothesis is as follows:
H0: $\beta_i = 0$ vs. H1: $\beta_i \neq 0$,
where $\beta_i$ is the coefficient of the ith independent variable $x_i$ (here, vitamin A status).
A Wald z-test can be used to test the above hypothesis because the sample size is large.
The test statistic is
$$z = \frac{\hat\beta_i}{SE(\hat\beta_i)} = \frac{0.2869989}{0.1175161} = 2.44.$$
If the p-value < alpha, then $\beta_i$ is significantly different from zero and the related independent variable has a significant association with Y.
From Table 1, the p-value for vitamin A is 0.015 < 0.05, so we reject H0 and conclude that there is sufficient evidence to suggest that the mean pattern of respiratory response is associated with the presence or absence of vitamin A deficiency.
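The z statistic and p-value quoted from Table 1 can be checked with a few lines of code. This is a minimal sketch using only the standard library; `wald_test` is a hypothetical helper name, and the two-sided p-value is computed from the standard normal distribution.

```python
import math


def wald_test(coef, se):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = coef / se
    # For Z ~ N(0, 1): P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Vitamin A row of Table 1 (logistic regression assuming independence).
z, p = wald_test(coef=0.2869989, se=0.1175161)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```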
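d) The correlation needs to be taken into account because it plays an important role in the calculation of the standard error of $\hat\beta$. Treating correlated observations as independent leads to over- or under-estimation of the SE, so the resulting test statistics and p-values are unreliable. For a covariate that does not change within a child, such as vitamin A status, positive within-child correlation typically makes the naive SE too small.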
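A rough check of this effect uses the design-effect approximation for a covariate that is constant within a child: with m equally correlated observations per child, the variance is inflated by about 1 + (m - 1) * rho. The sketch below compares that with the SEs reported in the tables. It assumes m = 6 visits per child (inferred from the 6 x 6 matrices in Tables 4 and 5), and the approximation is only heuristic for a logistic GEE.

```python
import math

naive_se = 0.1175161   # Table 1, vitA: independence assumed
robust_se = 0.2206008  # Table 3, vitA: GEE with robust variance
rho = 0.5060           # Table 5: exchangeable within-child correlation
n_visits = 6           # assumption: 6 visits per child (Table 4 is 6 x 6)

# Approximate variance inflation for a child-level covariate.
design_effect = 1 + (n_visits - 1) * rho
expected_se_ratio = math.sqrt(design_effect)
observed_se_ratio = robust_se / naive_se

print(f"expected SE ratio ~ {expected_se_ratio:.2f}")
print(f"observed SE ratio = {observed_se_ratio:.2f}")
```

Both ratios come out close to each other, which supports the explanation that the naive analysis understated the SE for vitamin A.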
e) An AR(1) working correlation structure is a natural choice for this model because the visits are equally spaced in time.
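For reference, one standard way to write the extended model is the GEE (marginal model) formulation below. This is a sketch in assumed notation, not taken from the original answer: $\mu_{ij} = E(Y_{ij})$, $R(\alpha)$ is the working correlation matrix, and the three structures listed are the ones compared in Table 2.

```latex
\begin{align*}
\operatorname{logit}(\mu_{ij}) &= \beta_0 + \beta_1 t_{ij} + \beta_2\,\mathrm{age}_i
    + \beta_3\,\mathrm{gender}_i + \beta_4\,\mathrm{vitA}_i, \\
\operatorname{Var}(Y_i) &= A_i^{1/2}\, R(\alpha)\, A_i^{1/2},
    \qquad A_i = \operatorname{diag}\{\mu_{ij}(1 - \mu_{ij})\}, \\[4pt]
\text{Unstructured:}\quad \operatorname{corr}(Y_{ij}, Y_{ik}) &= \rho_{jk}, \\
\text{Exchangeable:}\quad \operatorname{corr}(Y_{ij}, Y_{ik}) &= \rho, \\
\text{AR(1):}\quad \operatorname{corr}(Y_{ij}, Y_{ik}) &= \rho^{|j-k|}, \qquad j \neq k.
\end{align*}
```

The mean model is unchanged from (a); only the assumed covariance of the repeated measurements on each child differs between the structures.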