The virus (SARS-CoV-2) that causes Coronavirus Disease 2019 (COVID-19) spreads easily enough to have caused a global pandemic that has so far been hard to contain. Effective containment needs to be informed by some key epidemiological factors.
One such factor is called the “serial interval” and it represents the “duration between symptom onsets of successive cases in a transmission chain.” That is, if person A (infector) transmits the virus to person B (infectee), then the serial interval is the amount of time between when person A first showed symptoms and when person B first showed symptoms. Because some people start showing symptoms much faster than others, it is actually possible for the serial interval to be a negative value.
A research paper published in May 2020 examined a set of 77 infector-infectee transmission pairs with lab-confirmed COVID-19 diagnoses and a well-documented transmission event. The serial interval for each pair is listed below. The data can reasonably be considered a random sample of COVID-19 transmission.
Use statistical software for your analysis of the data.
Serial_Interval
3
6
3
7
3
3
3
4
0
12
16
4
2
4
2
2
2
2
0
14
18
12
1
6
18
5.5
3
8.5
6
6
17
1
11
4
16
7
5
7
9
2
3
2
0
2
9
3
8
5
11
3
9.5
8
14
-3
9
4
4
7
7
5
5
5
2.5
1
7
5.5
2
-4
7
4
8
8
3
5.5
5
4
12
a. Assuming that the conditions for inference are met, interpret your confidence interval in a sentence in context.
b. People's memory of symptom onset and past encounters might not always be completely accurate or even perfectly truthful. This potential issue is called ["voluntary response bias", "nonresponse bias", "undercoverage bias", "response bias"] , and it ["is not", "may be", "is"] accounted for by the reported margin of error because the margin of error ["quantifies any source of variation that can occur in random samples", "quantifies only variation in random samples due to the sampling process", "quantifies any source of variation that can occur in samples"]
c. What are the conditions for this inference procedure?
One random sample AND normal data or large enough sample size.
Independent random samples AND normal data or large enough total sample size AND similar sample standard deviations.
Independent random samples AND normal data AND large enough total sample size AND similar sample standard deviations.
Independent random samples AND normal data or large enough total sample size.
Independent random samples AND normal data AND large enough total sample size.
One random sample AND normal data AND large enough sample size.
d. The conditions for this inference procedure
may be barely met (at the limit of the robustness of this procedure)
are all met.
are not completely met (one condition or more isn't met).
are not met at all.
f. Obtain a 95% confidence interval for the corresponding parameter. The value of the margin of error for this interval is
g. The distribution of serial intervals in the sample is ["left-skewed", "roughly symmetric", "right-skewed", "clearly bimodal"] with ["no", "1", "5"] extreme outlier(s), and values ranging from ["-4", "12", "0", "6", "-2", "-8"] to ["18", "8", "24", "12", "77"]
Sol:
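c. What are the conditions for this inference procedure?
There is a single sample of n = 77 transmission pairs, so the procedure is a one-sample t interval for a mean. With n = 77 the sample is large enough for the sampling distribution of the sample mean to be approximately normal, even if the data themselves are not normal.
ANSWER(C)
One random sample AND normal data or large enough sample size.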
d. The conditions for this inference procedure
ANSWER(D)
are all met.
f. Obtain a 95% confidence interval for the corresponding parameter. The value of the margin of error for this interval is
R code:
df2 <- read.table(header = TRUE, text = "
Serial_Interval
3
6
3
7
3
3
3
4
0
12
16
4
2
4
2
2
2
2
0
14
18
12
1
6
18
5.5
3
8.5
6
6
17
1
11
4
16
7
5
7
9
2
3
2
0
2
9
3
8
5
11
3
9.5
8
14
-3
9
4
4
7
7
5
5
5
2.5
1
7
5.5
2
-4
7
4
8
8
3
5.5
5
4
12
"
)
df2
t.test(df2$Serial_Interval)
Output:
One Sample t-test
data: df2$Serial_Interval
t = 11.046, df = 76, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
4.747806 6.836609
sample estimates:
mean of x
5.792208
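As a cross-check, the interval and its margin of error can be read directly from the object that t.test() returns, since the margin of error is half the width of the interval. The sketch below re-enters the 77 values as a plain vector; the names serial_interval, res, ci and moe are illustrative and not part of the original solution.

```r
# The 77 serial intervals from the problem statement, in the order listed
serial_interval <- c(
  3, 6, 3, 7, 3, 3, 3, 4, 0, 12, 16, 4, 2,
  4, 2, 2, 2, 2, 0, 14, 18, 12, 1, 6, 18, 5.5,
  3, 8.5, 6, 6, 17, 1, 11, 4, 16, 7, 5, 7, 9,
  2, 3, 2, 0, 2, 9, 3, 8, 5, 11, 3, 9.5, 8,
  14, -3, 9, 4, 4, 7, 7, 5, 5, 5, 2.5, 1, 7,
  5.5, 2, -4, 7, 4, 8, 8, 3, 5.5, 5, 4, 12
)

# One-sample t procedure; conf.level = 0.95 is the default, stated here for clarity
res <- t.test(serial_interval, conf.level = 0.95)

# Lower and upper limits of the confidence interval (drop the attribute)
ci <- as.numeric(res$conf.int)

# Margin of error = half the width of the interval
moe <- diff(ci) / 2

print(ci)
print(moe)
```

The half-width obtained this way should agree with the hand calculation of the margin of error, up to rounding.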
mean(df2$Serial_Interval)
sd(df2$Serial_Interval)
length(df2$Serial_Interval)
hist(df2$Serial_Interval)
Outliervalues <- boxplot(df2$Serial_Interval)$out
Output:
> mean(df2$Serial_Interval)
[1] 5.792208
> sd(df2$Serial_Interval)
[1] 4.601452
> length(df2$Serial_Interval)
[1] 77
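The value of the margin of error for this interval is found from the critical value t with df = n - 1 = 76. In Excel (note that T.INV(0.025,76) returns the negative value -1.9916726, so the upper-tail probability 0.975 is used):
=T.INV(0.975,76)
=1.9916726
margin of error
=t*s/sqrt(n)
=1.9916726*4.601452/sqrt(77)
=1.044401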
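The same calculation can be done entirely in R with qt(), which returns the t quantile directly. This is a minimal sketch, assuming the summary statistics reported in the output (n = 77, mean 5.792208, standard deviation 4.601452); the variable names are illustrative.

```r
# Summary statistics taken from the R output
n    <- 77
xbar <- 5.792208
s    <- 4.601452

# Two-sided 95% critical value: the 0.975 quantile of t with n - 1 = 76 df
t_star <- qt(0.975, df = n - 1)

# Margin of error and the resulting confidence interval
moe_manual <- t_star * s / sqrt(n)
ci_manual  <- c(lower = xbar - moe_manual, upper = xbar + moe_manual)

print(t_star)
print(moe_manual)
print(ci_manual)
```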
ANSWER(f)
95% confidence interval for the mean: (4.747806, 6.836609)
Margin of error = 1.044401
ANSWER(g)
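To get the histogram in R:
hist(df2$Serial_Interval)
The five-number summary from the boxplot is
min = -4
Q1 = 3
Q2 (median) = 5
Q3 = 8
max = 18
The boxplot flags 5 outliers (values above the upper fence Q3 + 1.5*IQR = 8 + 1.5*5 = 15.5), namely
16 18 18 17 16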
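The count of 5 follows from the 1.5 × IQR rule that boxplot() uses by default. The sketch below recomputes the fences from the quartiles in the five-number summary and, as an additional check that is not part of the original solution, also computes the wider 3 × IQR fences that some textbooks use to define "extreme" outliers. Variable names are illustrative.

```r
# Quartiles and extremes from the five-number summary
x_min <- -4
q1    <- 3
q3    <- 8
iqr   <- q3 - q1

# Fences under the default boxplot rule (1.5 * IQR)
inner_lower <- q1 - 1.5 * iqr
inner_upper <- q3 + 1.5 * iqr

# Wider fences (3 * IQR); using these to define "extreme" is an assumption
outer_lower <- q1 - 3 * iqr
outer_upper <- q3 + 3 * iqr

# Values flagged by boxplot() in the solution
flagged <- c(16, 18, 18, 17, 16)

n_beyond_inner <- sum(flagged > inner_upper)
n_beyond_outer <- sum(flagged > outer_upper)
low_outlier    <- x_min < inner_lower

print(c(inner_lower, inner_upper))
print(c(outer_lower, outer_upper))
print(c(n_beyond_inner, n_beyond_outer))
print(low_outlier)
```

Under these assumptions all five flagged values lie above the 1.5 × IQR fence, none lies above the 3 × IQR fence, and the minimum of -4 is not flagged on the low side.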
ANSWER(G)
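g. The distribution of serial intervals in the sample is right-skewed with 5 extreme outlier(s), and values ranging from -4 to 18. (The 5 outliers are the values flagged by the boxplot's 1.5 × IQR rule. None of them lies beyond the stricter 3 × IQR fence of 23, so if "extreme" is meant in that stricter sense, the answer would be "no".)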
[Figure: plot of serial_interval; axis labels: serial_interval, Frequency]