smoker | lab scores before treatment | time since diagnosis |
yes | 4.3 | 1 |
no | 12.4 | 9 |
no | 4.5 | 35 |
yes | 6.7 | 8 |
no | 7.8 | 10 |
no | 8.9 | 5 |
yes | 5.6 | 29 |
no | 10.3 | 80 |
no | 6.7 | 58 |
no | 5.4 | 27 |
yes | 6.2 | 71 |
no | 17.9 | 3 |
yes | 6.5 | 11 |
yes | 7.8 | 43 |
no | 12.5 | 24 |
no | 11.2 | 16 |
yes | 5.4 | 64 |
yes | 8.7 | 13 |
no | 7.8 | 6 |
yes | 8.9 | 16 |
no | 6.7 | 16 |
no | 5.4 | 40 |
no | 4.3 | 4 |
yes | 12.4 | 72 |
yes | 5.4 | 50 |
yes | 5.2 | 22 |
no | 14.9 | 40 |
yes | 5.4 | 49 |
no | 6.5 | 55 |
no | 10.4 | 8 |
yes | 9.4 | 11 |
no | 6.5 | 8 |
yes | 10.4 | 5 |
no | 10.4 | 88 |
no | 9.4 | 32 |
no | 10.7 | 2 |
no | 12.4 | 103 |
yes | 8.0 | 110 |
no | 13.9 | 11 |
yes | 6.7 | 56 |
yes | 5.4 | 4 |
no | 10.4 | 18 |
no | 9.4 | 6 |
no | 10.7 | 12 |
no | 10.3 | 23 |
yes | 6.7 | 18 |
no | 11.6 | 25 |
yes | 5.6 | 1 |
Question 3
Determine which distribution(s) fit the random variables “X: smoking status”, “Y: lab score before treatment” and “Z: time since diagnosis”. Explain each of the three separately, with reasons, and find the parameters.
Follow these steps for each variable:
(i: describe the random variable. ii: name the distribution and give the reason you chose it. iii: fill in the blanks and calculate the parameter(s).)
3.a
i. X is the ......
ii. I expect ... distribution fits to X because ...
iii. X ~ .......(.....)
3.b
i. Y is the .....
ii. I expect .... distribution fits to Y because ....
iii. Y ~......(....)
3.c
i. Z is the.....
ii. I expect .... distribution fits to Z because ......
iii. Z ~......(....)
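3.a
i. X is the smoking status of a subject (smoker or non-smoker); counted over the sample, it gives the total number of smokers among the n = 48 subjects.
ii. I expect the Binomial distribution to fit X because each subject is either a smoker or not, the smoking status of one subject does not depend on that of the others, and the probability of smoking is assumed to be the same for all subjects.
iii. X ~ Binomial(n = 48, p = 20/48 ≈ 0.4167), since 20 of the 48 subjects are smokers. Equivalently, the status of a single subject is Bernoulli(p ≈ 0.4167).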
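The parameter estimate for X can be reproduced with a few lines of Python. This is only a sketch: the data are the smoker column typed in by hand, and the names `smoker`, `n`, `k`, `p_hat` and `binom_pmf` are illustrative choices, not part of the question.

```python
from math import comb

# Smoking status of the 48 subjects, copied from the "smoker" column.
smoker = (
    "yes no no yes no no yes no no no yes no "
    "yes yes no no yes yes no yes no no no yes "
    "yes yes no yes no no yes no yes no no no "
    "no yes no yes yes no no no no yes no yes"
).split()

n = len(smoker)            # number of subjects (trials)
k = smoker.count("yes")    # observed number of smokers
p_hat = k / n              # estimate of p: the sample proportion


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


print(n, k, round(p_hat, 4))  # 48 20 0.4167
```

The sample proportion is the usual (maximum likelihood) estimate of p, which is why 20/48 appears as the second parameter.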
3.b
i. Y is the lab score before treatment.
ii. I expect the normal distribution to fit Y because Y is a continuous variable and, in the QQ plot, almost all points fall close to the reference line.
iii. Y ~ Normal(μ = 8.542, σ = 3.067), where 8.542 is the sample mean and 3.067 is the sample standard deviation (so σ² ≈ 9.41).
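As a rough check of the numbers for Y, the sample mean and standard deviation can be computed with Python's standard library, together with the points a normal QQ plot would show. This is a sketch under assumptions: decimal commas in the data are read as decimal points, the plotting positions (i + 0.5)/n are one common convention rather than necessarily the one used for the original plot, and the names `lab`, `mu_hat`, `sigma_hat` and `qq` are illustrative.

```python
from statistics import NormalDist, mean, stdev

# Lab scores before treatment for the 48 subjects.
lab = [
    4.3, 12.4, 4.5, 6.7, 7.8, 8.9, 5.6, 10.3, 6.7, 5.4, 6.2, 17.9,
    6.5, 7.8, 12.5, 11.2, 5.4, 8.7, 7.8, 8.9, 6.7, 5.4, 4.3, 12.4,
    5.4, 5.2, 14.9, 5.4, 6.5, 10.4, 9.4, 6.5, 10.4, 10.4, 9.4, 10.7,
    12.4, 8.0, 13.9, 6.7, 5.4, 10.4, 9.4, 10.7, 10.3, 6.7, 11.6, 5.6,
]

mu_hat = mean(lab)       # sample mean
sigma_hat = stdev(lab)   # sample standard deviation (n - 1 denominator)

# QQ-plot points: (theoretical quantile of the fitted normal, ordered observation).
n = len(lab)
fitted = NormalDist(mu_hat, sigma_hat)
qq = [(fitted.inv_cdf((i + 0.5) / n), y) for i, y in enumerate(sorted(lab))]

print(round(mu_hat, 3), round(sigma_hat, 3))  # 8.542 3.067
```

Plotting the pairs in `qq` against the line y = x gives the QQ plot; points close to that line support the normal model.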
3.c
i. Z is the time since diagnosis.
ii. I expect the exponential distribution to fit Z because Z is a continuous, non-negative random variable and the QQ plot shows that Z follows the exponential distribution.
iii. Z ~ Exponential(λ = 1/29.54 ≈ 0.0339), i.e. an exponential distribution with mean 29.54, where 29.54 is the sample mean.
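The exponential parameter for Z can be checked the same way. The sketch below estimates the mean and the corresponding rate λ = 1/mean, and builds exponential QQ-plot points from the quantile function -mean · ln(1 - p). The plotting positions (i + 0.5)/n and the names `time_since_dx`, `mean_z`, `rate_hat` and `qq` are assumptions made for illustration.

```python
from math import log
from statistics import mean

# Time since diagnosis for the 48 subjects.
time_since_dx = [
    1, 9, 35, 8, 10, 5, 29, 80, 58, 27, 71, 3,
    11, 43, 24, 16, 64, 13, 6, 16, 16, 40, 4, 72,
    50, 22, 40, 49, 55, 8, 11, 8, 5, 88, 32, 2,
    103, 110, 11, 56, 4, 18, 6, 12, 23, 18, 25, 1,
]

mean_z = mean(time_since_dx)   # sample mean, the estimate of the exponential mean
rate_hat = 1 / mean_z          # rate parameter: lambda = 1 / mean

# QQ-plot points: (theoretical exponential quantile, ordered observation).
n = len(time_since_dx)
qq = [
    (-mean_z * log(1 - (i + 0.5) / n), z)
    for i, z in enumerate(sorted(time_since_dx))
]

print(round(mean_z, 2), round(rate_hat, 4))  # 29.54 0.0339
```

Whether the distribution is written with the mean (29.54) or with the rate (about 0.0339) is a matter of parameterization; both describe the same fitted model.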