| smoker | lab scores before treatment | time since diagnosis |
|---|---|---|
| yes | 4.3 | 1 |
| no | 12.4 | 9 |
| no | 4.5 | 35 |
| yes | 6.7 | 8 |
| no | 7.8 | 10 |
| no | 8.9 | 5 |
| yes | 5.6 | 29 |
| no | 10.3 | 80 |
| no | 6.7 | 58 |
| no | 5.4 | 27 |
| yes | 6.2 | 71 |
| no | 17.9 | 3 |
| yes | 6.5 | 11 |
| yes | 7.8 | 43 |
| no | 12.5 | 24 |
| no | 11.2 | 16 |
| yes | 5.4 | 64 |
| yes | 8.7 | 13 |
| no | 7.8 | 6 |
| yes | 8.9 | 16 |
| no | 6.7 | 16 |
| no | 5.4 | 40 |
| no | 4.3 | 4 |
| yes | 12.4 | 72 |
| yes | 5.4 | 50 |
| yes | 5.2 | 22 |
| no | 14.9 | 40 |
| yes | 5.4 | 49 |
| no | 6.5 | 55 |
| no | 10.4 | 8 |
| yes | 9.4 | 11 |
| no | 6.5 | 8 |
| yes | 10.4 | 5 |
| no | 10.4 | 88 |
| no | 9.4 | 32 |
| no | 10.7 | 2 |
| no | 12.4 | 103 |
| yes | 8.0 | 110 |
| no | 13.9 | 11 |
| yes | 6.7 | 56 |
| yes | 5.4 | 4 |
| no | 10.4 | 18 |
| no | 9.4 | 6 |
| no | 10.7 | 12 |
| no | 10.3 | 23 |
| yes | 6.7 | 18 |
| no | 11.6 | 25 |
| yes | 5.6 | 1 |
Question 3
Determine which distribution(s) fit the random variables “X: smoking status”, “Y: lab score before treatment” and “Z: time since diagnosis”. Explain all three separately, with reasons, and find their parameters. Provide the explanation.
So, implement the following steps for each variable:
(i: describe the random variable. ii: write down the name of the distribution and the reason you chose that distribution. iii: fill in the blanks and calculate the parameter(s).)
3.a
i. X is the ......
ii. I expect ... distribution fits to X because ...
iii. X ~ .......(.....)
3.b
i.Y is the.....
ii. I expect .... distribution fits to Y because ....
iii. Y ~......(....)
3.c
i. Z is the.....
ii. I expect .... distribution fits to Z because ......
iii. Z ~......(....)
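3.a
i. X is the smoking status of the subjects, summarised as the total number of smokers among the 48 subjects.
ii. I expect a Binomial distribution fits X because each subject is either a smoker or a non-smoker, the smoking status of one subject does not depend on that of the others, and the probability of smoking is assumed to be the same for all subjects.
iii. X ~ Binomial(n = 48, p = 20/48 ≈ 0.4167), since 20 of the 48 subjects are smokers. Equivalently, each individual subject's status is Bernoulli(p ≈ 0.4167).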
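The estimate of p above can be checked directly from the smoker column. The following is a minimal sketch in Python; the variable names are illustrative and not part of the original answer.

```python
# Smoking status of the 48 subjects, in the order they appear in the table.
smoker = [
    "yes", "no", "no", "yes", "no", "no", "yes", "no", "no", "no", "yes", "no",
    "yes", "yes", "no", "no", "yes", "yes", "no", "yes", "no", "no", "no", "yes",
    "yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "no", "no", "no",
    "no", "yes", "no", "yes", "yes", "no", "no", "no", "no", "yes", "no", "yes",
]

n = len(smoker)                 # number of trials (subjects)
smokers = smoker.count("yes")   # observed number of successes
p_hat = smokers / n             # sample proportion, the usual estimate of p

print(f"n = {n}, smokers = {smokers}, p_hat = {p_hat:.4f}")
# prints: n = 48, smokers = 20, p_hat = 0.4167

# Mean and variance implied by Binomial(n, p_hat).
binom_mean = n * p_hat
binom_var = n * p_hat * (1 - p_hat)
```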
3.b
i. Y is the lab score before treatment.
ii. I expect a normal distribution fits Y because Y is a continuous variable and, in the QQ plot, almost all points fall close to the reference line.
iii. Y ~ Normal(μ = 8.542, σ = 3.067), where 8.542 is the sample mean and 3.067 is the sample standard deviation (so σ² ≈ 9.41).
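As a rough check of these parameters, the sketch below recomputes the sample mean and sample standard deviation from the lab score column (written with decimal points) using only Python's standard library. It also builds the pairs of points a normal QQ plot would show. The plotting positions (i - 0.5)/n are an assumption here; the original answer does not say how its QQ plot was produced.

```python
from statistics import NormalDist, mean, stdev

# Lab scores before treatment, in table order.
lab = [
    4.3, 12.4, 4.5, 6.7, 7.8, 8.9, 5.6, 10.3, 6.7, 5.4, 6.2, 17.9,
    6.5, 7.8, 12.5, 11.2, 5.4, 8.7, 7.8, 8.9, 6.7, 5.4, 4.3, 12.4,
    5.4, 5.2, 14.9, 5.4, 6.5, 10.4, 9.4, 6.5, 10.4, 10.4, 9.4, 10.7,
    12.4, 8.0, 13.9, 6.7, 5.4, 10.4, 9.4, 10.7, 10.3, 6.7, 11.6, 5.6,
]

n = len(lab)
mu_hat = mean(lab)       # sample mean
sigma_hat = stdev(lab)   # sample standard deviation (n - 1 denominator)
print(f"n = {n}, mean = {mu_hat:.3f}, sd = {sigma_hat:.3f}")

# Normal QQ pairs: (theoretical quantile, observed order statistic).
# Plotting positions (i - 0.5) / n are one common convention (assumed here).
fitted = NormalDist(mu_hat, sigma_hat)
probs = [(i - 0.5) / n for i in range(1, n + 1)]
qq_pairs = list(zip((fitted.inv_cdf(p) for p in probs), sorted(lab)))
```

If the normal model is adequate, the pairs in `qq_pairs` should lie near the line y = x when plotted.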
3.c
i. Z is the time since diagnosis.
ii. I expect an exponential distribution fits Z because Z is a non-negative continuous random variable and the QQ plot shows that Z follows an exponential distribution.
iii. Z ~ Exponential(mean = 29.54), i.e. rate λ = 1/29.54 ≈ 0.0339, where 29.54 is the sample mean.
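The exponential parameter can be recomputed from the time column in the same way. The sketch below estimates the rate as the reciprocal of the sample mean and builds exponential QQ pairs from the quantile function -ln(1 - p)/λ. As before, the plotting positions (i - 0.5)/n and the variable names are assumptions made for illustration.

```python
import math

# Time since diagnosis, in table order (units are not stated in the source).
time_since_dx = [
    1, 9, 35, 8, 10, 5, 29, 80, 58, 27, 71, 3,
    11, 43, 24, 16, 64, 13, 6, 16, 16, 40, 4, 72,
    50, 22, 40, 49, 55, 8, 11, 8, 5, 88, 32, 2,
    103, 110, 11, 56, 4, 18, 6, 12, 23, 18, 25, 1,
]

n = len(time_since_dx)
mean_hat = sum(time_since_dx) / n   # sample mean, estimates 1 / lambda
rate_hat = 1 / mean_hat             # estimated rate lambda
print(f"n = {n}, mean = {mean_hat:.2f}, rate = {rate_hat:.4f}")

# Exponential QQ pairs: (theoretical quantile, observed order statistic).
# The exponential quantile function is Q(p) = -ln(1 - p) / lambda.
probs = [(i - 0.5) / n for i in range(1, n + 1)]
qq_pairs = [
    (-math.log(1 - p) / rate_hat, z)
    for p, z in zip(probs, sorted(time_since_dx))
]
```

Points in `qq_pairs` that stay close to the line y = x support the exponential model; systematic curvature would suggest a different distribution.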