| smoker | lab scores before treatment | time since diagnosis |
| yes | 4.3 | 1 |
| no | 12.4 | 9 |
| no | 4.5 | 35 |
| yes | 6.7 | 8 |
| no | 7.8 | 10 |
| no | 8.9 | 5 |
| yes | 5.6 | 29 |
| no | 10.3 | 80 |
| no | 6.7 | 58 |
| no | 5.4 | 27 |
| yes | 6.2 | 71 |
| no | 17.9 | 3 |
| yes | 6.5 | 11 |
| yes | 7.8 | 43 |
| no | 12.5 | 24 |
| no | 11.2 | 16 |
| yes | 5.4 | 64 |
| yes | 8.7 | 13 |
| no | 7.8 | 6 |
| yes | 8.9 | 16 |
| no | 6.7 | 16 |
| no | 5.4 | 40 |
| no | 4.3 | 4 |
| yes | 12.4 | 72 |
| yes | 5.4 | 50 |
| yes | 5.2 | 22 |
| no | 14.9 | 40 |
| yes | 5.4 | 49 |
| no | 6.5 | 55 |
| no | 10.4 | 8 |
| yes | 9.4 | 11 |
| no | 6.5 | 8 |
| yes | 10.4 | 5 |
| no | 10.4 | 88 |
| no | 9.4 | 32 |
| no | 10.7 | 2 |
| no | 12.4 | 103 |
| yes | 8.0 | 110 |
| no | 13.9 | 11 |
| yes | 6.7 | 56 |
| yes | 5.4 | 4 |
| no | 10.4 | 18 |
| no | 9.4 | 6 |
| no | 10.7 | 12 |
| no | 10.3 | 23 |
| yes | 6.7 | 18 |
| no | 11.6 | 25 |
| yes | 5.6 | 1 |
Question 3
Determine which distribution(s) fit the random variables “X: smoking status”, “Y: lab score before treatment” and “Z: time since diagnosis”. Explain all three separately, with reasons, and find their parameters.
So, implement the following steps for each variable:
(i: describe the random variable. ii: write down the name of the distribution and the reason you chose it. iii: fill in the blanks and calculate the parameter(s).)
3.a
i. X is the ......
ii. I expect ... distribution fits to X because ...
iii. X ~ .......(.....)
3.b
i. Y is the.....
ii. I expect .... distribution fits to Y because ....
iii. Y ~......(....)
3.c
i. Z is the.....
ii. I expect .... distribution fits to Z because ......
iii. Z ~......(....)
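3.a
i. X is the number of smokers among the n = 48 subjects; each subject's smoking status is a yes/no outcome.
ii. I expect the Binomial distribution fits X because each subject is either a smoker or not, the smoking status of one subject does not depend on the other subjects, and the probability of smoking is assumed to be the same for all subjects.
iii. X ~ Binomial(n = 48, p = 20/48 ≈ 0.4167), since 20 of the 48 subjects are smokers. Equivalently, the status of a single subject is Bernoulli(0.4167).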
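As a quick check of the estimate in 3.a, the following minimal sketch counts the smokers in the smoker column and computes the sample proportion. The variable names (`smoker`, `p_hat`) are illustrative only and are not part of the original answer.

```python
# Smoking status of the 48 subjects, in the order given in the data table.
smoker = [
    "yes", "no", "no", "yes", "no", "no", "yes", "no", "no", "no", "yes", "no",
    "yes", "yes", "no", "no", "yes", "yes", "no", "yes", "no", "no", "no", "yes",
    "yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "no", "no", "no",
    "no", "yes", "no", "yes", "yes", "no", "no", "no", "no", "yes", "no", "yes",
]

n = len(smoker)                  # number of subjects (trials)
smokers = smoker.count("yes")    # observed number of smokers
p_hat = smokers / n              # estimate of P(smoker) for one subject

# Mean and variance implied by the fitted Binomial(n, p_hat).
binom_mean = n * p_hat
binom_var = n * p_hat * (1 - p_hat)

print(f"n = {n}, smokers = {smokers}, p_hat = {p_hat:.4f}")
print(f"X ~ Binomial({n}, {p_hat:.4f}), mean = {binom_mean:.2f}, variance = {binom_var:.2f}")
```

The proportion `p_hat` is the usual maximum-likelihood estimate of p when n is known.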
3.b
i. Y is the lab score before treatment.
ii. I expect the normal distribution fits Y because Y is a continuous variable and, in the QQ plot, almost all points fall close to the reference line.
iii. Y ~ Normal(mean = 8.542, sd = 3.067), where 8.542 is the sample mean and 3.067 is the sample standard deviation (variance ≈ 9.41).
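The parameters in 3.b can be reproduced with a short sketch like the one below. It assumes the lab scores use decimal commas (so "4,3" is read as 4.3) and uses the sample standard deviation with an n - 1 denominator. The QQ check is only a rough numerical stand-in for the QQ plot: the plotting positions (i - 0.5)/n and the helper `pearson` are choices made here for illustration, not something taken from the original answer.

```python
import math
import statistics

# Lab scores before treatment (decimal commas read as decimal points).
lab = [
    4.3, 12.4, 4.5, 6.7, 7.8, 8.9, 5.6, 10.3, 6.7, 5.4, 6.2, 17.9,
    6.5, 7.8, 12.5, 11.2, 5.4, 8.7, 7.8, 8.9, 6.7, 5.4, 4.3, 12.4,
    5.4, 5.2, 14.9, 5.4, 6.5, 10.4, 9.4, 6.5, 10.4, 10.4, 9.4, 10.7,
    12.4, 8.0, 13.9, 6.7, 5.4, 10.4, 9.4, 10.7, 10.3, 6.7, 11.6, 5.6,
]

n = len(lab)
mean_y = statistics.mean(lab)    # sample mean
sd_y = statistics.stdev(lab)     # sample standard deviation (n - 1 denominator)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Normal QQ pairs: sorted data against standard normal quantiles.
std_normal = statistics.NormalDist()
sample_q = sorted(lab)
theo_q = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
qq_corr = pearson(theo_q, sample_q)   # closer to 1 means closer to a straight line

print(f"Y ~ Normal(mean = {mean_y:.3f}, sd = {sd_y:.3f})")
print(f"QQ correlation = {qq_corr:.3f}")
```

A QQ correlation close to 1 supports the normal fit, but it does not replace looking at the plot itself, especially for the largest scores.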
3.c
i. Z is the time since diagnosis.
ii. I expect the exponential distribution fits Z because Z is a continuous, non-negative random variable and the QQ plot shows that Z follows an exponential distribution.
iii. Z ~ Exponential(rate = 1/29.54 ≈ 0.0339), i.e. Z has mean 29.54 (the sample mean).
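The estimate in 3.c can be checked with the sketch below, which computes the sample mean of the times and the corresponding rate 1/mean. The comparison of the sample standard deviation with the mean, and the exponential QQ quantiles with plotting positions (i - 0.5)/n, are extra sanity checks added here as assumptions; they are not part of the original answer.

```python
import math
import statistics

# Time since diagnosis for the 48 subjects (units as given in the data).
time_z = [
    1, 9, 35, 8, 10, 5, 29, 80, 58, 27, 71, 3,
    11, 43, 24, 16, 64, 13, 6, 16, 16, 40, 4, 72,
    50, 22, 40, 49, 55, 8, 11, 8, 5, 88, 32, 2,
    103, 110, 11, 56, 4, 18, 6, 12, 23, 18, 25, 1,
]

n = len(time_z)
mean_z = statistics.mean(time_z)   # estimate of the exponential mean
rate_z = 1 / mean_z                # estimate of the exponential rate
sd_z = statistics.stdev(time_z)    # for an exponential variable, sd equals the mean

# Exponential QQ pairs: sorted data against quantiles of Exponential(mean_z).
sample_q = sorted(time_z)
theo_q = [-mean_z * math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]

print(f"Z ~ Exponential(mean = {mean_z:.2f}, rate = {rate_z:.4f})")
print(f"sample sd = {sd_z:.2f} (compare with mean {mean_z:.2f})")
for s, t in list(zip(sample_q, theo_q))[-3:]:
    print(f"largest observations: sample = {s}, theoretical = {t:.1f}")
```

If the sample standard deviation is of the same order as the mean and the QQ pairs track each other, the exponential model is plausible for Z.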