Problem 8.3
In 1693, Samuel Pepys, a former Member of Parliament and Secretary to the Admiralty, had to write to Isaac Newton to solve a problem that you are well-equipped to solve yourself. The problem was:
Which of the following three propositions has the greatest chance of success?
A. Six fair dice are tossed independently and at least one “6” appears.
B. Twelve fair dice are tossed independently and at least two “6”s appear.
C. Eighteen fair dice are tossed independently and at least three “6”s appear.
QUESTION BEGINS HERE
To establish a common notation for the parts below, define the following random variables:
- W: number of times 6 appears in 6 tosses of a fair 6-sided die
- X: number of times 6 appears in 12 tosses of a fair 6-sided die
- Y: number of times 6 appears in 18 tosses of a fair 6-sided die
Then we have events A = {W ≥ 1}, B = {X ≥ 2}, and C = {Y ≥ 3}.
(a) Write exact expressions for P[A], P[B], and P[C]. (You do not need to simplify each answer to a number.)
(b) Many slightly different CLT-based approximations for the three probabilities are justifiable. Write a set of CLT-based approximations where you get the same result for each of the three probabilities. (Roughly speaking, the equality of these approximations is probably related to why Pepys found the situation puzzling.)
(c) Write approximate expressions for the three probabilities using the De Moivre–Laplace formula.
(d) Evaluate the expressions in parts (a) and (c) above using a software tool (MATLAB, Python, or something similar). Include your code and results.
(e) Again using the software tool of your choice, make a plot that shows the CDFs of W, X/2, and Y/3. On the same plot, show the CDF of the Gaussian approximation for these random variables. Finally, use this plot to explain the relative ordering of your answers to parts (a) and (
(d)
P_A=1-(5/6)^6 # P(A) (Part a)
P_B=1-(5/6)^(12)-12*(1/6)*(5/6)^(11) # P(B) (Part a)
P_C=1-(5/6)^(18)-18*(1/6)*(5/6)^(17)-choose(18,2)*(1/6)^2*(5/6)^(16) # P(C) (Part a)
P_A
P_B
P_C
Output:
> P_A
[1] 0.665102
> P_B
[1] 0.6186674
> P_C
[1] 0.5973457
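Whereas the CLT approximation without a continuity correction gives P(A)=P(B)=P(C)=0.5: each threshold (1, 2, 3) equals the mean of the corresponding count (6/6, 12/6, 18/6), so each probability is approximated by 1-Φ(0)=0.5.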
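Part (c) can be evaluated next to the exact values in the same way. The sketch below assumes that the De Moivre–Laplace formula means the Gaussian approximation with a half-unit continuity correction, P[N ≥ k] ≈ 1 - Φ((k - 0.5 - np)/sqrt(np(1-p))). That reading is an assumption, and the helper names exact_tail and dml_tail are made up for this illustration.

```r
# Exact tail P[N >= k] for N ~ Binomial(n, p).
# pbinom(q, lower.tail = FALSE) returns P[N > q], so q = k - 1 gives P[N >= k].
exact_tail <- function(n, k, p = 1/6) {
  pbinom(k - 1, size = n, prob = p, lower.tail = FALSE)
}

# Assumed reading of De Moivre-Laplace: Gaussian tail with a
# half-unit continuity correction at the threshold k.
dml_tail <- function(n, k, p = 1/6) {
  mu <- n * p
  sigma <- sqrt(n * p * (1 - p))
  pnorm((k - 0.5 - mu) / sigma, lower.tail = FALSE)
}

n <- c(6, 12, 18)  # number of dice for A, B, C
k <- c(1, 2, 3)    # required number of sixes

exact <- exact_tail(n, k)
approx_cc <- dml_tail(n, k)
print(data.frame(event = c("A", "B", "C"), exact, approx_cc))
```

Under this assumption the corrected approximations are no longer all equal to 0.5, and they decrease from A to C in the same order as the exact values.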
(e)
R code:
# W ~ Binomial(6, 1/6): PMF and CDF on 0..6
w=0:6
p1=choose(6,w)*(1/6)^w*(5/6)^(6-w)
F1=cumsum(p1) # CDF of W
F1
# X ~ Binomial(12, 1/6): x_1 holds the support of X/2
x=0:12
x_1=x/2
p2=choose(12,x)*(1/6)^x*(5/6)^(12-x)
F2=cumsum(p2) # CDF of X at x, equivalently CDF of X/2 at x_1
F2
# Y ~ Binomial(18, 1/6): y_1 holds the support of Y/3
y=0:18
y_1=y/3
p3=choose(18,y)*(1/6)^y*(5/6)^(18-y)
F3=cumsum(p3) # CDF of Y at y, equivalently CDF of Y/3 at y_1
F3
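# Exact CDFs of W, X/2, and Y/3 (solid lines with points)
plot(w,F1,ylim=c(0,1.1),type="o",col=1,lwd=2,xlab="",ylab="CDF")
lines(x_1,F2,type="o",col=2,lwd=2)
lines(y_1,F3,type="o",col=3,lwd=2)
# Gaussian approximations (dashed). W, X/2, and Y/3 all have mean 1;
# their variances are 5/6, (5/3)/4 = 5/12, and (5/2)/9 = 5/18.
lines(w,pnorm(w,1,sqrt(5/6)),type="l",col=1,lty=2,lwd=2)
lines(x_1,pnorm(x_1,1,sqrt(5/12)),type="l",col=2,lty=2,lwd=2)
lines(y_1,pnorm(y_1,1,sqrt(5/18)),type="l",col=3,lty=2,lwd=2)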
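Here the green solid line is the CDF of Y/3, and the green dashed line is the corresponding Gaussian approximation;
the red solid line is the CDF of X/2, and the red dashed line is the corresponding Gaussian approximation;
the black solid line is the CDF of W, and the black dashed line is the corresponding Gaussian approximation.
Each probability is one minus the value of the corresponding CDF just below 1: P(A)=1-P(W<1), P(B)=1-P(X/2<1), and P(C)=1-P(Y/3<1). Just below 1, the CDF of W is lowest, the CDF of X/2 is higher, and the CDF of Y/3 is highest (about 0.335, 0.381, and 0.403, from the results in part (d)), whereas the Gaussian approximations of W, X/2, and Y/3 all have CDF value 0.5 at 1.
Hence P(A)>P(B)>P(C).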
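The ordering can also be checked numerically rather than read off the plot. The sketch below evaluates each scaled CDF just below 1, using P[N/m < 1] = P[N ≤ m - 1] for an integer-valued count N, and compares it with the Gaussian approximation of the scaled variable at 1. The variable names (m, below_one, gauss_at_one) are chosen for this illustration only.

```r
n <- c(6, 12, 18)  # number of dice for W, X, Y
m <- n / 6         # scaling factors 1, 2, 3, so W/1, X/2, Y/3 all have mean 1

# Exact CDF of the scaled variable just below 1: P[N/m < 1] = P[N <= m - 1]
below_one <- pbinom(m - 1, size = n, prob = 1/6)

# Gaussian approximation of the scaled variable, evaluated at its mean 1
gauss_at_one <- pnorm(1, mean = 1, sd = sqrt(n * (1/6) * (5/6)) / m)

print(rbind(below_one, prob_event = 1 - below_one, gauss_at_one))
```

The exact values below 1 increase with the number of dice while the Gaussian value stays at 0.5, which is the pattern behind P(A)>P(B)>P(C).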