Regarding problem R2 from chapter 26: Make a box representing a roulette wheel, with 18 tickets that represent red and 20 that represent black or green. Draw 3800 times with replacement from this box and record the number of red tickets drawn. Repeat this process 10,000 times. In what fraction of these 10,000 repeated trials was the number of red tickets drawn at least 1,890? How does this compare to the P-value you got in the problem? Can you use pbinom( ) to compute this probability? Are these numbers different? Why?
"With a perfectly balanced roulette wheel, in the long run, red numbers should turn up 18 times in 38. To test its wheel, one casino records the results of 3800 plays, finding 1890 red numbers. Is that too many reds? Or chance variation?"
How do I compute this in R?
The following R script can be run in the R console to perform this simulation:
---------------------------------------------
## 'choice' is the box: 18 red tickets ('R') and 20 tickets that are
## green ('G') or black ('B').
choice <- c(rep('R', 18), rep(c('G', 'B'), 10))
## Number of replications ('repl') in the simulation.
repl <- 10000
## Simulate drawing tickets from 'choice' with replacement. Each column
## of 'draw' holds one replication of 3800 draws.
draw <- as.data.frame(replicate(repl, sample(choice, size = 3800, replace = TRUE)))
## Count the reds in each column (3800 draws) and store the counts in
## the vector 'red.count'.
red.count <- integer(repl)
for (i in seq_len(repl)) {
  red.count[i] <- sum(draw[[i]] == 'R')
}
## Fraction of replications 'f' in which the number of reds is at
## least 1890.
(f <- sum(red.count >= 1890) / repl)
---------------------------------------------
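The same simulation can be sketched without storing every individual draw. This is a swapped-in technique, not the script above: it assumes that the number of reds in 3800 draws with replacement follows a binomial distribution with size 3800 and probability 18/38, so rbinom() can generate the counts directly. The names red.count2 and f2, and the seed value, are illustrative choices.

```r
## Fix the seed so the run is reproducible (arbitrary value, an assumption).
set.seed(26)

repl <- 10000

## Each element is the number of reds in one replication of 3800 draws.
red.count2 <- rbinom(repl, size = 3800, prob = 18/38)

## Fraction of replications with at least 1890 reds.
(f2 <- mean(red.count2 >= 1890))
```

This keeps only a vector of 10000 counts instead of a table with 3800 rows and 10000 columns, so larger values of repl stay manageable.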
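The exact probability can be calculated with the ‘pbinom()’ function. Because pbinom(q, ...) returns P(X <= q), the probability of at least 1890 reds is 1 - P(X <= 1889), so the cutoff passed to ‘pbinom()’ must be 1889. Using 1890 instead gives P(X > 1890) = 0.001648493, which leaves out the case of exactly 1890 reds:
---------------------------------------------
## Using the 'pbinom' function to get the probability of at least
## 1890 'Reds' in 3800 trials, i.e. P(X > 1889)
(prob <- pbinom(1889, size = 3800, prob = 18/38, lower.tail = FALSE))
--------------------------------------------
Output:
0.001829548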
If we compare this p-value with the ‘f’ calculated in the script, we can see that the values are close, although slightly different.
The simulation estimates the probability from a finite number of random draws, so ‘f’ carries sampling error and differs from the exact value given by ‘pbinom()’.
Every time we run the script, ‘f’ comes out a little different: sometimes 0.00123, 0.00167 or 0.0014, sometimes 0.0023 or 0.0027, with 10000 replications in the simulation. As we increase the number of replications, these ‘f’ values become more consistent and closer to the exact p-value given by the ‘pbinom()’ function. The spread shrinks roughly in proportion to 1/sqrt(repl), so it takes about 100 times as many replications to cut it by a factor of 10.
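The run-to-run spread of ‘f’ can be quantified. This is a minimal sketch, assuming each replication is an independent yes/no event with the exact tail probability p, so that ‘f’ has standard error sqrt(p(1 - p)/repl). The names p, repl.values and se are introduced here for illustration only.

```r
## Exact probability of at least 1890 reds in 3800 draws
## (lower.tail = FALSE gives P(X > 1889), i.e. P(X >= 1890)).
p <- pbinom(1889, size = 3800, prob = 18/38, lower.tail = FALSE)

## Hypothetical replication counts to compare (illustrative values).
repl.values <- c(10000, 100000, 1000000)

## Standard error of the simulated fraction 'f' for each replication count.
se <- sqrt(p * (1 - p) / repl.values)

print(data.frame(repl = repl.values, se = signif(se, 2)))
```

With 10000 replications the standard error is roughly 0.0004, which is consistent with the spread of ‘f’ values quoted above.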
R's binomial distribution functions can also give the probability of getting exactly 1890 ‘Reds’ in 3800 trials, using ‘dbinom()’:
--------------------------------------------
## Probability of seeing exactly 1890 'Reds' in 3800 trials:
dbinom(1890, size = 3800, prob = 18/38)
--------------------------------------------
Output:
0.0001810551
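That means there is only about a 0.018% chance of seeing exactly 1890 Reds in 3800 plays. Put another way, in 10000 runs (replications) of 3800 plays (trials), this would happen only about 1.81 ~ 2 times. The probability of one exact count is not the P-value, however: the test rests on the chance of 1890 or more Reds, which is also very small (under 0.2%). So chance variation is an unlikely explanation, and 1890 looks like too many reds for a balanced wheel.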
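For comparison with the P-value asked about in the question, here is a minimal sketch of a z-test based on the normal approximation. It assumes the textbook method is a one-sided z-test without a continuity correction; the names n, p0, ev, se, z, p.normal and p.exact are introduced here for illustration.

```r
n  <- 3800        # number of plays
p0 <- 18/38       # chance of red on a balanced wheel

ev <- n * p0                      # expected number of reds: 1800
se <- sqrt(n * p0 * (1 - p0))     # standard error of the count, about 30.8

z <- (1890 - ev) / se             # observed count in standard units
p.normal <- pnorm(z, lower.tail = FALSE)   # one-sided P-value

## Exact binomial tail probability P(X >= 1890) for comparison.
p.exact <- pbinom(1889, size = n, prob = p0, lower.tail = FALSE)

round(c(z = z, p.normal = p.normal, p.exact = p.exact), 4)
```

Both probabilities come out a little under 0.2%, so the normal approximation and the exact binomial calculation agree closely.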
N.B. Do not print the data.frame 'draw' directly in the R console: with 3800 rows and 10000 columns, the printed output is unreadable. Use the command View(draw) instead to browse the 3800 simulated draws in each of the 10000 replications.
Do not increase the number of replications 'repl' too much at once: 'draw' holds 3800 x repl values (38 million when repl is 10000), so the machine can slow down or hang.