In: Statistics and Probability
Perform Monte Carlo integration using R statistical programming to estimate the value of π.
Generate N pairs of uniform random numbers (x,y), where x ∼ U(0,1)and y ∼ U(0,1) and each (x,y) pair represents a point in the unit square. To obtain an estimate of π count the fraction of points that fall inside the unit quarter circle and multiply by 4. Note that the fraction of points that fall inside the quarter circle should tend to the ratio between the area of the unit quarter circle (i.e 1/4π.) as compared to area of the unit square (e.i.,1) We proceed step-by-step:
a) Create a function insidecircle that takes two inputs between 0 and 1 and returns 1 if these points fall within the unit circle.
b) Create a function estimatepi that takes a
single input N, generates N pairs of uniform random numbers and
uses insidecircle to produce an estimate of π as
described above. In addition to the estimate of π.,
estimatepi should also return the standard error
of this estimate, and a 95% confidence interval for the
estimate.
c) Use estimatepi to estimate π for N = 1000 to
10000 in increments of 500 and record the estimate, its standard
error and the upper and lower bounds of the 95 % CI. How large must
N be in order to ensure that your estimate of π is within 0.1 of
the true value?
d) Using the value of N you determined in part c), run estimatepi 500 times and collect 500 different estimates of π. Produce a histogram of the estimates and note the shape of this distribution. Calculate the standard deviation of the estimates - does it match the standard error you obtained in part c)? What percentage of the estimates lies within the 95% CI you obtained in part c)?
(a):
#Estimating the value of pi insidecircle <- function(x,y) { success <- as.integer(((x^2+y^2) <= 1)) return(success) }
.
(b):
estimatepi <- function(s){ cpoint <- replicate(s, expr= { insidecircle(runif(1), runif(1)) }) est.pi <- 4*sum(cpoint)/s # er <- sd(cpoint)/sqrt(s) #This can be used too er2 <- 4*sqrt(sum((cpoint-mean(cpoint))^2))/s # multiplied 4 to get SE of pi #95% confidence interval (Z=1.96) conf.95 <- c(est.pi-1.96*er2, est.pi+1.96*er2) val <- c(est.pi, er2, conf.95) #return the values return(val) }
.
(c):
#Store results in a data frame result.frame <- as.data.frame(matrix(ncol=5, nrow=0)) n <- 1000 while(n <= 10000){ result.frame <- rbind(result.frame, c(n, estimatepi(n))) n <- n+500 } #Name the columns names(result.frame) <- c('N ', 'pi estimate', 'error ', ' 95% CI-lower', ' 95% CI-upper') #Print the results result.frame
## N pi estimate error 95% CI-lower 95% CI-upper ## 1 1000 3.176000 0.05115686 3.075733 3.276267 ## 2 1500 3.192000 0.04146594 3.110727 3.273273 ## 3 2000 3.158000 0.03646256 3.086533 3.229467 ## 4 2500 3.169600 0.03244710 3.106004 3.233196 ## 5 3000 3.068000 0.03087273 3.007489 3.128511 ## 6 3500 3.097143 0.02826546 3.041743 3.152543 ## 7 4000 3.190000 0.02541604 3.140185 3.239815 ## 8 4500 3.159111 0.02429660 3.111490 3.206732 ## 9 5000 3.134400 0.02329436 3.088743 3.180057 ## 10 5500 3.110545 0.02242843 3.066586 3.154505 ## 11 6000 3.103333 0.02153546 3.061124 3.145543 ## 12 6500 3.103385 0.02069018 3.062832 3.143937 ## 13 7000 3.132000 0.01970705 3.093374 3.170626 ## 14 7500 3.173333 0.01870219 3.136677 3.209990 ## 15 8000 3.133500 0.01842274 3.097391 3.169609 ## 16 8500 3.133647 0.01787158 3.098619 3.168675 ## 17 9000 3.114667 0.01750403 3.080359 3.148975 ## 18 9500 3.146947 0.01681012 3.114000 3.179895 ## 19 10000 3.129200 0.01650729 3.096846 3.161554
..
..
Minimum number of samples needed:
The points under the curve have a binomial distribution (Yes/No; for it to be under the curve) with a probability
p = area under the curvearea of the square = π/41 = 0.785
.
For a binomial distribution the standard error of the mean is
p(1−p)n−−−−−−−√ = 0.785(1−0.785)n−−−−−−−−−−−−−−√ = 0.411n−−√
Error for the indicator variable π
is = 160.785(1−0.785)n−−−−−−−−−−−√ = 1.643n√
This must be less than the expected error of 0.1 from the mean of 3.414.
In other words
1.643n−−√ < 0.1
⟹
n >(1.6430.1)2
or n > 270
.
Therefore let’s consider n (sample size) = 275.
.
(d):
pi.500 <- c() n1 <- 275 #sample size for (i in 1: 500){ pi.500 <- c(pi.500, estimatepi(n1)[1]) } #Plot histogram hist(pi.500, xlab='pi', main='Estimates of pi', breaks=10)
#Standard deviation sd.500 <- sd(pi.500) #Percent within 95% CI in part c #The 95% CI for 1000 repeats (closest to 270) is [3.097, 3.295] percent.95 <- (pnorm((3.395-mean(pi.500))/sd(pi.500))-pnorm((3.097-mean(pi.500))/sd(pi.500)))*100
.
Since the sample size is smaller than in part (c), the calculated standard deviation 0.0995521 is much larger than the values obtained in part (c).
Considering the 95% CI of the 1000 sample size in part c, it has an error ~ 0.4. Using this value we obtain the percentage of estimates in that interval to be 64.37 %.