The purpose of this exercise is to show how the sampling distribution of the sample mean changes as the sample size increases, when the data are generated from an exponential distribution with rate 2.0. Describe and compare the sampling distributions of the sample mean for sample sizes n = 5, n = 20 and n = 40, simulating at least 5000 samples for each. The R code for generating a sample of size n = 10 from an exponential distribution with rate 2.0 is
>rexp(n=10, rate=2.0)
R program
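set.seed(1)                  # make the simulation reproducible
nsim <- 5000                 # number of simulated samples for each sample size
x1 <- numeric(nsim)          # sample means for n = 5
x2 <- numeric(nsim)          # sample means for n = 20
x3 <- numeric(nsim)          # sample means for n = 40
for (i in 1:nsim) {
  x1[i] <- mean(rexp(n = 5,  rate = 2))
  x2[i] <- mean(rexp(n = 20, rate = 2))
  x3[i] <- mean(rexp(n = 40, rate = 2))
}
# Three histograms side by side on common axes. ylim is raised to 7 because the
# density of the n = 40 sample mean peaks at about 5, so c(0, 5) could clip it.
par(mfrow = c(1, 3))
hist(x1, prob = TRUE, xlim = c(0, 2), ylim = c(0, 7), xlab = "", ylab = "",
     main = "Histogram for n=5")
hist(x2, prob = TRUE, xlim = c(0, 2), ylim = c(0, 7), xlab = "", ylab = "",
     main = "Histogram for n=20")
hist(x3, prob = TRUE, xlim = c(0, 2), ylim = c(0, 7), xlab = "", ylab = "",
     main = "Histogram for n=40")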
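Besides plotting, the three sampling distributions can be compared numerically. The sketch below is an addition, not part of the original program: it uses `replicate()` in place of the explicit loop and tabulates the mean and standard deviation of the simulated sample means against their theoretical values, 1/rate = 0.5 and (1/rate)/sqrt(n). The seed and the helper name `sim_means` are arbitrary, hypothetical choices for illustration.

```r
set.seed(2024)                  # arbitrary seed, for reproducibility
nsim  <- 5000                   # simulated samples per sample size
rate  <- 2                      # exponential rate; population mean is 1/rate = 0.5
sizes <- c(5, 20, 40)           # sample sizes to compare

# Hypothetical helper: nsim sample means, each from a fresh sample of `size` draws
sim_means <- function(size) replicate(nsim, mean(rexp(n = size, rate = rate)))
means <- lapply(sizes, sim_means)

summary_table <- data.frame(
  n         = sizes,
  mean_sim  = sapply(means, mean),      # average of the simulated sample means
  sd_sim    = sapply(means, sd),        # spread of the simulated sample means
  sd_theory = (1 / rate) / sqrt(sizes)  # theoretical standard error 0.5 / sqrt(n)
)
print(round(summary_table, 4))
```

The `mean_sim` column should stay near 0.5 for every n, while `sd_sim` should track `sd_theory` and shrink as n grows.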
For comparison, we plot a histogram of the 5000 simulated sample means for each sample size.
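The histograms can also be checked against theory. The mean of n independent Exponential(rate) observations follows a Gamma distribution with shape n and rate n × rate, so its exact density can be drawn over each histogram with `dgamma()`. This is a minimal sketch that regenerates the simulated means so it runs on its own; the seed and the common axis limits are choices made here so the panels are directly comparable.

```r
set.seed(2024)          # arbitrary seed, for reproducibility
nsim  <- 5000           # simulated samples per sample size
rate  <- 2              # exponential rate
sizes <- c(5, 20, 40)   # sample sizes to compare

par(mfrow = c(1, 3))
for (size in sizes) {
  xbar <- replicate(nsim, mean(rexp(n = size, rate = rate)))
  # Density-scale histogram, on the same axes for every panel
  hist(xbar, prob = TRUE, xlim = c(0, 2), ylim = c(0, 7),
       xlab = "", ylab = "", main = paste("Histogram for n =", size))
  # Exact density of the mean of `size` Exponential(rate) draws:
  # Gamma(shape = size, rate = size * rate)
  curve(dgamma(x, shape = size, rate = size * rate),
        from = 0, to = 2, add = TRUE, lwd = 2)
}
```

If the simulation is working, each curve should follow the outline of its histogram, including the right skew that is visible for small n.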
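We find that as n increases, the histograms become taller and narrower, centred on the population mean 1/2 (that is, 1/rate). This is expected: the sample mean is an unbiased estimator of 1/2, and its standard deviation, 0.5/√n, shrinks as n grows, so the sample means cluster more tightly around 1/2. The shape changes as well: for n = 5 the sampling distribution is right-skewed, inheriting the skew of the exponential distribution, and it becomes more symmetric and bell-shaped as n increases, as the central limit theorem predicts. For still larger n, the distribution concentrates ever more tightly around 1/2, with a spread approaching zero and an ever-taller peak.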
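For reference, these observations follow from standard results. If X_1, ..., X_n are independent Exponential(λ) observations with λ = 2, each has mean and standard deviation 1/λ = 0.5, and the sample mean satisfies:

```latex
E[\bar X_n] = \frac{1}{\lambda} = 0.5, \qquad
\operatorname{SD}(\bar X_n) = \frac{1}{\lambda\sqrt{n}} = \frac{0.5}{\sqrt{n}}
\approx
\begin{cases}
0.224, & n = 5,\\
0.112, & n = 20,\\
0.079, & n = 40,
\end{cases}
\qquad
\operatorname{skew}(\bar X_n) = \frac{2}{\sqrt{n}} \approx 0.89,\ 0.45,\ 0.32 .
```

Quadrupling n (from 5 to 20) halves the standard deviation of the sample mean, and the skewness falls toward 0, the value for a normal distribution.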