Using R Studio/R programming...
Run your own experiment and use your results as a supporting argument for your response.
How might I run A and B? What is the format? I don't particularly need the answer; I can determine that on my own. I just need help with the "set up". Thanks!
Let us assume that we have access to a population which is normally distributed.
So we create a population with a large number of entries, say 100,000, so that drawing a sample of 500 from it without replacement does not require a finite population correction (that is, the sample size is no more than 5% of the population).
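The 5% condition can be checked directly. The sketch below uses illustrative variable names to compute the sampling fraction n/N and the finite population correction factor $\sqrt{(N-n)/(N-1)}$. That factor would multiply the standard error of the mean if the sample were a large share of the population.

```r
# Population size and sample size used in the experiment
N <- 100000
n <- 500

# Sampling fraction: the share of the population that ends up in the sample
frac <- n / N

# Finite population correction (FPC) factor for sampling without replacement
fpc <- sqrt((N - n) / (N - 1))

# Rule of thumb: the FPC can be ignored when n/N <= 0.05
cat(sprintf("Sampling fraction: %.4f\n", frac))
cat(sprintf("FPC factor: %.4f\n", fpc))
cat(sprintf("FPC negligible: %s\n", frac <= 0.05))
```

For N = 100,000 and n = 500 the factor is about 0.9975, so ignoring it changes the standard error by less than 0.3%. Part b) draws 3,000 values in total, which is 3% of the population and still under the 5% threshold.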
We will assume that we do not know the population standard deviation (or, of course, the mean, which is what we are trying to estimate). In other words, we treat the population's distribution as unknown and rely only on the samples we draw.
a) Next, we draw one random sample of size 500 (without replacement; this is not a bootstrap estimate) and compute the sample mean and the sample standard deviation s. Since the sample size n = 500 is greater than 30, we can use the central limit theorem to treat the sample mean as approximately normally distributed, and s is a reliable estimate of the unknown population standard deviation.
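We can say with 95% confidence that the population mean lies within $\bar{x} \pm z_{0.975}\,\dfrac{s}{\sqrt{n}}$, where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n = 500$, and $z_{0.975} \approx 1.96$ is the 97.5th percentile of the standard normal distribution.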
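The interval above can be wrapped in a small helper. In this sketch `z_ci` is a hypothetical function name, not part of base R. `t.test` is the standard function from the stats package and is included for comparison, because with σ unknown the t interval is the textbook choice. With n = 500 the t critical value (about 1.965) is so close to 1.96 that the two intervals nearly coincide.

```r
# Hypothetical helper: large-sample (z-based) confidence interval for a mean,
# using xbar +/- z * s / sqrt(n) as in part a)
z_ci <- function(x, conf = 0.95) {
  n  <- length(x)
  zc <- qnorm(1 - (1 - conf) / 2)   # upper critical value, about 1.96 for 95%
  se <- sd(x) / sqrt(n)             # estimated standard error of the mean
  c(lower = mean(x) - zc * se, upper = mean(x) + zc * se)
}

set.seed(1)
x <- rnorm(500)

# z-based interval from the formula above
z_int <- z_ci(x)

# t-based interval (sigma unknown); slightly wider, nearly identical for n = 500
t_int <- t.test(x, conf.level = 0.95)$conf.int

print(z_int)
print(t_int)
```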
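b) Next, we draw 300 random samples of size 10 each and compute the 300 sample means. From these we can form a 95% percentile interval for the mean.
We can then say that we are 95% confident that the population mean lies between the 2.5th and 97.5th percentiles of the 300 sample means. Because each of these means is based on only 10 observations, the width of this interval reflects a standard error of about $\sigma/\sqrt{10}$, where $\sigma$ is the population standard deviation.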
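The percentile interval can be sketched as a hypothetical helper `percentile_interval`; the name and its arguments `k`, `m` and `conf` are illustrative. Because the population here is standard normal (σ = 1), the sketch also compares the observed width with the width expected from the standard error of a mean of 10 observations, $2 \times 1.96/\sqrt{10} \approx 1.24$.

```r
set.seed(2)
# Population as in the main experiment: standard normal, so sigma = 1
pop <- rnorm(100000)

# Hypothetical helper: draw k disjoint samples of size m (without replacement)
# and return the lower and upper percentiles of their means
percentile_interval <- function(pop, k = 300, m = 10, conf = 0.95) {
  draws <- matrix(sample(pop, size = k * m), nrow = k)  # one sample per row
  xbar  <- rowMeans(draws)                             # k sample means
  alpha <- 1 - conf
  quantile(xbar, c(alpha / 2, 1 - alpha / 2))
}

ci <- percentile_interval(pop)
observed_width <- unname(ci[2] - ci[1])

# Means of 10 observations have standard error sigma / sqrt(10), so the
# middle 95% of them spans roughly 2 * 1.96 / sqrt(10)
theoretical_width <- 2 * qnorm(0.975) / sqrt(10)

print(ci)
cat(sprintf("Observed width: %.3f, theoretical width: %.3f\n",
            observed_width, theoretical_width))
```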
Finally, we compare the two intervals to see which one estimates the population mean more precisely, that is, which one is narrower.
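As a rough expectation before running any code, here is a worked sketch assuming a normal population with standard deviation σ. Each interval's width is about twice the critical value times the standard error of the means it is built from, so only the number of observations behind each mean matters:

$$
\text{width}_a \approx 2\, z_{0.975}\, \frac{\sigma}{\sqrt{500}}, \qquad
\text{width}_b \approx 2\, z_{0.975}\, \frac{\sigma}{\sqrt{10}}, \qquad
\frac{\text{width}_b}{\text{width}_a} \approx \sqrt{\frac{500}{10}} = \sqrt{50} \approx 7.07
$$

With σ = 1 and $z_{0.975} \approx 1.96$, this gives widths of about 0.175 and 1.24.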
R code with comments
----
set.seed(123)
#set the size of the population
N<-100000
#Draw a population of N entries from standard normal(0,1)
pop<-rnorm(N)
#we assume that the population mean and sd are not known.
#part a)
#get 1 sample of size 500 from the population (without replacement)
x<-sample(pop,size=500)
#get the right tail critical value for 95% confidence
zc<-qnorm(0.975)
#compute the standard error of the mean
se<-sd(x)/sqrt(500)
#calculate the confidence interval
lci<-mean(x)-se*zc
uci<-mean(x)+se*zc
sprintf('We are 95%% confident that the population mean lies in the interval [%.4f,%.4f]',lci,uci)
#part b)
#get 300 samples of size 10
x<-sample(pop,size=300*10)
#format x into a matrix of 300x10
x<-matrix(x,nrow=300)
#calculate the 300 sample means (one per row of the matrix)
xbar<-rowMeans(x)
#get the percentile intervals
ci<-quantile(xbar,c(0.025,0.975))
sprintf('We are 95%% confident that the population mean lies in the interval [%.4f,%.4f]',ci[1],ci[2])
----
Running the code, we can see that with a single sample of size 500 we estimate the population mean to lie between -0.1 and 0.08, compared with between -0.59 and 0.56 when using 300 samples of size 10 (both at 95% confidence).
Using a single sample of size 500 gives a much narrower interval for the population mean, making it the sampling plan of choice.
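A single run depends on the random seed. To make the argument more robust, one could repeat both sampling plans many times and record each interval's width and whether it contains the true mean, as sketched below. The number of repetitions (`reps = 200`) and the variable names are arbitrary choices for illustration. The true mean `mu` is known here only because we simulated the population.

```r
set.seed(123)
N   <- 100000
pop <- rnorm(N)
mu   <- mean(pop)  # true mean of this simulated population (close to 0)
reps <- 200        # number of repetitions; arbitrary choice for illustration

zc <- qnorm(0.975)
results <- t(replicate(reps, {
  # part a): one sample of size 500, z-based interval
  x  <- sample(pop, size = 500)
  se <- sd(x) / sqrt(500)
  a  <- mean(x) + c(-1, 1) * zc * se

  # part b): 300 samples of size 10, percentile interval of the means
  xbar <- rowMeans(matrix(sample(pop, size = 300 * 10), nrow = 300))
  b    <- unname(quantile(xbar, c(0.025, 0.975)))

  # width of each interval and whether it covers mu (TRUE/FALSE becomes 1/0)
  c(width_a = a[2] - a[1], cover_a = a[1] <= mu && mu <= a[2],
    width_b = b[2] - b[1], cover_b = b[1] <= mu && mu <= b[2])
}))

# Average width and coverage rate for each sampling plan
summary_table <- colMeans(results)
print(round(summary_table, 3))
```

Since σ = 1 here, the average widths should come out near the theoretical values 2 × 1.96/√500 ≈ 0.175 and 2 × 1.96/√10 ≈ 1.24.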