This problem uses R to examine the Central Limit Theorem in
more detail. Round all answers in this problem to four decimal
places.
We will first generate 10 Poisson(λ=1) random variables and then
calculate the sample mean of these 10 random variables. We will do
this process 10,000 times to generate 10,000 simulated sample
means. Run the following code and use the output to answer the
following questions.
set.seed(2020)
nsims = 10000 # number of simulations
means = rep(0,nsims) # vector to store sample mean from each simulation
for(i in 1:nsims){
data = rpois(n=10,lambda=1)
means[i] = mean(data)
}
hist(means)
mean(means)
sd(means)
(d) From theory, what is SE[X̄] for this example? (This is a hand
calculation, not using R.)
(e) What proportion of the simulated sample means are less than
0.9? (If you don't remember how to do this operation in R, look
back at your previous homework assignments.)
(f) Using the Central Limit Theorem, calculate the approximate
probability P(X̄ < 0.9) for n=10 and λ=1. (This is a hand
calculation, not using R.)
Now, we want to explore how sample size has an impact on the
distribution of the sample mean. Using the code above, adjust the
sample size to run simulations for n=1, n=5, and n=100.
(i) Suppose we want a theoretical standard error of 0.25. What
should be our sample size? (This is a hand calculation not using
R.)
(j) Using the provided code above and the sample size you found in
(i), use R to complete 10,000 simulations. What is the standard
deviation of the simulated sample means?
(k) Which of the following are true? (Check all that apply)
Even if our data is not normally distributed, the distribution of sample means will be approximately normally distributed for a sufficiently large sample size.
The sample mean converges to the population mean, as the sample size increases.
As the sample size increases, the standard errors tend to increase.
The Central Limit Theorem only applies to data that is normally distributed to begin with.
Solution: run the R code above and use its output in the answers below.
(d) From theory, what is SE[X̄] for this example? (This is a hand calculation, not using R.)
X has a Poisson distribution with parameter λ = 1. We know that
the expected value of X is E[X] = λ = 1, and
the standard deviation of X is SD(X) = sqrt(λ) = 1.
Let X̄ be the sample mean of a randomly selected sample of size n=10 from X.
The standard error of the mean is SE(X̄) = SD(X)/sqrt(n) = 1/sqrt(10) ≈ 0.3162.
ans: From the theory, the standard error of the mean is 0.3162
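Although this is a hand calculation, the arithmetic is easy to verify in R as a sanity check:

```r
# Theoretical standard error of the mean for Poisson(lambda = 1), n = 10:
# SE(Xbar) = SD(X) / sqrt(n) = sqrt(lambda) / sqrt(n)
lambda <- 1
n <- 10
se <- sqrt(lambda) / sqrt(n)
round(se, 4)  # 0.3162
```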
(e) What proportion of the simulated sample means are less than 0.9? (If you don't remember how to do this operation in R, look back at your previous homework assignments.)
Modified R code
-----------
set.seed(2020)
nsims = 10000 # number of simulations
means = rep(0,nsims) # vector to store sample mean from each
simulation
for(i in 1:nsims){
data = rpois(n=10,lambda=1)
means[i] = mean(data)
}
# Calculate the proportion of sample means less than 0.9
sum(means<0.9)/nsims
hist(means)
mean(means)
sd(means)
--------
ans: The proportion of the simulated sample means less than 0.9 is 0.3397
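Equivalently, taking mean() of the logical vector means < 0.9 gives the proportion directly, since TRUE/FALSE are treated as 1/0:

```r
set.seed(2020)
nsims <- 10000
means <- rep(0, nsims)
for (i in 1:nsims) {
  means[i] <- mean(rpois(n = 10, lambda = 1))
}
# mean() of a logical vector is the proportion of TRUEs,
# so this equals sum(means < 0.9)/nsims
mean(means < 0.9)
```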
(f) Using the Central Limit Theorem, calculate the (approximate)
probability that P(X < 0.9) for n=10 and λ=1. (This is a hand
calculation not using R.)
We know the population standard deviation of X to be 1. Hence, although the sample size n is less than 30, we apply the Central Limit Theorem approximation. By the CLT, the distribution of X̄ is approximately normal with mean μ = 1 and standard deviation (also called the standard error of the mean) σ/sqrt(n) = 1/sqrt(10) ≈ 0.3162 for a sample of size n=10.
The required probability is P(X̄ < 0.9) = P(Z < (0.9 − 1)/0.3162) = P(Z < −0.32) = 0.3745, from the standard normal table.
ans: Using the Central Limit Theorem, the approximate probability P(X̄ < 0.9) for n=10 and λ=1 is 0.3745
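The table-based answer rounds z to −0.32; as a check, R's pnorm() gives the normal approximation without rounding z, which differs slightly in the fourth decimal place:

```r
# Normal approximation to P(Xbar < 0.9) with mean 1 and SE 1/sqrt(10)
pnorm(0.9, mean = 1, sd = 1 / sqrt(10))  # about 0.3759 (z-table with z = -0.32 gives 0.3745)
```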
Now, we want to explore how sample size has an impact on the distribution of the sample mean. Using the code above, adjust the sample size to run simulations for n=1, n=5, and n=100.
R code for n=1
---
set.seed(2020)
nsims = 10000 # number of simulations
means = rep(0,nsims) # vector to store sample mean from each simulation
for(i in 1:nsims){
data = rpois(n=1,lambda=1)
means[i] = mean(data)
}
hist(means)
mean(means)
sd(means)
----
R code for n=5
---
set.seed(2020)
nsims = 10000 # number of simulations
means = rep(0,nsims) # vector to store sample mean from each simulation
for(i in 1:nsims){
data = rpois(n=5,lambda=1)
means[i] = mean(data)
}
hist(means)
mean(means)
sd(means)
----
R code for n=100
---
set.seed(2020)
nsims = 10000 # number of simulations
means = rep(0,nsims) # vector to store sample mean from each simulation
for(i in 1:nsims){
data = rpois(n=100,lambda=1)
means[i] = mean(data)
}
hist(means)
mean(means)
sd(means)
---
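The three runs above can be condensed into a single loop; reseeding before each sample size reproduces the separate runs, and sd(means) shrinks roughly like 1/sqrt(n):

```r
nsims <- 10000
for (n in c(1, 5, 100)) {
  set.seed(2020)  # reseed so each n starts from the same RNG state
  means <- replicate(nsims, mean(rpois(n = n, lambda = 1)))
  cat(sprintf("n = %3d: simulated SE = %.4f, theoretical SE = %.4f\n",
              n, sd(means), 1 / sqrt(n)))
}
```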
(i) Suppose we want a theoretical standard error of 0.25. What should be our sample size? (This is a hand calculation not using R.)
Let n be the sample size that gives a standard error of the mean of 0.25:
SE(X̄) = σ/sqrt(n) = 1/sqrt(n) = 0.25, so sqrt(n) = 1/0.25 = 4 and n = 16.
ans: The sample size needed to get a theoretical standard error of 0.25 is 16
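Again a hand calculation, but the algebra can be checked in R:

```r
# Solve 1/sqrt(n) = 0.25 for n: n = (1/0.25)^2
n <- (1 / 0.25)^2
n  # 16
```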
(j) Using the provided code above and the sample size you found in
(i), use R to complete 10,000 simulations. What is the standard
deviation of the simulated sample means?
R code
---
set.seed(2020)
nsims = 10000 # number of simulations
means = rep(0,nsims) # vector to store sample mean from each simulation
for(i in 1:nsims){
data = rpois(n=16,lambda=1)
means[i] = mean(data)
}
hist(means)
mean(means)
sd(means)
---
ans: The standard deviation of the simulated sample means is approximately 0.25, which is close to the theoretical standard error calculated in (i)
(k) Which of the following are true? (Check all that apply)
ans: The first two statements are true.
True: Even if our data is not normally distributed, the distribution of sample means will be approximately normally distributed for a sufficiently large sample size.
True: The sample mean converges to the population mean as the sample size increases.
False: As the sample size increases, the standard errors tend to decrease, not increase.
False: The Central Limit Theorem does not require the data to be normally distributed to begin with; it applies to non-normal data as well.