Question

This problem involves using R to examine the Central Limit Theorem more in detail. For all answers in this problem, round to four decimal places.

We will first generate 10 Poisson(λ=1) random variables and then calculate the sample mean of these 10 random variables. We will do this process 10,000 times to generate 10,000 simulated sample means. Run the following code and use the output to answer the following questions.

set.seed(2020)

nsims = 10000 # number of simulations
means = rep(0,nsims) # vector to store sample mean from each simulation

for(i in 1:nsims){
data = rpois(n=10,lambda=1)
means[i] = mean(data)
}

hist(means)
mean(means)
sd(means)


(d) From theory, what is SE[X̄] for this example? (This is a hand calculation not using R.)

(e) What proportion of the simulated sample means are less than 0.9? (If you don't remember how to do this operation in R, look back at your previous homework assignments.)

(f) Using the Central Limit Theorem, calculate the (approximate) probability P(X̄ < 0.9) for n=10 and λ=1. (This is a hand calculation not using R.)


Now, we want to explore how sample size has an impact on the distribution of the sample mean. Using the code above, adjust the sample size to run simulations for n=1, n=5, and n=100.



(i) Suppose we want a theoretical standard error of 0.25. What should be our sample size? (This is a hand calculation not using R.)

(j) Using the code above with the sample size you found in (i), use R to complete 10,000 simulations. What is the standard deviation of the simulated sample means?

(k) Which of the following are true? (Check all that apply)

Even if our data is not normally distributed, the distribution of sample means will be approximately normally distributed for a sufficiently large sample size.

The sample mean converges to the population mean, as the sample size increases.

As the sample size increases, the standard errors tend to increase.

The Central Limit Theorem only applies to data that is normally distributed to begin with.

Solutions

Expert Solution

Running the R code produces a histogram of the 10,000 simulated sample means, followed by their mean and standard deviation.

(d) From theory, what is SE[X̄] for this example? (This is a hand calculation not using R.)

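X has a Poisson distribution with parameter $\lambda = 1$. For a Poisson distribution, the mean and the variance both equal $\lambda$.

The expected value of X is $\mu = E[X] = \lambda = 1$.

The standard deviation of X is $\sigma = \sqrt{\lambda} = 1$.

Let $\bar X$ be the sample mean of a randomly selected sample of size n=10 from X.

The standard error of the mean is $SE[\bar X] = \dfrac{\sigma}{\sqrt{n}} = \dfrac{1}{\sqrt{10}} \approx 0.3162$.

ans: From theory, the standard error of the mean is 0.3162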
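As a quick numeric check of the hand calculation (not a replacement for it), the same formula can be evaluated in R. The variable names are illustrative, not part of the assignment.

```r
# Poisson(lambda) has mean lambda and variance lambda
lambda <- 1
n <- 10

sigma <- sqrt(lambda)   # population standard deviation of X
se <- sigma / sqrt(n)   # theoretical standard error of the sample mean
round(se, 4)            # 0.3162
```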

(e) What proportion of the simulated sample means are less than 0.9? (If you don't remember how to do this operation in R, look back at your previous homework assignments.)

Modified R code

-----------

set.seed(2020)

nsims = 10000 # number of simulations
means = rep(0,nsims) # vector to store sample mean from each simulation

for(i in 1:nsims){
data = rpois(n=10,lambda=1)
means[i] = mean(data)
}

#Calculate the proportion of sample means less than 0.9
sum(means<0.9)/nsims

hist(means)
mean(means)
sd(means)

--------

ans: The proportion of the simulated sample means less than 0.9 is 0.3397
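Because `means < 0.9` is a logical vector, `mean(means < 0.9)` gives the same proportion as `sum(means < 0.9)/nsims`. As an extra sanity check (an addition, not something the assignment asks for), the exact probability is also available: the sum of 10 independent Poisson(1) variables is Poisson(10), and $\bar X < 0.9$ exactly when that sum is at most 8. The sketch below repeats the simulation with the same seed and puts both numbers side by side.

```r
set.seed(2020)
nsims <- 10000
means <- rep(0, nsims)
for (i in 1:nsims) {
  means[i] <- mean(rpois(n = 10, lambda = 1))
}

# Proportion of simulated sample means below 0.9 (same as sum(...)/nsims)
prop_sim <- mean(means < 0.9)

# Exact value: the sum of 10 iid Poisson(1) is Poisson(10),
# and mean < 0.9 exactly when the sum is <= 8
prop_exact <- ppois(8, lambda = 10)   # 0.3328 after rounding

round(c(simulated = prop_sim, exact = prop_exact), 4)
```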


(f) Using the Central Limit Theorem, calculate the (approximate) probability P(X̄ < 0.9) for n=10 and λ=1. (This is a hand calculation not using R.)

By the Central Limit Theorem, $\bar X$ is approximately normal with mean $\mu = 1$ and standard deviation (the standard error of the mean) $\sigma/\sqrt{n} = 1/\sqrt{10} \approx 0.3162$ for a sample of size n=10. With n=10 this is only a rough approximation, because Poisson(1) data are right-skewed and $\bar X$ can only take the values 0, 0.1, 0.2, and so on.

The required probability is

$P(\bar X < 0.9) \approx P\left(Z < \dfrac{0.9 - 1}{0.3162}\right) = P(Z < -0.3162) \approx 0.3759$

(rounding z to -0.32 for a z-table lookup gives 0.3745 instead).

ans: Using the Central Limit Theorem, the approximate probability P(X̄ < 0.9) for n=10 and λ=1 is 0.3759
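The normal probability can be checked with `pnorm()`. The sketch below computes it once with the unrounded z and once with z rounded to -0.32, which shows where the small difference between 0.3759 and a table-based 0.3745 comes from. The names `p_clt` and `p_table` are illustrative.

```r
# CLT approximation: Xbar is approximately Normal(mu, sigma / sqrt(n))
mu <- 1
sigma <- 1
n <- 10
se <- sigma / sqrt(n)

p_clt <- pnorm(0.9, mean = mu, sd = se)  # P(Xbar < 0.9) with z = -0.3162
p_table <- pnorm(-0.32)                  # same, with z rounded for a z-table

round(c(exact_z = p_clt, rounded_z = p_table), 4)  # 0.3759 and 0.3745
```

Both values sit noticeably above the simulated 0.3397 from (e), which is consistent with the normal approximation being rough at n=10.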

Now, we want to explore how sample size has an impact on the distribution of the sample mean. Using the code above, adjust the sample size to run simulations for n=1, n=5, and n=100.

R code for n=1

---

set.seed(2020)

nsims = 10000 # number of simulations
means = rep(0,nsims) # vector to store sample mean from each simulation

for(i in 1:nsims){
data = rpois(n=1,lambda=1)
means[i] = mean(data)
}

hist(means)
mean(means)
sd(means)

----

R code for n=5

---

set.seed(2020)

nsims = 10000 # number of simulations
means = rep(0,nsims) # vector to store sample mean from each simulation

for(i in 1:nsims){
data = rpois(n=5,lambda=1)
means[i] = mean(data)
}

hist(means)
mean(means)
sd(means)

----

R code for n=100

---

set.seed(2020)

nsims = 10000 # number of simulations
means = rep(0,nsims) # vector to store sample mean from each simulation

for(i in 1:nsims){
data = rpois(n=100,lambda=1)
means[i] = mean(data)
}

hist(means)
mean(means)
sd(means)

---

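The three scripts above differ only in the sample size, so they can be folded into one function. `simulate_means()` below is a hypothetical helper, not part of the assignment's code. It calls `set.seed(2020)` before each run, so each call draws the same random-number stream as the corresponding standalone script, and the table compares each simulated standard deviation with the theoretical $1/\sqrt{n}$.

```r
# Hypothetical helper: nsims sample means of Poisson(lambda) samples of size n
simulate_means <- function(n, nsims = 10000, lambda = 1, seed = 2020) {
  set.seed(seed)
  means <- numeric(nsims)
  for (i in 1:nsims) {
    means[i] <- mean(rpois(n = n, lambda = lambda))
  }
  means
}

sizes <- c(1, 5, 10, 100)
results <- t(sapply(sizes, function(n) {
  m <- simulate_means(n)
  c(n = n,
    mean = mean(m),
    sd = sd(m),
    theory_se = 1 / sqrt(n))  # sigma / sqrt(n) with sigma = sqrt(lambda) = 1
}))
round(results, 4)

# Histograms in a 2 x 2 grid: the shape becomes more bell-like as n grows
par(mfrow = c(2, 2))
for (n in sizes) {
  hist(simulate_means(n), main = paste("n =", n), xlab = "sample mean")
}
```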
(i) Suppose we want a theoretical standard error of 0.25. What should be our sample size? (This is a hand calculation not using R.)

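Let n be the sample size that gives a standard error of the mean of 0.25. Since $\sigma = 1$,

$\dfrac{\sigma}{\sqrt{n}} = \dfrac{1}{\sqrt{n}} = 0.25 \implies \sqrt{n} = 4 \implies n = 16$

ans: The sample size needed to get a theoretical standard error of 0.25 is 16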
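The same rearrangement, $n = (\sigma / SE)^2$, in R. Rounding up with `ceiling()` is an added convention for targets that do not give a whole number, so that the standard error ends up at or below the target; here the result is exactly 16.

```r
# Solve sigma / sqrt(n) = target_se for n, rounding up to a whole sample size
sigma <- 1          # sqrt(lambda) with lambda = 1
target_se <- 0.25
n_needed <- ceiling((sigma / target_se)^2)
n_needed            # 16
```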


(j) Using the code above with the sample size you found in (i), use R to complete 10,000 simulations. What is the standard deviation of the simulated sample means?

R code

---

set.seed(2020)

nsims = 10000 # number of simulations
means = rep(0,nsims) # vector to store sample mean from each simulation

for(i in 1:nsims){
data = rpois(n=16,lambda=1)
means[i] = mean(data)
}

hist(means)
mean(means)
sd(means)

---

ans: The standard deviation of the simulated sample means is approximately 0.25, which is close to the theoretical standard error of 0.25 calculated in (i)


(k) Which of the following are true? (Check all that apply)

ans:

  • Even if our data is not normally distributed, the distribution of sample means will be approximately normally distributed for a sufficiently large sample size.
  • The sample mean converges to the population mean, as the sample size increases.

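Note: The other two statements are false.

As the sample size increases, the standard error $\sigma/\sqrt{n}$ decreases, not increases.

The Central Limit Theorem does not require the data to be normally distributed to begin with; it applies to non-normal data such as the Poisson data simulated here.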
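The first true statement can be illustrated numerically (an illustration, not a proof). Poisson(1) data are right-skewed, and the skewness of the mean of n such values is $1/\sqrt{n}$, so the distribution of sample means should become more symmetric, and closer to normal, as n grows. The `skewness()` function below is a small hand-written helper using the moment-based sample skewness, so no extra package is assumed.

```r
# Moment-based sample skewness (hypothetical helper, no package needed)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(2020)
sizes <- c(1, 10, 100)
skew <- sapply(sizes, function(n) {
  means <- replicate(10000, mean(rpois(n = n, lambda = 1)))
  skewness(means)
})

# Simulated skewness next to the theoretical value 1 / sqrt(n)
round(rbind(n = sizes, simulated = skew, theory = 1 / sqrt(sizes)), 4)
```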

