Assume that we draw random numbers from a normal distribution with mean μ = 5 and variance σ² = 10.
• (a) According to the Central Limit Theorem, what are the mean and standard error (SE) of the sample mean when the sample size is 10, 100, 1000 and 10000?
• (b) Experimentally justify the Central Limit Theorem using simulation, given a sample size of 10, 100 and 1000 (for each sample size you can use 50000 simulations to explore the sampling distribution of the mean).
• (c) When the sample size is 10, obtain the t scores and z scores of the sample means (from 50000 simulations). Plot the distributions in a histogram with the theoretical Gaussian curve and the t-distribution (9 degrees of freedom). For the t score and the z score, which theoretical curve fits better? Why?
For each simulation: the z score of the mean can be calculated as:
( X̄ − μ )/(σ/√ n)
where X̄ is the mean of the sample, μ is the population mean and σ is the population SD.
The t score of the mean can be calculated as:
( X̄ − μ )/(S/√ n)
where X̄ is the mean of the sample, μ is the population mean and S is the SD of the sample.
a) The Central Limit Theorem states that if X₁, ..., Xₙ are IID random variables with mean μ and standard deviation σ, then the distribution of the mean X̄ is approximately normal with mean μ and standard deviation σ/√n.
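With μ = 5 and σ = √10:
n = 10, the mean of X̄ is 5 and sd = √10/√10 = 1.
n = 100, the mean of X̄ is 5 and sd = √10/√100 ≈ 0.3162.
n = 1000, the mean of X̄ is 5 and sd = √10/√1000 = 0.1.
n = 10000, the mean of X̄ is 5 and sd = √10/√10000 ≈ 0.0316.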
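The standard errors in part (a) follow directly from σ/√n. A minimal R sketch of that arithmetic is shown here; the names `se` and `theory` are introduced only for illustration and do not appear in the original answer.

```r
# Theoretical mean and standard error of the sample mean under the CLT.
mu <- 5
sigma <- sqrt(10)                 # population SD (variance = 10)
n <- c(10, 100, 1000, 10000)      # sample sizes from part (a)

se <- sigma / sqrt(n)             # SE of the mean: sigma / sqrt(n)
theory <- data.frame(n = n, mean = mu, se = se)
print(theory)
```

The mean of X̄ stays at μ for every n, while the SE shrinks by a factor of √10 each time n grows tenfold.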
b) The R code for 50000 simulations with sample size n = 10 is given below. Change n to repeat the experiment for the other sample sizes.
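n <- 10              # sample size; rerun with 100, 1000 and 10000
sims <- 50000        # number of simulated samples
mu <- 5
sigma <- sqrt(10)    # population SD (variance is 10)
mean_normVec <- numeric(sims)   # one sample mean per simulation
for (i in 1:sims)
{
  normVec <- rnorm(n, mean = mu, sd = sigma)
  mean_normVec[i] <- mean(normVec)
}
mean(mean_normVec)   # should be close to mu
var(mean_normVec)    # should be close to sigma^2 / n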
The outputs for n = 10, 100, 1000 and 10000, in that order, are:
> mean(mean_normVec)
[1] 4.998703
> var(mean_normVec)
[1] 1.01
> mean(mean_normVec)
[1] 5.001954
> var(mean_normVec)
[1] 0.1009115
> mean(mean_normVec)
[1] 4.99969
> var(mean_normVec)
[1] 0.01007502
> mean(mean_normVec)
[1] 5.00003
> var(mean_normVec)
[1] 0.001009504
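The experimental means and sds (the sds being the square roots of the variances above) are approximately 4.999 and 1.005 for n = 10, 5.002 and 0.318 for n = 100, 5.000 and 0.100 for n = 1000, and 5.000 and 0.032 for n = 10000. These are close to the theoretical values 5 and σ/√n.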
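The comparison between simulation and theory can also be run for several sample sizes at once. The sketch below is one possible way to do it, not the original author's code: `simulate_means()` is a hypothetical helper, the seed is arbitrary, and the number of simulations is reduced from 50000 to 5000 so that it runs quickly.

```r
# Sketch: run the part (b) simulation for several sample sizes in one call.
# simulate_means() is a hypothetical helper, not part of the original answer.
simulate_means <- function(n, sims, mu = 5, sigma = sqrt(10)) {
  # Draw sims samples of size n and keep the mean of each.
  replicate(sims, mean(rnorm(n, mean = mu, sd = sigma)))
}

set.seed(1)                       # arbitrary seed, for reproducibility only
sizes <- c(10, 100, 1000)
sims <- 5000                      # reduced from 50000 to keep the run short

results <- t(sapply(sizes, function(n) {
  m <- simulate_means(n, sims)
  c(n = n, sim_mean = mean(m), sim_sd = sd(m), theory_sd = sqrt(10) / sqrt(n))
}))
print(results)
```

Each row puts the simulated mean and sd of X̄ next to the theoretical σ/√n, so the agreement can be read off directly.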
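c) When n = 10, the R code for calculating the z and t scores is given below. Note that this block computes a single z and t value from the grand mean of the 50000 sample means; the per-simulation scores that the question asks for are computed inside the loop of the plotting code further down.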
n <- 10
sims <- 50000
mu <- 5
sigma <- sqrt(10)
mean_normVec <- array(dim=c(sims))
for ( i in 1:sims)
{
normVec <- rnorm(n, mean = mu, sd = sigma)
mean_normVec[i] <- mean(normVec)
}
xbar <- mean(mean_normVec)
s <- sd(mean_normVec)
z <- (xbar-mu)/(sigma/sqrt(n))
t <- (xbar-mu)/(s/sqrt(n))
z
t
The output is:
> z
[1] 0.001943632
> t
[1] 0.00615145
The R code for plotting the histogram and the theoretical distributions is given below.
n <- 10
sims <- 50000
mu <- 5
sigma <- sqrt(10)
mean_normVec <- array(dim=c(sims))
std_mean_tVec <- array(dim=c(sims))
std_mean_normVec <- array(dim=c(sims))
for ( i in 1:sims)
{
normVec <- rnorm(n, mean = mu, sd = sigma)
mean_normVec[i] <- mean(normVec)
std_mean_normVec[i] <-
(mean_normVec[i]-mu)/(sigma/sqrt(n))
std_mean_tVec[i] <-
(mean_normVec[i]-mu)/(sd(normVec)/sqrt(n))
}
xbar <- mean(mean_normVec)
s <- sd(mean_normVec)
z <- (xbar-mu)/(sigma/sqrt(n))
t <- (xbar-mu)/(s/sqrt(n))
z
t
#hist(std_mean_tVec)
hist(std_mean_tVec, density=20, breaks=20, prob=TRUE,
xlab="x", ylim=c(0, 0.5),
main="Standard normal & t curve over histogram")
curve(dnorm(x, mean=0, sd=1),
col="darkblue", lwd=2, add=TRUE, yaxt="n")
curve(dt(x, df=9),
col="darkred", lwd=2, add=TRUE, yaxt="n")
The plots are:
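As a numerical cross-check on which curve fits which score: because the samples are drawn from a normal population with known σ, the z score is exactly standard normal, so the Gaussian curve fits it. The t score divides by the sample SD S, which varies from sample to sample, so it follows a t-distribution with n − 1 = 9 degrees of freedom and has heavier tails than the Gaussian. The sketch below, which is an illustration under these assumptions and not the original author's code, compares the fraction of scores beyond ±1.96 with each theoretical tail probability. The seed and the cutoff 1.96 are arbitrary choices.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n <- 10; sims <- 50000
mu <- 5; sigma <- sqrt(10)

# One row per simulated sample of size n.
samples <- matrix(rnorm(n * sims, mean = mu, sd = sigma), nrow = sims)
xbar <- rowMeans(samples)
s <- apply(samples, 1, sd)        # sample SD of each row

z_scores <- (xbar - mu) / (sigma / sqrt(n))
t_scores <- (xbar - mu) / (s / sqrt(n))

# Fraction of scores beyond +/-1.96 versus each theoretical tail probability.
tails <- c(
  z_observed = mean(abs(z_scores) > 1.96),
  normal     = 2 * pnorm(-1.96),
  t_observed = mean(abs(t_scores) > 1.96),
  t_df9      = 2 * pt(-1.96, df = n - 1)
)
print(round(tails, 4))
```

The observed z tail should sit near the normal value and the observed t tail near the t(9) value, with the t tail noticeably larger.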