The Central Limit Theorem
A) Create a distribution with 50 observations of 1, 20 observations of 2, 60 observations of 3, 30 observations of 4, 30 observations of 5, and 90 observations of 6 (Hint: use the rep() command). Plot a histogram of this distribution (Hint: adjust the breaks if necessary).
B) Take 1,000 samples of size 10 from the distribution (with replacement). Calculate the mean for each sample and plot the distribution of these means.
C) Take 1,000 samples of size 30 from the distribution (with replacement). Calculate the mean for each sample and plot.
D) Compare the mean and standard deviation of the sampling distributions in (B) and (C) to the true mean of the wacky distribution. Discuss how the change in sample size affects the sampling distribution of the sample mean.
The R code is pasted below.
# QUESTION A
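x = c(rep(1,50), rep(2,20), rep(3,60), rep(4,30), rep(5,30), rep(6,90))
# Center one bin on each integer value; breaks = c(1,2,3,4,5,6) would put the 1s and 2s in the same bar
hist(x, breaks = seq(0.5, 6.5, by = 1), main = "Histogram (Ques.A)")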
# QUESTION B
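sam1 = numeric(1000)  # preallocate space for the 1,000 sample means
for (i in 1:1000)
{
  # draw 10 observations with replacement and store their mean
  sam1[i] = mean(sample(x, 10, replace = TRUE))
}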
hist(sam1,main = "Histogram (Ques.B)")
# QUESTION C
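sam2 = numeric(1000)  # preallocate space for the 1,000 sample means
for (i in 1:1000)
{
  # draw 30 observations with replacement and store their mean
  sam2[i] = mean(sample(x, 30, replace = TRUE))
}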
hist(sam2,main = "Histogram (Ques.C)")
# QUESTION D
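mean(x)     # true mean of the distribution (1080/280, about 3.857)
mean(sam1)  # mean of the sample means, n = 10
sd(sam1)    # standard deviation of the sample means, n = 10
mean(sam2)  # mean of the sample means, n = 30
sd(sam2)    # standard deviation of the sample means, n = 30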
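In Question D, we observe that the means of the sampling
distributions obtained in parts (B) and (C) are both close to the
true mean of the distribution (1080/280, about 3.86), while the
standard deviation decreases as the sample size increases. The
sample mean is an unbiased estimator at any sample size, so the
center of the sampling distribution does not shift; what changes
is the spread, which shrinks in proportion to 1/sqrt(n). Going
from n = 10 to n = 30 should therefore cut the standard deviation
by a factor of about sqrt(3), or roughly 1.73. With the larger
sample size, individual sample means cluster more tightly around
the true mean and the histogram looks more like a normal
distribution, as the Central Limit Theorem predicts.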
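To put a number on how the standard deviation shrinks with sample size, the simulated values can be compared with the theoretical standard error, sigma/sqrt(n). The following is a minimal sketch, not part of the original answer: the seed, the helper name sample_means, and the use of replicate() in place of the for loops are assumptions made for illustration.

```r
# Illustrative check: simulated sd of the sample means vs. sigma / sqrt(n).
set.seed(1)  # arbitrary seed, assumed here only for reproducibility

x <- c(rep(1, 50), rep(2, 20), rep(3, 60), rep(4, 30), rep(5, 30), rep(6, 90))

mu <- mean(x)                    # true mean of the distribution
sigma <- sqrt(mean((x - mu)^2))  # population sd (divides by N, unlike sd())

# Hypothetical helper: means of `reps` samples of size n, drawn with replacement
sample_means <- function(n, reps = 1000) {
  replicate(reps, mean(sample(x, n, replace = TRUE)))
}

sam1 <- sample_means(10)
sam2 <- sample_means(30)

# Side-by-side comparison of simulated and theoretical values
results <- data.frame(
  n = c(10, 30),
  simulated_mean = c(mean(sam1), mean(sam2)),
  simulated_sd = c(sd(sam1), sd(sam2)),
  theoretical_se = sigma / sqrt(c(10, 30))
)
print(results)
```

For this distribution the population standard deviation is about 1.865, so the theoretical standard errors are about 0.590 for n = 10 and 0.341 for n = 30. The simulated standard deviations should land close to these values, though they vary from run to run.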