Using the language R:
Generate five random data sets from the normal distribution with mean 5 and variance 2. The sizes of the five data sets are 10, 100, 1000, 10000 and 100000. Draw the histogram, boxplot and QQ plot (normal probability plot) of each data set, and comment on the plots.
R-code
# Random data generation for a normal distribution with mean 5 and variance 2
v_mu = 5         # mean of the normal distribution
v_sd = sqrt(2)   # standard deviation of the normal distribution (square root of the variance)
v_mu
v_sd
par(mfrow=c(2,3))   # 2 x 3 grid: two sample sizes per page, three plots each
set.seed(19561)
# dataset of size 10
n = 10
v_title = "QQ plot for size 10"
v_tithist = "Histogram for size 10"
v_titbox = "Boxplot for size 10"
d1 <- rnorm(n, v_mu, v_sd)
qqnorm(d1, main = v_title)
qqline(d1)   # reference line; points close to it indicate normality
hist(d1, main = v_tithist)
boxplot(d1, main = v_titbox)
# dataset of size 100
n = 100
v_title = "QQ plot for size 100"
v_tithist = "Histogram for size 100"
v_titbox = "Boxplot for size 100"
d2 <- rnorm(n, v_mu, v_sd)
qqnorm(d2, main = v_title)
qqline(d2)   # reference line; points close to it indicate normality
hist(d2, main = v_tithist)
boxplot(d2, main = v_titbox)
# dataset of size 1000
n = 1000
v_title = "QQ plot for size 1000"
v_tithist = "Histogram for size 1000"
v_titbox = "Boxplot for size 1000"
d3 <- rnorm(n, v_mu, v_sd)
qqnorm(d3, main = v_title)
qqline(d3)   # reference line; points close to it indicate normality
hist(d3, main = v_tithist)
boxplot(d3, main = v_titbox)
# dataset of size 10000
n = 10000
v_title = "QQ plot for size 10000"
v_tithist = "Histogram for size 10000"
v_titbox = "Boxplot for size 10000"
d4 <- rnorm(n, v_mu, v_sd)
qqnorm(d4, main = v_title)
qqline(d4)   # reference line; points close to it indicate normality
hist(d4, main = v_tithist)
boxplot(d4, main = v_titbox)
# dataset of size 100000
n = 100000
v_title = "QQ plot for size 100000"
v_tithist = "Histogram for size 100000"
v_titbox = "Boxplot for size 100000"
d5 <- rnorm(n, v_mu, v_sd)
qqnorm(d5, main = v_title)
qqline(d5)   # reference line; points close to it indicate normality
hist(d5, main = v_tithist)
boxplot(d5, main = v_titbox)
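The five blocks above repeat the same three plotting calls with only the sample size changing. A more compact way to write the same workflow is a loop over the sizes. This is a sketch, assuming the same mean, standard deviation and seed as the script; the names `sizes`, `samples` and `lab` are illustrative choices, not part of the original answer.

```r
# Compact version of the script: one loop over the five sample sizes.
v_mu  <- 5          # mean of the normal distribution
v_sd  <- sqrt(2)    # standard deviation = square root of the variance 2
sizes <- c(10, 100, 1000, 10000, 100000)

set.seed(19561)
# Draw all five samples in the same order as the original script
samples <- lapply(sizes, function(n) rnorm(n, mean = v_mu, sd = v_sd))

par(mfrow = c(2, 3))   # two sample sizes (rows) per page, three plots each
for (i in seq_along(sizes)) {
  x   <- samples[[i]]
  lab <- format(sizes[i], scientific = FALSE)   # "100000" instead of "1e+05"
  qqnorm(x, main = paste("QQ plot for size", lab))
  qqline(x, col = "red")                        # reference line through the quartiles
  hist(x, main = paste("Histogram for size", lab), xlab = "value")
  boxplot(x, main = paste("Boxplot for size", lab))
}
```

Keeping the samples in a list also makes it easy to compute summaries for all sizes at once with `sapply()`.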
R-output
Comment
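From the plots above, we can see that:
As the sample size increases, the points in the QQ plots lie closer to a straight line.
As the sample size increases, the histograms become more symmetric and bell-shaped.
As the sample size increases, the boxplots become more symmetric: the median line sits near the middle of the box and the two whiskers have similar lengths. (A boxplot shows the median, not the mean.) For the larger samples some points appear beyond the whiskers; for normal data about 0.7% of observations fall outside the 1.5 IQR fences, so these points are expected rather than true outliers.
For n = 10 all three plots are rough and can look non-normal even though the data come from a normal distribution.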
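These visual trends can also be checked numerically. The sketch below regenerates the five samples with the same seed and parameters as the script and, for each size, reports the sample mean and variance (which should approach 5 and 2), the mean minus the median (near 0 for a symmetric sample), the correlation between the data and the theoretical normal quantiles used by `qqnorm()` (near 1 when the QQ plot is straight), and the fraction of points beyond the boxplot whiskers. The column names are hypothetical labels chosen for this illustration.

```r
# Numerical check of the visual trends (same seed and parameters as the script)
v_mu  <- 5
v_sd  <- sqrt(2)
sizes <- c(10, 100, 1000, 10000, 100000)

set.seed(19561)
samples <- lapply(sizes, function(n) rnorm(n, mean = v_mu, sd = v_sd))

summary_tab <- t(sapply(samples, function(x) {
  qq <- qqnorm(x, plot.it = FALSE)   # qq$x: theoretical quantiles, qq$y: data
  c(mean            = mean(x),                 # should approach 5
    variance        = var(x),                  # should approach 2, not sqrt(2)
    mean_minus_med  = mean(x) - median(x),     # near 0 for a symmetric sample
    qq_cor          = cor(qq$x, qq$y),         # near 1 when the QQ plot is straight
    beyond_whiskers = length(boxplot.stats(x)$out) / length(x))  # about 0.007 for large normal samples
}))
rownames(summary_tab) <- format(sizes, scientific = FALSE, trim = TRUE)
print(round(summary_tab, 4))
```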
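This is not the Central Limit Theorem, which is a statement about the distribution of sample means. What the plots show is that as the sample size grows, the distribution of the sample gets closer to the distribution it was drawn from (here, normal with mean 5 and variance 2), so the histogram, boxplot and QQ plot increasingly resemble their theoretical normal shapes. This follows from the law of large numbers (more precisely, the Glivenko-Cantelli theorem).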
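One way to quantify how closely each sample matches a normal distribution with mean 5 and variance 2 is the Kolmogorov-Smirnov statistic D, the largest vertical gap between the empirical CDF of the sample and the theoretical normal CDF. The sketch below uses the same seed and parameters as the script; using `ks.test()` this way is an illustrative choice, not part of the original answer. D should shrink roughly like 1/sqrt(n), so D * sqrt(n) should stay of similar size across sample sizes.

```r
v_mu  <- 5
v_sd  <- sqrt(2)
sizes <- c(10, 100, 1000, 10000, 100000)

set.seed(19561)
samples <- lapply(sizes, function(n) rnorm(n, mean = v_mu, sd = v_sd))

# Largest gap between each empirical CDF and the N(mean 5, variance 2) CDF
ks_dist <- sapply(samples, function(x)
  unname(ks.test(x, "pnorm", mean = v_mu, sd = v_sd)$statistic))
names(ks_dist) <- format(sizes, scientific = FALSE, trim = TRUE)

print(round(ks_dist, 4))
print(round(ks_dist * sqrt(sizes), 3))   # roughly stable if D shrinks like 1/sqrt(n)
```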