In: Statistics and Probability
In R,
Draw 1000 samples from the following distributions and create histograms for each. Be sure to comment about each histogram. Remember all histograms should have four items addressed.
(a) X ∼ N (0, 1) using rnorm function.
(b) X ∼ Gamma(2, 3) using the rgamma function.
(c) X+Y where X ∼ N(5,2) and Y ∼ χ2(15).
(d) X ∼ Binomial(1, 0.3).
(e) Calculate a mean of a vector X = (X1, X2, ..., Xn ) 1000 times, where Xi ~ Binomial(1, 0.4) for n = 2, 5, 10, 20, 50, 100 and create a histogram of the means. Each n should be dealt with separately. For example, you draw 1000 samples of size n = 2 (you may use for() loop for that). Every time a sample is drawn, you calculate its mean. Once you will end up with n means, you create a histogram of those means. Then repeat for n = 5, 10, etc. There should be 6 separate histograms with comments for each. Restrict the x-axis to stay at 0 to 1 for all plots. Comment on the differences between plots as n increases.
#a)
x=rnorm(1000,0,1)
hist(x,sub="X ~ N (0,1)")
## Histogram suggests that distribution of x is symmetric
about 0. Moreover, when midpoints of each bar joined by smooth
lines,it appears like bell-shaped curve.
#b)
x=rgamma(1000,2,3)
hist(x,sub="X ~ Gamma(2,3)")
## This histogram suggests positively skewed distribution
of X.
#c)
x=rnorm(1000,mean=5,sd=sqrt(2))
y=rchisq(1000,df=15)
z=x+y
hist(z,sub="X ~ N (5,2) and Y ~ Chisquare (15)")
## Histogram suggests that distribution is NEAR to be
symmetric around z=20. [It is not exactly symmetric]
#d)
x=rbinom(1000,1,0.3)
hist(x, sub="X ~ Binomial (1,0.3)")
length(which(x==1)) ## Near to (0.3)*1000 i.e. near to 300.
length(which(x==0))
## This histogram simply shows two bars: one for '1' and
other for '0'.
### e)
n=c(2,5,10,20,50,100)
x=list()
xbar=c()
mvec=list()
## mvec will contain 6 lists each of length 1000
for(j in 1:6)
{
for(i in 1:1000)
{
x[[i]]=rbinom(n[j],1,0.4)
xbar[i]=mean(x[[i]])
}
mvec[[j]]=xbar
}
### Histograms are as below:
hist(mvec[[1]],xlim=c(0,1),sub="n=2")
##This histogram simply shows three bars.
hist(mvec[[2]],xlim=c(0,1),sub="n=5")
## This histogram shows specific number of bars, it suggests no symmetry; actually it wouldn't be reasonable to comment on the symmetry.
hist(mvec[[3]],xlim=c(0,1),sub="n=10")
## This histogram suggest, it is approaching near symmetry [See X-axis,Values spread between 0 and 0.8].
hist(mvec[[4]],xlim=c(0,1),sub="n=20")
## This histogram suggest it is approaching near symmetry [See also, the spread has reduced].
hist(mvec[[5]],xlim=c(0,1),sub="n=50")
## This histogram suggest it is approaching symmetry [See also, the spread has reduced, on X-axis--values lie between 0.2 to 0.6].
hist(mvec[[6]],xlim=c(0,1),sub="n=100")
## It can be said that this histogram suggests pure symmetry and high peakedness.
###### For
understanding it can be separately by changing values of 'j'
one-by-one. One is shown as example.
j=1
for(i in 1:1000)
{
x[[i]]=rbinom(n[j],1,0.4)
xbar[i]=mean(x[[i]])
}
mvec[[j]]=xbar
> x
[[1]]
[1] 1 0
[[2]]
[1] 1 1
[[3]]
[1] 0 0
[[4]]
[1] 1 0
[[5]]
[1] 0 0
[[6]]
[1] 0 1
> head(mvec[[1]])
[1] 0.5 1.0 0.0 0.5 0.0 0.5
> head(xbar)
[1] 0.5 1.0 0.0 0.5 0.0 0.5
##Comment on the differences between plots as n
increases.
par(mfrow=c(3,2))
hist(mvec[[1]],xlim=c(0,1),sub="n=2")
hist(mvec[[2]],xlim=c(0,1),sub="n=5")
hist(mvec[[3]],xlim=c(0,1),sub="n=10")
hist(mvec[[4]],xlim=c(0,1),sub="n=20")
hist(mvec[[5]],xlim=c(0,1),sub="n=50")
hist(mvec[[6]],xlim=c(0,1),sub="n=100")
## Clearly, we observe that as n increases, the binomial
distribution tends to Normal distribution.
## The distribution changes for skewed to completely
symmetric.
## The true center comes closer and closer to mean.
## Also, the SPREAD (variation) decreases, as n increases.