Using RStudio
Now, set the seed to 348 with `set.seed()`. Then take a sample of size 10,000 from a normal distribution with a mean of 82 and a standard deviation of 11.
(a) Using sum() on a logical vector, how many draws are less than 60? Using mean() on a logical vector, what proportion of the total draws is that? How far is your answer from pnorm() in 1.1 above?
```{R}
set.seed(348)
x <- rnorm(10000, 82, 11)
sum(ifelse(x < 60, 1, 0))
mean(ifelse(x < 60, 1, 0))
pnorm(60, 82, 11)
```
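Using the sum() function, there are 128 draws that are less than 60, and using the mean() function, 0.0281 is the proportion of the total draws. Because the proportion is the count divided by 10,000, these two outputs should agree with each other. From these outputs we can say that the answer is quite close to the pnorm() value, which is about 0.0228 because 60 lies exactly two standard deviations below the mean.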
(b) What proportion of your sample is greater than 110 or less than 54?
(c) Why are your answers close to what you got above? Why are they not exactly the same?
(d) Using ggplot2, make a histogram of your sample. Set y=..density.. inside aes(). Overlay a normal distribution with stat_function(aes(samp), fun=dnorm, args=list(82,11)). Using geom_vline(xintercept=), add dashed vertical lines corresponding to the 2.5th and the 97.5th percentile of the sample
(a) Using sum() on a logical vector, how many draws are less than 60? Using mean() on a logical vector, what proportion of the total draws is that? How far is your answer from pnorm() in 1.1 above?
R code with comments
# set the seed
set.seed(348)
# set the sample size
n <- 10000
# take a sample of size n from Normal(mean = 82, sd = 11)
x <- rnorm(n, 82, 11)
# (a)
# number of draws that are less than 60
k <- sum(x < 60)
sprintf('The number of draws that are less than 60 is %g', k)
# what proportion of the total draws is that?
prop <- mean(x < 60)
sprintf('The proportion of the total draws that are less than 60 is %.4f', prop)
# how far is your answer from pnorm() in 1.1?
sprintf('The theoretical value from pnorm() is %.4f', pnorm(60, 82, 11))
We can see that the theoretical value from pnorm() is close to the sample proportion.
Note: x<60 is a logical vector (made of TRUE and FALSE values), whereas ifelse(x<60,1,0) is a numeric vector of 0s and 1s, not a logical vector.
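To make the note above concrete, here is a minimal sketch comparing the two kinds of vectors. The names is_low, is_low01, count_logical, count_numeric, prop_logical, z and theory are illustrative and not part of the original answer.

```{R}
# Illustration only: compare a logical vector with its 0/1 recoding
set.seed(348)
x <- rnorm(10000, 82, 11)

is_low   <- x < 60                 # logical vector of TRUE/FALSE
is_low01 <- ifelse(x < 60, 1, 0)   # numeric vector of 0s and 1s

class(is_low)     # "logical"
class(is_low01)   # "numeric"

# sum() and mean() treat TRUE as 1 and FALSE as 0, so both routes agree
count_logical <- sum(is_low)
count_numeric <- sum(is_low01)
prop_logical  <- mean(is_low)      # equals count_logical / length(x)

# 60 is two standard deviations below the mean: (60 - 82) / 11 = -2
z <- (60 - 82) / 11
theory <- pnorm(z)                 # same value as pnorm(60, 82, 11)
```

Both approaches give the same count and proportion; the logical version simply skips the recoding step, which is what the question asks for.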
(b) What proportion of your sample is greater than 110 or less than 54?
R code
# (b) proportion of the sample that is greater than 110 or less than 54
prop <- mean(x < 54 | x > 110)
sprintf('The proportion of the sample that is greater than 110 or less than 54 is %.4f', prop)
(c) Why are your answers close to what you got above? Why are they not exactly the same?
R code
# (c) how far is your answer from pnorm()?
a <- pnorm(54, 82, 11) + (1 - pnorm(110, 82, 11))
sprintf('The theoretical value from pnorm() is %.4f', a)
We can see that the sample proportion from (b) is close to the theoretical proportion. They are close because the sample is large: with 10,000 draws, the law of large numbers keeps the sample proportion near the population value. They are not exactly the same because the sample in part (b) is only a sample drawn from the population. Each sample is subject to random variation, so the proportion calculated in part (b) is a sample statistic, while the theoretical proportion from pnorm() is the population parameter. We use the sample statistic to estimate the population parameter, and hence the proportion in part (b) is close to, but not the same as, the theoretical value.
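The size of this random variation can be sketched with a small simulation. This is only an illustration: the helper tail_prop(), the seed of 1, and the choice of 200 repetitions are assumptions made for this sketch and are not part of the assignment.

```{R}
# Illustration only: how much does the sample proportion vary between samples?
n <- 10000
p_theory <- pnorm(54, 82, 11) + pnorm(110, 82, 11, lower.tail = FALSE)

# Approximate standard error of a sample proportion: sqrt(p * (1 - p) / n)
se <- sqrt(p_theory * (1 - p_theory) / n)

# Hypothetical helper: draw a fresh sample and return its tail proportion
tail_prop <- function(n) {
  s <- rnorm(n, 82, 11)
  mean(s < 54 | s > 110)
}

set.seed(1)  # arbitrary seed, used only to make this sketch reproducible
props <- replicate(200, tail_prop(n))

# The spread of the simulated proportions should be roughly the size of se
c(theory = p_theory, mean_of_props = mean(props),
  sd_of_props = sd(props), se = se)
```

Each repetition gives a slightly different proportion, and the differences from the theoretical value are typically on the order of the standard error, which shrinks as n grows.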
(d) Using ggplot2, make a histogram of your sample. Set y=..density.. inside aes(). Overlay a normal distribution with stat_function(aes(samp), fun=dnorm, args=list(82,11)). Using geom_vline(xintercept=), add dashed vertical lines corresponding to the 2.5th and the 97.5th percentile of the sample
R code
# (d)
library(ggplot2)
# make a histogram of the sample
p <- ggplot(data.frame(x), aes(x = x)) +
  geom_histogram(aes(y = ..density..), binwidth = 1)
# overlay a normal distribution
p <- p + stat_function(aes(x), fun = dnorm, args = list(82, 11), color = "red")
# 2.5th and 97.5th percentiles of the sample
q <- quantile(x, c(0.025, 0.975))
p <- p + geom_vline(xintercept = q, color = "blue", linetype = "dashed")
# add the title
p + labs(title = "Histogram of the sample")
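As a hedged variant of the plot in part (d): recent ggplot2 releases (3.4.0 and later, as far as I am aware) mark the ..density.. notation as deprecated in favor of after_stat(density). The sketch below assumes such a version is installed. The names q_sample and q_theory are illustrative, and the comparison with qnorm() is an addition that the question does not ask for.

```{R}
library(ggplot2)

set.seed(348)
x <- rnorm(10000, 82, 11)

# sample percentiles versus the theoretical ones from qnorm()
q_sample <- quantile(x, c(0.025, 0.975))
q_theory <- qnorm(c(0.025, 0.975), mean = 82, sd = 11)

# same plot as above, written with after_stat() instead of ..density..
p <- ggplot(data.frame(x), aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 1) +
  stat_function(fun = dnorm, args = list(mean = 82, sd = 11), color = "red") +
  geom_vline(xintercept = q_sample, color = "blue", linetype = "dashed") +
  labs(title = "Histogram of the sample")
p
```

The dashed lines should fall near the theoretical 2.5th and 97.5th percentiles in q_theory, which lie about 1.96 standard deviations on either side of the mean.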