In this activity, you are going to look at the sampling distribution and how it depends on the size of the sample. This will be done by simulating a sample drawn from a population with known properties. In particular you'll be looking at a variable that is more or less like the distribution of human adult heights - normally distributed with a mean of 68 inches and a standard deviation of 3 inches. Here's one random sample of size n = 10 from this simulated population:
> rnorm(10, mean=68, sd=3)
[1] 62.842 71.095 62.357 68.896 67.494
[6] 67.233 69.865 71.664 69.241 70.581
These are the heights of a random sample of n = 10. The sampling distribution refers to some numerical description of such data, for example, the sample mean. Consider this sample mean the output of a single trial.
> mean( rnorm(10, mean=68, sd=3) )
[1] 67.977
If you ran exactly this statement, your result was very likely different. That's because you have a different random sample -- rnorm generates random numbers. And if you repeat the statement, you'll likely get a different value again, for instance:
> mean( rnorm(10, mean=68, sd=3) )
[1] 66.098
Note that both of the sample means above differ somewhat from the population mean of 68. The point of examining a sampling distribution is to be able to see the reliability of a random sample. To do this, you generate many trials, say 1000, and look at the distribution of the trials. For example, here's how to look at the sampling distribution for the mean of 10 random cases from the population:
> s = do(1000)*mean( rnorm(10, mean=68, sd=3) )
By examining the distribution of the values stored in s, you can see what the sampling distribution looks like. Generate your own sample and answer the following:
1 What is the mean of this distribution?
2 What is the standard deviation of this distribution?
3 What is the shape of this distribution?
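The `do()` helper used above is not part of base R; it appears to come from an add-on package such as mosaic. As a minimal sketch, assuming only base R, the same 1000 trials can be generated with `replicate`. The names `n`, `trials`, and `s` and the seed value are illustrative choices, not part of the original activity.

```r
# Base-R sketch of the sampling distribution of the mean for n = 10.
set.seed(1)      # illustrative seed so the run is repeatable
n <- 10          # size of each random sample
trials <- 1000   # number of simulated samples

# Each trial draws n heights and keeps only the sample mean.
s <- replicate(trials, mean(rnorm(n, mean = 68, sd = 3)))

mean(s)   # centre of the sampling distribution
sd(s)     # spread of the sampling distribution
hist(s)   # shape of the sampling distribution
```

Changing `n` to 1000 in this sketch gives the second sampling distribution the activity asks about.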
Now modify your simulation to look at the sampling distribution for n = 1000.
4 What is the mean of this distribution?
5 What is the standard deviation of this distribution?
6 What is the shape of this distribution?
Which of these two sample sizes, n = 10 or n = 1000, gave a sampling distribution that was more reliable? How might you measure the reliability?
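> ### Required function: returns nsim sample means, each from a sample of size n
>
> Function_1=function(n, nsim)
+ {
+ M=numeric(nsim)   # one slot per simulated sample mean
+
+ for(i in 1:nsim)
+ {
+ # draw n heights from N(mean=68, sd=3) and store their mean
+ M[i]=mean( rnorm(n, mean=68, sd=3))
+ }
+ return(M)
+ }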
>
>
> ##### For n=10
> Sample_1=Function_1(n=10, nsim=1000)
>
> # 1 What is the mean of this distribution?
> mean(Sample_1)
[1] 68.02284
>
> # 2 What is the standard deviation of this distribution?
> sd(Sample_1)
[1] 0.9428393
>
> # 3 What is the shape of this distribution?
> hist(Sample_1)
# Shape of the distribution is symmetric and bell-shaped (approximately normal), centered near 68.
>
> ############################
> ##### For n=1000
> Sample_2=Function_1(n=1000, nsim=1000)
>
> #4 What is the mean of this distribution?
> mean(Sample_2)
[1] 68.00272
>
> #5 What is the standard deviation of this distribution?
> sd(Sample_2)
[1] 0.09457919
>
> #6 What is the shape of this distribution?
> hist(Sample_2)
# Shape of the distribution is again symmetric and bell-shaped (approximately normal), centered near 68 but much narrower than for n=10.
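A histogram alone can make symmetry hard to judge. As a rough sketch, assuming base R only, the shape can also be checked with a sample skewness (near 0 for a symmetric distribution) and a normal quantile plot. The names `sample_means`, `z`, and `skew` and the seed are hypothetical and do not appear in the transcript above.

```r
# Sketch: checking the shape of a simulated sampling distribution (n = 1000).
set.seed(2)   # illustrative seed
sample_means <- replicate(1000, mean(rnorm(1000, mean = 68, sd = 3)))

# Standardise, then average the cubes: a simple moment-based skewness.
z <- (sample_means - mean(sample_means)) / sd(sample_means)
skew <- mean(z^3)
skew          # values close to 0 indicate a symmetric shape

# Points lying near the reference line suggest an approximately normal shape.
qqnorm(sample_means)
qqline(sample_means)
```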
# Which of these two sample sizes, n = 10 or n = 1000, gave a sampling distribution that was more reliable? How might you measure the reliability?
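Ans. The sample size n=1000 gives the more reliable sampling distribution, because its standard deviation (about 0.095) is far smaller than that for n=10 (about 0.94), so individual sample means fall much closer to the population mean of 68.
One way to measure the relative reliability is the share of the combined variance that comes from n=1000: 0.09457919^2/(0.9428393^2+0.09457919^2)=0.00996248, i.e. about 1%.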
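A more common reliability measure is the standard deviation of the sampling distribution itself, usually called the standard error. For the mean of n independent draws from a population with standard deviation sigma, the standard result is sigma/sqrt(n). The sketch below, which is not part of the original answer, compares that formula with simulated values; the names `sigma`, `n_values`, `theoretical_se`, and `simulated_se` and the seed are illustrative.

```r
# Sketch: theoretical vs simulated standard error of the sample mean.
sigma <- 3
n_values <- c(10, 1000)

# Standard result for the mean of n independent draws: sigma / sqrt(n).
theoretical_se <- sigma / sqrt(n_values)

set.seed(3)   # illustrative seed
# For each n, simulate 1000 sample means and take their standard deviation.
simulated_se <- sapply(n_values, function(n) {
  sd(replicate(1000, mean(rnorm(n, mean = 68, sd = sigma))))
})

data.frame(n = n_values, theoretical_se, simulated_se)
```

Because the standard error shrinks with the square root of n, going from n=10 to n=1000 (100 times more data) cuts it by a factor of 10, which matches the simulated drop from roughly 0.94 to roughly 0.095.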