Question


Generate 2500 random numbers that are uniformly distributed between 90 and 160. Prove experimentally that STD of sample means = STD of Population/sqrt(sample size) for sample sizes of 10 and 100. How close is your calculation of STD of sample means to the theoretical approximation? Keep number of samples in each case equal to sample size. Repeat for normal and weibull (also between 90 and 160). What does it say about STD of sample means as you increase your sample size? Repeat the problem by generating 2500 random numbers with 2 peaks in its frequency distribution. Does the STD of sample means estimate formula depend on the frequency distribution of your population? For each population that is uniform, normal, weibull and 2-peaks plot the histogram to QC your population data.

Solutions

Expert Solution

Solution for the first part: (Generate 2500 random numbers that are uniformly distributed between 90 and 160. Prove experimentally that STD of sample means = STD of Population/sqrt(sample size) for sample sizes of 10)

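x = runif(2500, 90, 160)            # population: 2500 uniform draws on [90, 160]
means = numeric(10)                 # one mean per sample
for (i in 1:10)
{
  vector = x[((i-1)*10+1):(i*10)]   # i-th block of 10 consecutive values (x[1:100] is used in total)
  means[i] = mean(vector)
}
sd(means)                           # observed STD of the 10 sample means
sd(x)/sqrt(10)                      # theoretical value: STD of population / sqrt(sample size)
sd(means) - sd(x)/sqrt(10)          # difference between the two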

After running this code we get:

STD of sample means = 6.236872

STD of Population/sqrt(sample size) = 6.422313

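The two values are close, which is consistent with the statement. A single run with only 10 sample means supports the formula experimentally but does not strictly prove it.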

The difference of the STD of sample means and the theoretical approximation = -0.1854414
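
As a cross-check that does not depend on the random draw, the population STD of a uniform distribution on [a, b] has a closed form, so the theoretical values can be worked out directly. The symbols a = 90 and b = 160 are the bounds from the question; the rest is arithmetic:

```latex
\sigma = \frac{b-a}{\sqrt{12}} = \frac{160-90}{\sqrt{12}} \approx 20.21,
\qquad
\frac{\sigma}{\sqrt{10}} \approx 6.39,
\qquad
\frac{\sigma}{\sqrt{100}} \approx 2.02
```

The value 6.422313 reported above is computed from sd(x) of the generated numbers rather than from the exact value, which accounts for its small gap from 6.39.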

The solution for the second part: (Generate 2500 random numbers that are uniformly distributed between 90 and 160. Prove experimentally that STD of sample means = STD of Population/sqrt(sample size) for sample sizes of 100)

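x = runif(2500, 90, 160)                        # population: 2500 uniform draws on [90, 160]
index = sample(1:2500, 10000, replace = TRUE)   # 100 * 100 indices drawn with replacement
x = x[index]

#The two lines above draw 10000 values with replacement from the 2500 generated numbers, which gives 100 random samples of size 100. Note that x is overwritten here, so sd(x) below is the STD of the 10000 resampled values rather than of the original 2500.
means = numeric(100)
for (i in 1:100)
{
  vector = x[((i-1)*100+1):(i*100)]             # i-th block of 100 resampled values
  means[i] = mean(vector)
}
sd(means)                                       # observed STD of the 100 sample means
sd(x)/sqrt(100)                                 # theoretical value
sd(means) - sd(x)/sqrt(100)                     # difference between the two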

After running this code we get:

STD of sample means =  2.010739

STD of Population/sqrt(sample size) = 2.020649

The two values are again close, which is consistent with the statement.

The difference of the STD of sample means and the theoretical approximation = -0.009909479
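
The sampling procedure used in the two parts above can be wrapped in a small function so that any population and sample size are checked the same way. The following is a minimal sketch, not the original solution's code: compare_sd is a hypothetical helper name, the seed is arbitrary, and samples are drawn with replacement for both sizes (the size-10 code above uses consecutive blocks instead).

```r
# Hypothetical helper (not part of the original solution): draws n samples
# of size n with replacement from pop and returns the STD of their means
# next to the theoretical value sd(pop) / sqrt(n).
compare_sd <- function(pop, n) {
  means <- replicate(n, mean(sample(pop, n, replace = TRUE)))
  c(sd_of_means = sd(means), theoretical = sd(pop) / sqrt(n))
}

set.seed(1)                     # arbitrary seed, only so a rerun gives the same numbers
pop <- runif(2500, 90, 160)     # uniform population on [90, 160]
res10 <- compare_sd(pop, 10)    # 10 samples of size 10
res100 <- compare_sd(pop, 100)  # 100 samples of size 100
print(res10)
print(res100)
```

Because the population is kept in pop and never overwritten, the theoretical value here always uses the STD of the original 2500 numbers.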

The Solution for the third part: (Repeat for normal)

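x = rnorm(3000, (160+90)/2, (160-90)/6)
x = x[x > 90 & x < 160]
x = x[1:2500]

#This part of the code makes sure that the 2500 numbers generated from the normal distribution lie between 90 and 160. About 99.7% of normally distributed data lie within 3 standard deviations of the mean, so the standard deviation is set to (160-90)/6, which puts 90 and 160 exactly 3 standard deviations from the mean of 125. The elementwise operator & is required for filtering a vector; && does not work here. The figures reported below come from a run with a standard deviation of (160+90)/6, about 41.7, so they are larger than this code would give.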

For sample size 10:

means = numeric(10)
for(i in 1:10)
{
vector = x[((i-1)*10+1):(i*10)]
means[i] = mean(vector)
}
sd(means)
sd(x)/sqrt(10)
sd(means) - sd(x)/sqrt(10)

The STD of means = 12.40488

The STD(population)/sqrt(Sample size) = 13.17351

Their difference is  -0.7686233

For sample size 100:

The R code:

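x = rnorm(3000, (160+90)/2, (160-90)/6)   # sd puts 90 and 160 at 3 standard deviations from the mean
x = x[x > 90 & x < 160]                   # keep only draws between 90 and 160
x = x[1:2500]                             # first 2500 of the retained draws
#The figures reported below come from a run with sd = (160+90)/6, about 41.7.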
index = sample(1:2500,10000,replace = TRUE)
x = x[index]
means = numeric(100)
for(i in 1:100)
{
vector = x[((i-1)*100+1):(i*100)]
means[i] = mean(vector)
}
sd(means)
sd(x)/sqrt(100)
sd(means) - sd(x)/sqrt(100)

The STD of means = 3.729523

The STD(population)/sqrt(Sample size) = 4.05702

Their difference is  -0.3274966

The solution for the fourth part: (Repeat for Weibull)

The R code for sample size 10:

library(distr)
d<-Truncate(Weibull(shape=1,scale=(90+160)/2),lower=90,upper=160)
x = d@r(2500)

#This part was done to generate 2500 numbers from the weibull dist between 90 and 160. The package "distr" has been used to do so.

means = numeric(10)
for(i in 1:10)
{
vector = x[((i-1)*10+1):(i*10)]
means[i] = mean(vector)
}
sd(means)
sd(x)/sqrt(10)
sd(means) - sd(x)/sqrt(10)

The STD of means = 7.067525

The STD(population)/sqrt(Sample size) = 6.22025

Their difference is  0.8472753

The R code for sample size 100:

library(distr)
d<-Truncate(Weibull(shape=1,scale=(90+160)/2),lower=90,upper=160)
x = d@r(2500)
index = sample(1:2500,10000,replace = TRUE)
x = x[index]
means = numeric(100)
for(i in 1:100)
{
vector = x[((i-1)*100+1):(i*100)]
means[i] = mean(vector)
}
sd(means)
sd(x)/sqrt(100)
sd(means) - sd(x)/sqrt(100)

The STD of means = 2.286154

The STD(population)/sqrt(Sample size) = 2.022889

Their difference is  0.263265
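
If the distr package is not available, a Weibull population restricted to [90, 160] can also be generated in base R by inverse-CDF sampling. This is a swapped-in technique rather than the method used above, offered as a sketch: rtrunc_weibull is a hypothetical helper name, and the shape and scale values are simply copied from the code above.

```r
# Inverse-CDF sampling from a Weibull distribution truncated to [lower, upper].
# rtrunc_weibull is a hypothetical helper name, not a function from any package.
rtrunc_weibull <- function(n, shape, scale, lower, upper) {
  p_lo <- pweibull(lower, shape = shape, scale = scale)  # CDF at the lower bound
  p_hi <- pweibull(upper, shape = shape, scale = scale)  # CDF at the upper bound
  u <- runif(n, min = p_lo, max = p_hi)                  # uniform on the retained CDF range
  qweibull(u, shape = shape, scale = scale)              # map back through the quantile function
}

set.seed(1)  # arbitrary seed
x <- rtrunc_weibull(2500, shape = 1, scale = (90 + 160) / 2,
                    lower = 90, upper = 160)
print(range(x))
```

Every draw is kept, so no values have to be discarded and exactly 2500 numbers are returned.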

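Solution to the fifth part: (What does it say about STD of sample means as you increase your sample size?)

In every example above, the STD of sample means becomes smaller as the sample size increases. The formula predicts a reduction by a factor of sqrt(10), about 3.16, when the sample size goes from 10 to 100; the uniform population, for example, drops from 6.236872 to 2.010739. The absolute difference between the observed value and the theoretical approximation also shrinks in every case (for the uniform population, from -0.1854414 to -0.009909479), so the approximation becomes more accurate in absolute terms.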
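
The behaviour with sample size follows from the formula itself. A short derivation, assuming the n values in a sample are independent draws from a population with finite variance σ² (the symbols X_i for the individual values and X̄ for the sample mean are notation introduced here):

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n},
\qquad
\operatorname{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}}
```

The second equality is where independence is used: the variance of a sum of independent variables is the sum of their variances.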

Solution to Sixth part (Repeat for 2-peaked distribution)

The R code for sample size 10

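y0 <- rlnorm(2500, meanlog = log(1), sdlog = log(3))     # component centred near 1
y1 <- rlnorm(2500, meanlog = log(100), sdlog = log(3))   # component centred near 100

flag <- rbinom(2500, size = 1, prob = 0.4)   # 1 selects y1 (probability 0.4), 0 selects y0
y <- y0*(1 - flag) + y1*flag
x = log(y)   # on the log scale the two peaks sit at log(1) = 0 and log(100)

# This part generates a two-peaked distribution as a mixture of two log-normal distributions. The rlnorm arguments are named meanlog and sdlog.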

means = numeric(10)
for(i in 1:10)
{
vector = x[((i-1)*10+1):(i*10)]
means[i] = mean(vector)
}
sd(means)
sd(x)/sqrt(10)
sd(means) - sd(x)/sqrt(10)

The STD of means = 0.6063173

The STD(population)/sqrt(Sample size) = 0.7917557

Their difference is  -0.1854384

The R code for sample size 100:

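y0 <- rlnorm(2500, meanlog = log(1), sdlog = log(3))     # component centred near 1
y1 <- rlnorm(2500, meanlog = log(100), sdlog = log(3))   # component centred near 100

flag <- rbinom(2500, size = 1, prob = 0.4)   # 1 selects y1 (probability 0.4), 0 selects y0
y <- y0*(1 - flag) + y1*flag
x = log(y)   # two peaks on the log scale, at 0 and log(100)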
index = sample(1:2500,10000,replace = TRUE)
x = x[index]
means = numeric(100)
for(i in 1:100)
{
vector = x[((i-1)*100+1):(i*100)]
means[i] = mean(vector)
}
sd(means)
sd(x)/sqrt(100)
sd(means) - sd(x)/sqrt(100)

The STD of means = 0.2649177

The STD(population)/sqrt(Sample size) = 0.253053

Their difference is   0.01186469

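Solution for the seventh part: (Does the STD of sample means estimate formula depend on the frequency distribution of your population?)

No, the formula itself does not depend on the frequency distribution. STD of sample means = STD of Population/sqrt(sample size) holds for any population with a finite standard deviation, provided the values in each sample are drawn independently. The results above do show the estimate landing closer to the theoretical value for some populations than for others, but each experiment uses only 10 or 100 sample means, so gaps of this size are consistent with random sampling scatter rather than evidence of a dependence on the shape of the distribution.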
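
One way to check whether the gaps reflect the population shape or just the small number of samples is to repeat the experiment with many more samples per case. The sketch below is an illustration under stated assumptions, not the original solution: the replicate count of 2000 and the seed are arbitrary, the normal population is left untruncated for simplicity, and the two-peak population is drawn directly as a mixture of normals on the log scale.

```r
set.seed(1)      # arbitrary seed
n_pop <- 2500
flag <- rbinom(n_pop, size = 1, prob = 0.4)
pops <- list(
  uniform  = runif(n_pop, 90, 160),
  normal   = rnorm(n_pop, mean = 125, sd = 70 / 6),   # untruncated, for simplicity
  two_peak = ifelse(flag == 1,
                    rnorm(n_pop, log(100), log(3)),
                    rnorm(n_pop, 0, log(3)))
)

n_rep <- 2000    # assumed number of samples per case, far more than 10 or 100
# Ratio of the observed STD of sample means to sd(pop) / sqrt(n),
# for each population (columns) and each sample size (rows).
ratio <- sapply(pops, function(pop) {
  sapply(c(10, 100), function(n) {
    means <- replicate(n_rep, mean(sample(pop, n, replace = TRUE)))
    sd(means) / (sd(pop) / sqrt(n))
  })
})
rownames(ratio) <- c("n = 10", "n = 100")
print(round(ratio, 3))
```

A ratio near 1 in every cell indicates that the formula describes all three populations equally well once enough samples are drawn.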

Solution for the eighth part: (For each population that is uniform, normal, weibull and 2-peaks plot the histogram to QC your population data.)

Note: The histograms of the normal and Weibull populations do not look like the usual untruncated shapes, because both populations are truncated to lie between 90 and 160.
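
The histograms can be produced with base R graphics. The following is a sketch under a few assumptions rather than the code behind the original plots: the seed and the number of bins are arbitrary, the Weibull population is generated by inverse-CDF sampling in base R instead of the distr package, and the two-peak population is drawn directly as a mixture of normals on the log scale, which has the same distribution as log(y) in the sixth part.

```r
set.seed(1)   # arbitrary seed
n <- 2500

uniform_pop <- runif(n, 90, 160)

# Normal: draw extra values, then keep the first n that fall in (90, 160)
z <- rnorm(2 * n, mean = (90 + 160) / 2, sd = (160 - 90) / 6)
normal_pop <- z[z > 90 & z < 160][1:n]

# Weibull truncated to [90, 160] by inverse-CDF sampling
sc <- (90 + 160) / 2
u <- runif(n, pweibull(90, shape = 1, scale = sc), pweibull(160, shape = 1, scale = sc))
weibull_pop <- qweibull(u, shape = 1, scale = sc)

# Two peaks: mixture of two normals on the log scale
flag <- rbinom(n, size = 1, prob = 0.4)
two_peak_pop <- ifelse(flag == 1,
                       rnorm(n, log(100), log(3)),
                       rnorm(n, log(1), log(3)))

pops <- list(Uniform = uniform_pop, Normal = normal_pop,
             Weibull = weibull_pop, "Two peaks" = two_peak_pop)

op <- par(mfrow = c(2, 2))          # 2 x 2 grid of panels
for (nm in names(pops)) {
  hist(pops[[nm]], breaks = 30, main = nm, xlab = "value")
}
par(op)                             # restore the previous layout
```

For QC, the uniform panel should look flat between 90 and 160, the normal panel bell-shaped and cut off at the bounds, the Weibull panel decreasing from 90 to 160, and the last panel should show two separate peaks.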

