Question

RStudio R-Programming Statistics Problem - power.t.test and p-values

*I believe we're expected to write code using for loops? Any guidance will help!

4. For this problem, you will run a simulation to investigate how violating the assumption of normally distributed data can affect the properties of a t-test.

a. The gamma distribution is skewed to the right. It contains a parameter called “shape”. The R function for generating data from a gamma distribution is rgamma – you can read the details in R help.   Make three histograms, each of a sample of size n = 10,000 drawn from a gamma distribution, with shape = 1, shape = 0.5, and shape = 0.1. Use “breaks = 100” to force each histogram to have lots of bars. Describe what you see happening as the shape parameter gets smaller.

b. Write a simulation that repeatedly draws two samples from a gamma distribution with shape = 1, then compares their means using a t-test. For this simulation, use n = 30 for the size of each sample. Write code that will save both the t-test statistic and p-value each time. Then make a histogram of the test statistics, and report the proportion of p-values less than 0.05. Note that, if the assumptions of the t-test are not violated, the p-value should be less than 0.05 5% of the time.

c. Do the same thing in part b. two more times, using shape = 0.5 and shape = 0.1. Does this seem to have any effect on the distribution of the test statistics, or the proportion of p-values less than 0.05?

d. Run the simulation three more times (once for each value of shape), using samples of size n = 10 rather than n = 30. Show the three histograms and three proportions of p-values less than 0.05. Did this have any noticeable effect on the results?

Solutions

Expert Solution

a.

R CODE:

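# Draw 10,000 values from each gamma distribution (rate defaults to 1)
d1 <- rgamma(n=10000, shape=1)
d2 <- rgamma(n=10000, shape=0.5)
d3 <- rgamma(n=10000, shape=0.1)
# Show the three histograms side by side; par() works on every platform,
# whereas windows() is only available on Windows
par(mfrow=c(1,3))
hist(d1, breaks=100, main="shape = 1")
hist(d2, breaks=100, main="shape = 0.5")
hist(d3, breaks=100, main="shape = 0.1")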
R OUTPUT:

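As the shape parameter decreases, the distribution becomes more strongly right-skewed: more of the observations pile up near zero while a few large values stretch out the right tail. The skewness of a gamma distribution is 2/sqrt(shape), so it rises from 2 at shape = 1 to about 2.83 at shape = 0.5 and about 6.32 at shape = 0.1. With the default rate = 1 the variance equals the shape, so the absolute spread actually shrinks; it is the spread relative to the mean (standard deviation / mean = 1/sqrt(shape)) that increases.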
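
To put numbers on the change in skewness, the following is a minimal sketch that compares the theoretical skewness 2/sqrt(shape) with a moment-based estimate from simulated data. The helper name sample_skewness, the table name summary_tab, and the seed are illustrative assumptions, not part of the assignment.

```r
# Compare theoretical and simulated skewness for the three shape values.
# Assumes the default rate = 1, so the mean and variance both equal shape.
set.seed(1)  # arbitrary seed, used only so the run is reproducible
shapes <- c(1, 0.5, 0.1)

# Hypothetical helper: moment-based sample skewness
sample_skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

summary_tab <- data.frame(
  shape = shapes,
  theoretical_skew = 2 / sqrt(shapes),
  sample_skew = NA_real_,
  sample_var = NA_real_
)

for (k in seq_along(shapes)) {
  x <- rgamma(n = 10000, shape = shapes[k])
  summary_tab$sample_skew[k] <- sample_skewness(x)
  summary_tab$sample_var[k] <- var(x)
}

print(summary_tab)
```

The sample skewness should track the theoretical column, while the sample variance should fall roughly in line with the shape value.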

b.

R CODE:
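count=0
p=numeric(10000)   # storage for the p-values
s=numeric(10000)   # storage for the test statistics
for(i in 1:10000){
  # tt (not t) avoids masking the base function t()
  tt=t.test(rgamma(n=30,shape=1),rgamma(n=30,shape=1),alternative="two.sided")
  p[i]=tt$p.value
  s[i]=tt$statistic
  if(p[i]<0.05){
    count=count+1
  }
}
count/10000        # proportion of p-values below 0.05
hist(s,breaks=100)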

R OUTPUT:

[1]  0.0446

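The proportion of p-values less than 0.05 is 0.0446, slightly below the nominal 5%. The t-test therefore holds its Type I error rate (it is, if anything, mildly conservative) even though gamma data with shape = 1 are exponential rather than normal.

The histogram of the test statistics is roughly symmetric and bell-shaped (mesokurtic), as the Central Limit Theorem suggests for samples of size n = 30.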
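
One way to judge how far 0.0446 is from 0.05 is to attach a Monte Carlo standard error to the estimate. The sketch below uses the usual normal approximation for a proportion; the variable names are illustrative assumptions.

```r
# Monte Carlo standard error for a simulated rejection proportion,
# using the normal approximation sqrt(p * (1 - p) / N).
p_hat  <- 0.0446   # proportion reported in the output above
n_sims <- 10000    # number of simulated t-tests

mc_se <- sqrt(p_hat * (1 - p_hat) / n_sims)
ci <- p_hat + c(-1, 1) * qnorm(0.975) * mc_se   # approximate 95% interval

print(round(c(estimate = p_hat, se = mc_se, lower = ci[1], upper = ci[2]), 4))

# Does the interval cover the nominal 5% level?
covers_nominal <- ci[1] <= 0.05 && 0.05 <= ci[2]
print(covers_nominal)
```

For the reported value the interval runs from about 0.041 to 0.049, which just excludes 0.05; this is consistent with a test that is slightly conservative rather than exactly at its nominal level.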

c.

R CODE:

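# Repeat the part b simulation once for each of the two new shape values
par(mfrow=c(1,2))
for(sh in c(0.5,0.1)){
  p=numeric(10000)   # storage for the p-values
  s=numeric(10000)   # storage for the test statistics
  for(i in 1:10000){
    tt=t.test(rgamma(n=30,shape=sh),rgamma(n=30,shape=sh),alternative="two.sided")
    p[i]=tt$p.value
    s[i]=tt$statistic
  }
  print(mean(p<0.05))   # proportion of p-values below 0.05 for this shape
  hist(s,breaks=100,main=paste("shape =",sh))
}

R OUTPUT:

The code prints two proportions, one for shape = 0.5 and one for shape = 0.1, and draws one histogram for each. A pooled figure such as 0.04509098, computed from 10,000 tests at shape = 1 plus a single test at each of shape = 0.5 and shape = 0.1, is dominated by the shape = 1 runs and cannot show the effect of the shape parameter, so the proportions must be reported separately.

Lowering the shape does not shift the test statistics: because both samples have the same size and come from the same distribution, the difference in sample means is symmetric about zero, so each histogram should remain centered at zero and roughly symmetric.

What can change is the tail behavior and therefore the rejection rate. With strongly skewed data (shape = 0.1) the t-test typically becomes conservative, so the proportion of p-values less than 0.05 may fall noticeably below 5%, while shape = 0.5 should stay closer to the shape = 1 result. Compare the two printed proportions with the 0.0446 from part b.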
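
A histogram can hide differences in the tails, so it may help to compare a few quantiles of the simulated statistics with those of a reference t distribution. The sketch below is a rough check under two assumptions: it uses 2,000 replications per shape to keep the run short, and it uses df = 2n - 2 = 58 as the reference even though t.test applies the Welch correction by default, which gives a somewhat smaller df. The names quant_tab and probs are illustrative.

```r
# Compare quantiles of simulated t statistics with a reference t distribution.
set.seed(2)          # arbitrary seed, used only so the run is reproducible
n <- 30              # size of each sample
reps <- 2000         # fewer replications than the main simulation, for speed
shapes <- c(1, 0.5, 0.1)
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)

# Reference quantiles: df = 2n - 2 is an approximation (Welch df is <= this)
quant_tab <- data.frame(prob = probs, t_ref = qt(probs, df = 2 * n - 2))

for (sh in shapes) {
  s <- numeric(reps)
  for (i in 1:reps) {
    s[i] <- t.test(rgamma(n, shape = sh), rgamma(n, shape = sh))$statistic
  }
  quant_tab[[paste0("shape_", sh)]] <- quantile(s, probs, names = FALSE)
}

print(round(quant_tab, 3))
```

If the extreme quantiles for a given shape sit closer to zero than the t_ref column, the test rejects less often than 5% at that shape.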

d.

R CODE:

# Repeat the simulation with n = 10, once for each shape value
par(mfrow=c(1,3))
for(sh in c(1,0.5,0.1)){
  p=numeric(10000)   # storage for the p-values
  s=numeric(10000)   # storage for the test statistics
  for(i in 1:10000){
    tt=t.test(rgamma(n=10,shape=sh),rgamma(n=10,shape=sh),alternative="two.sided")
    p[i]=tt$p.value
    s[i]=tt$statistic
  }
  print(mean(p<0.05))   # proportion of p-values below 0.05 for this shape
  hist(s,breaks=100,main=paste("shape =",sh,", n = 10"))
}

R OUTPUT:

The code prints three proportions, one per shape value, and draws the three histograms. A pooled figure such as 0.04757621, computed from 10,000 tests at shape = 1 and n = 30 plus a single test at each of the other five settings, is dominated by the shape = 1, n = 30 runs and cannot show the effect of reducing the sample size, so the three proportions must be reported separately.

Because the two samples are still the same size and come from the same distribution, the histograms should stay centered at zero and symmetric. With n = 10, however, the Central Limit Theorem has less to work with, so departures from the t distribution should be most visible at shape = 0.1.

For skewed data like this the t-test typically becomes conservative, rejecting less often than 5%, and the effect should be stronger at n = 10 than at n = 30. Compare the three printed proportions with the n = 30 results from parts b and c.
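
Since parts b through d repeat the same simulation with different settings, the whole study can also be sketched as one function applied over a grid of shape and n values. The function name sim_rejection_rate, its arguments, and the reduced default of 2,000 replications are assumptions made for this illustration; increase reps to 10,000 to match the assignment.

```r
# Hypothetical helper: estimated Type I error rate of the two-sample t-test
# when both samples are drawn from the same gamma distribution.
sim_rejection_rate <- function(shape, n, reps = 2000, alpha = 0.05) {
  p <- numeric(reps)
  for (i in 1:reps) {
    p[i] <- t.test(rgamma(n, shape = shape), rgamma(n, shape = shape))$p.value
  }
  mean(p < alpha)   # proportion of p-values below alpha
}

set.seed(3)  # arbitrary seed, used only so the run is reproducible

# All six combinations of shape and sample size from parts b, c, and d
grid <- expand.grid(shape = c(1, 0.5, 0.1), n = c(30, 10))
grid$rejection_rate <- NA_real_

for (r in seq_len(nrow(grid))) {
  grid$rejection_rate[r] <- sim_rejection_rate(grid$shape[r], grid$n[r])
}

print(grid)
```

Laying the six proportions out in one table makes it easy to see whether the rejection rate drifts away from 0.05 as the shape gets smaller or the samples get shorter.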
