In: Statistics and Probability
Write an R function that conducts a normality test as follows: it takes as input a data set, calculates a bootstrap confidence interval for the skewness, calculates a bootstrap confidence interval for the kurtosis, then sees if 0 is in the skewness interval and 3 is in the kurtosis interval. If so, your routine prints that the data is normally distributed, otherwise your routine should print that the data is not normally distributed. Test your routine on random data from normal (rnorm), uniform (runif), and exponential (rexp) , with sample sizes of n=10,30,70 and 100.
- The R package moments gives us two very useful functions; skewness and kurtosis. If data is truly normal, it should have a skewness value of 0 and a kurtosis value of 3.
R code with comments
-------------
#load the library moments
library(moments)
#function to test
testNormal<-function(data){
#set the number of bootstrap resamples
B<-10000
n<-length(data)
#initialize the variables to hold skewness,
kurtosis
xskew<-numeric(B)
xkurt<-numeric(B)
#start the bootstrap loop
for (i in 1:B) {
#resample with replacement
s<-sample(data,size=n,replace=TRUE)
#calculate the sample skewness and kurtosis
xskew[i]<-skewness(s)
xkurt[i]<-kurtosis(s)
}
#get the 95% percentile interval for
skewness,kurtosis
ciskew<-quantile(xskew,c(0.025,0.975))
cikurt<-quantile(xkurt,c(0.025,0.975))
#check if normal
if ((ciskew[1]<=0 & ciskew[2]>=0) &
(cikurt[1]<= 3 &cikurt[2]>=3)){
return('The data is normally distributed')
} else {
return('the data is not normally distributed')
}
}
#set the random seed
set.seed(123)
#try for different sample sizes
sprintf('--->Normal data')
for (n in c(10,30,70,100)){
#test data from normal
out<-testNormal(rnorm(n))
print(sprintf('For sample size n=%g: %s',n,out))
}
sprintf('--->Uniform data')
for (n in c(10,30,70,100)){
#test data from uniform
out<-testNormal(runif(n))
print(sprintf('For sample size n=%g: %s',n,out))
}
sprintf('--->Exponential data')
for (n in c(10,30,70,100)){
#test data from exponential
testNormal(rexp(n))
print(sprintf('For sample size n=%g: %s',n,out))
}
-------
#get this