Question

The `R` package \texttt{moments} gives us two very useful functions: \texttt{skewness} and \texttt{kurtosis}. If data is truly normal, it should have a skewness of 0 and a kurtosis of 3. Write an R function that conducts a normality test as follows: it takes a data set as input, calculates a bootstrap confidence interval for the skewness, calculates a bootstrap confidence interval for the kurtosis, then checks whether 0 is in the skewness interval and 3 is in the kurtosis interval. If so, your routine prints that the data is normally distributed; otherwise, it prints that the data is not normally distributed. Test your routine on random data from the normal (\texttt{rnorm}), uniform (\texttt{runif}), and exponential (\texttt{rexp}) distributions, with sample sizes of n = 10, 30, 70, and 100.
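
As a point of reference, here is a minimal base-R sketch of the two statistics. It assumes the usual moment-based definitions (skewness $m_3 / m_2^{3/2}$ and kurtosis $m_4 / m_2^2$, where $m_k$ is the $k$-th central sample moment). The helper names `central_moment`, `skew_by_hand`, and `kurt_by_hand` are hypothetical and are not part of \texttt{moments}.

```r
# Hypothetical helpers: moment-based skewness and kurtosis in base R.
# central_moment(x, k) is the k-th central sample moment (divisor n).
central_moment = function(x, k) mean((x - mean(x))^k)

skew_by_hand = function(x) central_moment(x, 3) / central_moment(x, 2)^1.5
kurt_by_hand = function(x) central_moment(x, 4) / central_moment(x, 2)^2

set.seed(1)
x = rnorm(10000)
skew_by_hand(x) # should be close to 0 for normal data
kurt_by_hand(x) # should be close to 3 for normal data
```

Note that a kurtosis of 3 (not 0) is the normal reference value here, because this definition is not the "excess" kurtosis.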

An example code fragment is below:

```{r}
library(moments)
mynormtest = function(x){
  # find bootstrap CI for skewness of x
  # find bootstrap CI for kurtosis of x
  cat("Some output goes here\n")
}
```

Solutions

Expert Solution

library(moments)


mynormtest = function(data_vector){
  columns = 300 # no. of bootstrap samples
  rows = length(data_vector) # no. of observations in our data

  # each column is one bootstrap resample of the data, drawn with replacement
  boot_samples_matrix = replicate(columns, sample(data_vector, size = rows, replace = TRUE))

  # bootstrap distribution of the skewness
  skew_list = apply(boot_samples_matrix,2,skewness)
  mean_skew = mean(skew_list)
  # the bootstrap standard error is the sd of the bootstrap statistics;
  # it is not divided by sqrt(columns)
  sd_skew = sd(skew_list)

  # 95% normal-approximation confidence interval for the skewness
  skew_ci_95 = c(mean_skew-qnorm(0.975)*sd_skew, mean_skew+qnorm(0.975)*sd_skew)

  # bootstrap distribution and 95% confidence interval for the kurtosis
  kurtosis_list = apply(boot_samples_matrix,2,kurtosis)
  mean_kurtosis = mean(kurtosis_list)
  sd_kurtosis = sd(kurtosis_list)
  kurtosis_ci_95 = c(mean_kurtosis-qnorm(0.975)*sd_kurtosis, mean_kurtosis+qnorm(0.975)*sd_kurtosis)

  # normal data should have skewness 0 and kurtosis 3
  bool_skew = skew_ci_95[1] <= 0 && 0 <= skew_ci_95[2]
  bool_kurtosis = kurtosis_ci_95[1] <= 3 && 3 <= kurtosis_ci_95[2]

  if(bool_skew && bool_kurtosis){
    print("The data is normally distributed")
  }else{
    print("The data is NOT normally distributed")
  }
}
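
The function above builds its intervals from a normal approximation to the bootstrap distribution. A common alternative is the percentile interval, which takes the 2.5% and 97.5% quantiles of the bootstrap statistics directly. The following is a minimal base-R sketch of that idea, not the method used in the solution; `boot_percentile_ci` is a hypothetical helper, and the median is used only as a stand-in statistic so that the block runs without extra packages.

```r
# Hypothetical helper: percentile bootstrap CI for any statistic `stat`.
boot_percentile_ci = function(x, stat, n_boot = 1000, level = 0.95){
  # recompute the statistic on n_boot resamples drawn with replacement
  boot_stats = replicate(n_boot, stat(sample(x, size = length(x), replace = TRUE)))
  alpha = 1 - level
  # lower and upper quantiles of the bootstrap distribution
  quantile(boot_stats, probs = c(alpha/2, 1 - alpha/2), names = FALSE, na.rm = TRUE)
}

set.seed(42)
x = rexp(100)
ci = boot_percentile_ci(x, median)
ci # lower and upper limits of the interval
```

With \texttt{moments} loaded, passing `skewness` or `kurtosis` as `stat` should give intervals that can be checked against 0 and 3 in the same way.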

#......... checking on random data

#1) Random Normal data

x1 = rnorm(10)
x2 = rnorm(30)
x3 = rnorm(70)
x4 = rnorm(100)

mynormtest(x1) # calling our function
mynormtest(x2)
mynormtest(x3)
mynormtest(x4)


#2) Random Uniform data
y1 = runif(10)
y2 = runif(30)
y3 = runif(70)
y4 = runif(100)

mynormtest(y1)
mynormtest(y2)
mynormtest(y3)
mynormtest(y4)

#3) Random exponential

z1 = rexp(10)
z2 = rexp(30)
z3 = rexp(70)
z4 = rexp(100)

mynormtest(z1)
mynormtest(z2)
mynormtest(z3)
mynormtest(z4)
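
Because both the data and the bootstrap resamples are random, a single call per sample can give a different verdict from run to run. One way to summarize the behaviour is to repeat the test many times and record how often each sample is declared normal. The sketch below does this in base R under stated assumptions: `skew_stat`, `kurt_stat`, and `looks_normal` are hypothetical stand-ins (moment-based statistics and percentile intervals) rather than the solution's `mynormtest`, and `n_rep` and `n_boot` are kept small so it runs quickly.

```r
# Hypothetical base-R sketch: how often is each sample declared normal?
skew_stat = function(x){ d = x - mean(x); mean(d^3) / mean(d^2)^1.5 }
kurt_stat = function(x){ d = x - mean(x); mean(d^4) / mean(d^2)^2 }

# TRUE if 0 lies in the skewness interval and 3 in the kurtosis interval
looks_normal = function(x, n_boot = 200){
  # 2 x n_boot matrix: row 1 holds skewness, row 2 holds kurtosis
  boot = replicate(n_boot, {
    xb = sample(x, size = length(x), replace = TRUE)
    c(skew_stat(xb), kurt_stat(xb))
  })
  skew_ci = quantile(boot[1, ], c(0.025, 0.975), names = FALSE, na.rm = TRUE)
  kurt_ci = quantile(boot[2, ], c(0.025, 0.975), names = FALSE, na.rm = TRUE)
  skew_ci[1] <= 0 && 0 <= skew_ci[2] && kurt_ci[1] <= 3 && 3 <= kurt_ci[2]
}

set.seed(123)
generators = list(normal = rnorm, uniform = runif, exponential = rexp)
sizes = c(10, 30, 70, 100)
n_rep = 50 # repetitions per cell; kept small so the sketch runs quickly

# rows: distribution, columns: sample size, entries: share declared normal
accept_rate = sapply(sizes, function(n)
  sapply(generators, function(gen) mean(replicate(n_rep, looks_normal(gen(n))))))
colnames(accept_rate) = paste0("n=", sizes)
accept_rate
```

The exact proportions depend on the seed and on `n_rep`, so the table is best read as a rough guide to how the verdict varies with distribution and sample size.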



