Question

Using RStudio:

  1. We will test whether the Shapiro-Wilk test is resistant to outliers. Run the code below and decide whether the presence of a single outlier [the 6] changes the ability of the test to determine normality. What conclusions can you draw from this experiment?

shapiro.test(c(rnorm(100), 6))

shapiro.test(c(rnorm(1000), 6))

shapiro.test(c(rnorm(4000), 6))

Solutions

> shapiro.test(c(rnorm(100), 6))

Shapiro-Wilk normality test

data: c(rnorm(100), 6)
W = 0.90901, p-value = 3.489e-06

> shapiro.test(c(rnorm(1000), 6))

Shapiro-Wilk normality test

data: c(rnorm(1000), 6)
W = 0.9936, p-value = 0.0002736

> shapiro.test(c(rnorm(4000), 6))

Shapiro-Wilk normality test

data: c(rnorm(4000), 6)
W = 0.99848, p-value = 0.0007784
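Because rnorm() draws fresh random numbers on every call, rerunning the three commands gives different W statistics and p-values from those shown above. A minimal sketch for making the experiment reproducible and collecting the p-values in one table, assuming an arbitrary seed of 1 (not part of the original question), is:

```r
# Arbitrary seed (an assumption, not from the question) so the runs are reproducible
set.seed(1)

# The same three experiments: n standard-normal draws plus a single outlier at 6
sizes <- c(100, 1000, 4000)
p_values <- sapply(sizes, function(n) shapiro.test(c(rnorm(n), 6))$p.value)

# One row per sample size; shapiro.test() accepts between 3 and 5000 values
print(data.frame(n = sizes, p_value = p_values))
```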

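If the outlier is far from the rest of the data, a single outlier can change the result of the Shapiro-Wilk test. With the value 6 (six standard deviations from the mean of a standard normal) added, normality is rejected at the 5% level for all three sample sizes, even n = 4000 (p-values 3.489e-06, 0.0002736 and 0.0007784). In that sense the Shapiro-Wilk test is not resistant to outliers.

But if the outlier is not far from the rest of the data, the Shapiro-Wilk test still gives the correct result (it does not reject normality) even though an outlier is present.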
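A single run per sample size can be misleading, since the p-value itself is random. A rough way to check the claim above is to repeat each experiment many times and record how often normality is rejected at the 5% level. The sketch below does this for outliers at 6 and at 4; the helper name rejection_rate, the 200 repetitions and the seed are illustrative assumptions, not part of the original answer.

```r
# Hypothetical helper: share of simulated samples (n standard-normal draws plus
# one outlier) for which shapiro.test() rejects normality at level alpha
rejection_rate <- function(n, outlier, reps = 200, alpha = 0.05) {
  rejects <- vapply(seq_len(reps), function(i) {
    shapiro.test(c(rnorm(n), outlier))$p.value < alpha
  }, logical(1))
  mean(rejects)
}

set.seed(123)  # arbitrary seed, for reproducibility only
sizes <- c(100, 1000, 4000)
result <- data.frame(
  n = sizes,
  outlier_6 = sapply(sizes, rejection_rate, outlier = 6),
  outlier_4 = sapply(sizes, rejection_rate, outlier = 4)
)
print(result)
```

For truly normal data the rejection rate should be close to alpha (0.05) for every n, so rates far above that suggest the single added point is driving the result.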

Consider the following results, where the outlier is 4 instead of 6:

> shapiro.test(c(rnorm(100), 4))

Shapiro-Wilk normality test

data: c(rnorm(100), 4)
W = 0.9749, p-value = 0.05094

> shapiro.test(c(rnorm(1000), 4))

Shapiro-Wilk normality test

data: c(rnorm(1000), 4)
W = 0.99789, p-value = 0.2399

> shapiro.test(c(rnorm(4000), 4))

Shapiro-Wilk normality test

data: c(rnorm(4000), 4)
W = 0.99965, p-value = 0.7486

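Here the outlier is 4. In these runs the Shapiro-Wilk test does not reject normality at the 5% level for any of the three sample sizes, although the p-value for n = 100 (0.05094) is only just above 0.05.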
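Comparing the runs with 6 and with 4 is complicated by the fact that each call also draws a new normal sample. A sketch that holds one normal sample fixed and changes only the added point makes the effect of the outlier's distance easier to see; the sample size of 1000, the outlier values and the seed are arbitrary choices for illustration.

```r
set.seed(42)          # arbitrary seed
base <- rnorm(1000)   # one fixed standard-normal sample

# Baseline: the same sample with no outlier added
p_clean <- shapiro.test(base)$p.value

# Add a single point of increasing distance from the mean to the same sample
outliers <- c(3, 4, 5, 6)
p_with_outlier <- sapply(outliers, function(x) shapiro.test(c(base, x))$p.value)

print(p_clean)
print(data.frame(outlier = outliers, p_value = p_with_outlier))
```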


Related Solutions

Using R Studio/R programming... Usually, we will use a random sample to estimate the statistics of the underlying population. If we assume a given population is a standard normal distribution and we want to estimate its mean, which is the better technique to estimate that mean from a sample: Use the mean of one random sample of size 500 Use the mean of 300 random samples of size 10 Run your own experiment and use your results as a supporting...
Using R Studio/R programming... A consumer-reports group is testing whether a gasoline additive changes a car's gas mileage. A test of seven cars finds an average improvement of 0.4 miles per gallon with a standard deviation of 3.57. Is the difference significantly greater than 0? Assume that the values are normally distributed. What would the code be?
Using R Studio: A College Algebra course requires students to take an assessment test at the start of the course and again at the end of the course. The pre and post test scores for ten students are: Student 1 2 3 4 5 6 7 8 9 10 Pre-test score 70 62 63 61 56 52 71 63 64 67 Post-test score 87 71 82 78 57 50 72 65 78 65 Do the assessment test results support the...
(a) Using R and R Commander, run the Correlation test between the variables attpublicschools and attteachingprof. Copy appropriate output and paste it below this question. Is the correlation significant at 5% alpha level? [6 Points] I have done the correlation test. (see below for data) How do I know if the correlation is significant at 5% alpha level? Is 5% alpha the same as 95% confidence interval? Pearson's product-moment correlation data: attpublicschools and attteachingprof t = 3.724, df = 22,...
Movies 6b and 6c Part A - Using either R studio and/or SPSS, calculate a t-test for independent means for the question 10 data in your dataset. Be sure to cut and paste your output into this document. When using R this should include your code. Be sure to report the t-test in the correct format AND a complete write up in a paragraph form following the example in the movie. The research hypothesis asked if there are differences between...
How would I go about this question using R studio? In a multiple regression, investigate whether the categorical variable "Type" has a statistically significant interaction effect with any of the other covariates, A, B, C and D. Of those interactions that are statistically significant (if any), determine which one has the most impact on the model and add it to your model. Please include general 'formulas' for the commands needed!
Answer using R Studio Here we consider the amount of data needed to perform hypothesis testing. Suppose we are testing a coin using observations of tosses. We wish to test H0: p = 0.5 against an alternative of HA : p = 0.6 (in this question use one-sided tests only). How many tosses are needed to guarantee a size α ≤ 0.05 and β ≤ 0.2? Now generalize to consider HA : p = 0.5+delta. Choose sensible values for delta and quantify...
Using R-studio 2. Consider an experiment where we flip a fair coin six times in a row, and i is the number of heads tossed: a. Calculate the probability mass function for i = 0, ..., 6 using the equation from Ross section 2.8 for Binomial Random Variables. b. Conduct a simulation of this experiment in R, with T trials of the experiment; pick several values of T from 10 to 10,000. c. Create a...
I want this to be solved using R studio or R software, please. Here is the example: The data in stat4_prob5 present the performance of a chemical process as a function of several controllable process variables. (a) Fit a multiple regression model relating CO2product (y) to total solvent (x1) and hydrogen consumption (x2) and report the fitted regression line. (b) Find a point estimate for the variance term σ². (c) Construct the ANOVA table and test for the significance of the regression using...
how do you test for regression in R studio with variables x and y