Gather data on anything you want. You must have at least 50 samples to draw from. Determine whether the data is normally distributed or not, and why. You must show your data as well as a paragraph of explanation.
I conducted a study of the lengths of snakes observed during the rainy season in one locality. The data are presented below:
5.42 5.00 7.79 8.77 7.38 3.54 7.88 2.97 5.77 8.06
7.48 8.90 5.93 2.57 9.71 8.91 6.54 3.01 8.33 4.43
5.09 7.94 0.71 8.74 2.24 6.10 0.95 4.87 6.47 1.15
7.80 6.55 5.55 2.40 9.24 2.90 8.94 9.76 6.80 1.50
7.05 4.47 4.13 4.82 5.16 0.79 9.54 5.65 5.89 6.29
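Before any formal test, it can help to confirm the sample size and look at the rough shape of the data. The following is a minimal Python sketch using only the standard library (the analysis in this answer was run in R, so Python is a substitution here). The helper name `text_histogram` and the bin width of 2 are illustrative assumptions, not part of the original study.

```python
import statistics

# Snake lengths exactly as listed above (the post does not state the units).
lengths = [
    5.42, 5.00, 7.79, 8.77, 7.38, 3.54, 7.88, 2.97, 5.77, 8.06,
    7.48, 8.90, 5.93, 2.57, 9.71, 8.91, 6.54, 3.01, 8.33, 4.43,
    5.09, 7.94, 0.71, 8.74, 2.24, 6.10, 0.95, 4.87, 6.47, 1.15,
    7.80, 6.55, 5.55, 2.40, 9.24, 2.90, 8.94, 9.76, 6.80, 1.50,
    7.05, 4.47, 4.13, 4.82, 5.16, 0.79, 9.54, 5.65, 5.89, 6.29,
]


def text_histogram(values, bin_width=2.0):
    """Count values per bin [k*w, (k+1)*w), keyed by the bin's lower edge."""
    counts = {}
    for v in values:
        lower = int(v // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


n = len(lengths)
mean = statistics.mean(lengths)
median = statistics.median(lengths)
stdev = statistics.stdev(lengths)  # sample standard deviation (n - 1 divisor)
hist = text_histogram(lengths)

print(f"n = {n}, mean = {mean:.2f}, median = {median:.2f}, sd = {stdev:.2f}")
for lower, count in hist.items():
    print(f"{lower:4.1f}-{lower + 2.0:4.1f} | {'#' * count}")
```

A roughly bell-shaped histogram with the mean close to the median is what a normal sample would tend to show. This is only an informal check and does not replace the test.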
I first need to test whether the data can be assumed to be normal. For this, I shall use the Shapiro-Wilk test.
Null hypothesis: The given set of data follows a normal distribution.
Alternative hypothesis: The underlying distribution is not normal.
The data are first ordered from lowest to highest. After rearranging, data point 1 contains the lowest value, denoted by $x_{(1)}$. Hence $x_{(1)}$ is the lowest value and $x_{(n)}$ is the highest.
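The test statistic is

$$W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$

where $\bar{x} = (x_1 + \cdots + x_n)/n$ is the sample mean.
The coefficients $a_i$ are given by:

$$(a_1, \dots, a_n) = \frac{m^{\mathsf{T}} V^{-1}}{C},$$

where $C$ is a vector norm:

$$C = \lVert V^{-1} m \rVert = \left(m^{\mathsf{T}} V^{-1} V^{-1} m\right)^{1/2},$$

and the vector $m$,

$$m = (m_1, \dots, m_n)^{\mathsf{T}},$$

where $\mathsf{T}$ denotes the transpose, is made of the expected values of the order statistics of independent and identically distributed random variables sampled from the standard normal distribution; finally, $V$ is the covariance matrix of those normal order statistics.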
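To make the structure of the statistic concrete, here is a hedged Python sketch of a simplified variant, not the exact Shapiro-Wilk computation. It makes two assumptions: the expected normal order statistics are approximated with Blom's formula, and the covariance matrix is replaced by the identity, which gives the closely related Shapiro-Francia statistic. The function names are hypothetical. R's `shapiro.test` uses the full coefficients, so its value of W will generally differ slightly from what this sketch returns.

```python
import math
from statistics import NormalDist, mean


def approx_normal_scores(n):
    """Blom's approximation to the expected standard normal order statistics."""
    nd = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def shapiro_francia_w(sample):
    """Approximate W with the covariance matrix taken as the identity.

    Under that assumption the coefficients reduce to a = m / ||m||.
    """
    x = sorted(sample)                    # order statistics x_(1) <= ... <= x_(n)
    n = len(x)
    m = approx_normal_scores(n)
    norm = math.sqrt(sum(mi * mi for mi in m))
    a = [mi / norm for mi in m]           # normalised coefficients
    x_bar = mean(x)
    numerator = sum(ai * xi for ai, xi in zip(a, x)) ** 2
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    return numerator / denominator
```

Calling `shapiro_francia_w` on the 50 lengths listed earlier gives a value between 0 and 1. Values close to 1 indicate that the ordered data line up well with what a normal sample would produce.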
For the data considered here, the result of running a Shapiro-Wilk test in R is
W = 0.95589, p-value = 0.05983
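In this case, the p-value (0.05983) is greater than 0.05, which means that the result is not statistically significant at the 5% level and hence the null hypothesis is not rejected. In other words, the data can be regarded as a sample from a normal population, although the p-value is only slightly above the threshold, so the evidence in favour of normality is not strong.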
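The decision rule used here can be written as a small Python sketch. The function name `shapiro_decision` and the wording of the returned strings are illustrative assumptions. The only input taken from this answer is the reported p-value.

```python
def shapiro_decision(p_value, alpha=0.05):
    """Return the test decision at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < alpha:
        return "reject H0: the data do not look normal"
    return "fail to reject H0: the data are consistent with normality"


reported_p = 0.05983  # p-value reported by R for the snake data

print(shapiro_decision(reported_p))              # 5% significance level
print(shapiro_decision(reported_p, alpha=0.10))  # 10% significance level
```

Running the rule at both levels shows how much the conclusion depends on the chosen significance level when the p-value sits this close to the cutoff.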