Many statistical procedures require that we draw a sample from a population whose distribution is approximately normal. Often we don’t know whether the population is approximately normal when we draw the sample, so the only way we can assess this is to examine the sample itself. Assessing normality is more important for small samples. Below, you’ll see some small samples and you’ll be asked to assess whether the populations they are drawn from can be treated as approximately normal.
DATA: 2.6 | 4.2 | 1.5 | 2.0 | 0.6 | 0.7 | 6.6 | 2.2 | 9.7 | 1.8 | 4.2 | 4.4 | 0.6 | 0.2
The data set above is given. Determine whether it is reasonable to treat this sample as though it comes from an approximately normal population. Include any charts or graphs you make in Excel here and justify your answer.
The following normal quantile plot illustrates a sample. Determine whether it is reasonable to treat this sample as though it comes from an approximately normal population. Explain your answer.
The following histogram illustrates a sample. Determine whether it is reasonable to treat this sample as though it comes from an approximately normal population. Explain your answer.
The following data set is given. Determine whether it is reasonable to treat the following sample as though it comes from an approximately normal population. Include any charts or graphs you make in Excel here and justify your answer.
DATA: 8.8 | 11.2 | 11.6 | 6.3 | 9.3 | 10.5 | 14.6 | 8.5 | 7.3 | 7.5 | 5.2 | 9.0 | 4.3 | 9.9 | 7.8 | 13.1 | 12.3 | 10.1
A Q-Q plot is a convenient way to check in Excel whether the data are approximately normally distributed.
Sort the n data values in ascending order. For the value with rank i, calculate the following 3 values using the formulae below:
a) Probability of i = (i - 0.5)/n, where n is the total number of elements
b) Z-Score of i = NORMSINV(Probability of i)
c) Standardized value of i = (i-th value - average of all elements)/sample std deviation of all elements
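The three formulas above can be cross-checked outside the spreadsheet. The following is a minimal Python sketch, not the Excel workflow itself: the function name `qq_table` is hypothetical, `NormalDist().inv_cdf` from the standard library stands in for NORMSINV, and `statistics.stdev` is assumed to match the sample standard deviation (n - 1 denominator) used in the worksheet.

```python
from statistics import NormalDist, mean, stdev


def qq_table(data):
    """Return one (rank, value, probability, z_score, standardized) row per sorted value."""
    values = sorted(data)
    n = len(values)
    avg = mean(values)
    sd = stdev(values)          # sample standard deviation (n - 1 denominator)
    std_normal = NormalDist()   # standard normal: mean 0, standard deviation 1
    rows = []
    for rank, x in enumerate(values, start=1):
        p = (rank - 0.5) / n              # a) probability of rank i
        z = std_normal.inv_cdf(p)         # b) z-score, the NORMSINV stand-in
        standardized = (x - avg) / sd     # c) standardized data value
        rows.append((rank, x, p, z, standardized))
    return rows


# First data set from the question (14 values).
first_sample = [2.6, 4.2, 1.5, 2.0, 0.6, 0.7, 6.6, 2.2, 9.7, 1.8, 4.2, 4.4, 0.6, 0.2]

for rank, x, p, z, s in qq_table(first_sample):
    print(f"{rank:>2} {x:>5.1f} {p:.6f} {z:>9.5f} {s:>9.5f}")
```

Each printed row has the same five columns as the Excel tables in this answer, so the two can be compared line by line.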
We now check whether each data set is approximately normally distributed using the following method:
1) For the first dataset containing 14 data entries, we first arrange them in ascending order and calculate probability, Z-Score and Standardized values in Excel and tabulate these values as shown below:
S.No | Data | Probability = (i-0.5)/n | Z-Score | Standardised Data
1 | 0.2 | 0.035714 | -1.80274 | -1.0293
2 | 0.6 | 0.107143 | -1.24187 | -0.87958
3 | 0.6 | 0.178571 | -0.92082 | -0.87958
4 | 0.7 | 0.25 | -0.67449 | -0.84215
5 | 1.5 | 0.321429 | -0.46371 | -0.54272
6 | 1.8 | 0.392857 | -0.27188 | -0.43043
7 | 2 | 0.464286 | -0.08964 | -0.35558
8 | 2.2 | 0.535714 | 0.089642 | -0.28072
9 | 2.6 | 0.607143 | 0.27188 | -0.131
10 | 4.2 | 0.678571 | 0.463708 | 0.467864
11 | 4.2 | 0.75 | 0.67449 | 0.467864
12 | 4.4 | 0.821429 | 0.920823 | 0.542722
13 | 6.6 | 0.892857 | 1.241867 | 1.366162
14 | 9.7 | 0.964286 | 1.802743 | 2.526464
Then, using the last 2 columns (Z-Score and Standardised Data), we draw a Q-Q plot, i.e. a scatter plot with a linear trend line that shows how far the standardized values deviate from what a normal distribution would give:
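In the Q-Q plot, the blue curve (the quantile plot) deviates substantially from the linear trend line (black line). The table shows the same thing: the largest value, 9.7, has a standardized value of 2.526464 against a z-score of only 1.802743. Hence it is not reasonable to treat this sample as coming from an approximately normal population.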
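To put numbers on how far the points sit from a straight line, one option is to list the gap between each standardized value and its theoretical z-score. The sketch below is an illustration only: the helper name `qq_gaps` is hypothetical, and it measures gaps against the line y = x rather than against Excel's fitted trend line, which is an assumption made for simplicity.

```python
from statistics import NormalDist, mean, stdev


def qq_gaps(data):
    """Return (value, gap) pairs, where gap = standardized value - theoretical z-score."""
    values = sorted(data)
    n = len(values)
    avg, sd = mean(values), stdev(values)  # stdev uses the n - 1 denominator
    std_normal = NormalDist()
    gaps = []
    for rank, x in enumerate(values, start=1):
        z = std_normal.inv_cdf((rank - 0.5) / n)
        gaps.append((x, (x - avg) / sd - z))
    return gaps


# First data set from the question (14 values).
first_sample = [2.6, 4.2, 1.5, 2.0, 0.6, 0.7, 6.6, 2.2, 9.7, 1.8, 4.2, 4.4, 0.6, 0.2]

for x, gap in qq_gaps(first_sample):
    print(f"{x:>4.1f} {gap:+.3f}")
```

For this sample the gaps follow directly from the last two columns of the table: both ends sit above the line (about +0.77 for 0.2 and +0.72 for 9.7) while the middle sits below it (about -0.40 for 2.6). Points that bend away from the line in this way are the curved pattern a right-skewed sample typically produces.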
2) For the second data set, containing 18 entries, we again arrange the values in ascending order, calculate the probability, Z-Score and Standardized values in Excel, and tabulate these values as shown below:
S.No | Data | Probability = (i-0.5)/n | Z-Score | Standardised Data
1 | 4.3 | 0.027778 | -1.91451 | -1.84816
2 | 5.2 | 0.083333 | -1.38299 | -1.51512
3 | 6.3 | 0.138889 | -1.08532 | -1.10807
4 | 7.3 | 0.194444 | -0.86163 | -0.73803
5 | 7.5 | 0.25 | -0.67449 | -0.66402
6 | 7.8 | 0.305556 | -0.50849 | -0.55301
7 | 8.5 | 0.361111 | -0.35549 | -0.29398
8 | 8.8 | 0.416667 | -0.21043 | -0.18297
9 | 9 | 0.472222 | -0.06968 | -0.10896
10 | 9.3 | 0.527778 | 0.069685 | 0.002056
11 | 9.9 | 0.583333 | 0.210428 | 0.224082
12 | 10.1 | 0.638889 | 0.35549 | 0.29809
13 | 10.5 | 0.694444 | 0.508488 | 0.446107
14 | 11.2 | 0.75 | 0.67449 | 0.705138
15 | 11.6 | 0.805556 | 0.861634 | 0.853155
16 | 12.3 | 0.861111 | 1.085325 | 1.112185
17 | 13.1 | 0.916667 | 1.382994 | 1.408219
18 | 14.6 | 0.972222 | 1.914506 | 1.963284
Using Excel's scatterplot, we again plot the Q-Q Plot as shown below:
Here in the Q-Q plot, we see that most of the points, except 2, lie close to the linear black trend line. Hence it is reasonable to treat this sample as coming from an approximately normal population.
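As an informal companion to the two visual checks, the correlation between the sorted data and the theoretical z-scores gives one number per sample; the closer it is to 1, the more closely the Q-Q points follow a straight line. This is a sketch under assumptions: the correlation is not part of the Excel steps above, the name `qq_correlation` is hypothetical, and no cut-off value is implied.

```python
from math import sqrt
from statistics import NormalDist, mean


def qq_correlation(data):
    """Pearson correlation between sorted data and theoretical normal z-scores."""
    values = sorted(data)
    n = len(values)
    std_normal = NormalDist()
    z_scores = [std_normal.inv_cdf((rank - 0.5) / n) for rank in range(1, n + 1)]
    x_bar, z_bar = mean(values), mean(z_scores)
    cov = sum((x - x_bar) * (z - z_bar) for x, z in zip(values, z_scores))
    var_x = sum((x - x_bar) ** 2 for x in values)
    var_z = sum((z - z_bar) ** 2 for z in z_scores)
    return cov / sqrt(var_x * var_z)


# Both data sets from the question.
first_sample = [2.6, 4.2, 1.5, 2.0, 0.6, 0.7, 6.6, 2.2, 9.7, 1.8, 4.2, 4.4, 0.6, 0.2]
second_sample = [8.8, 11.2, 11.6, 6.3, 9.3, 10.5, 14.6, 8.5, 7.3,
                 7.5, 5.2, 9.0, 4.3, 9.9, 7.8, 13.1, 12.3, 10.1]

print("first sample: ", round(qq_correlation(first_sample), 3))
print("second sample:", round(qq_correlation(second_sample), 3))
```

Because correlation is unchanged by standardizing, the raw data can be used directly. The second sample's value comes out noticeably closer to 1 than the first's, which agrees with the conclusions drawn from the two plots.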