According to an English professor, the following information represents the percent of times that each letter is used in the English language:
E = 11.16%, A = 8.5%, I = 7.54%, O = 7.16%, U = 3.63% (e.g., 11.16% of the letters found in any writing sample will be the letter E)
Choose a paragraph of at least 30 words from any written source and count the number of times each vowel shows up in your paragraph. Then, perform a goodness-of-fit test to see if the above percents are a good fit for the population based on your sample paragraph.
Please make sure you include your paragraph when you submit your assignment. Note that spaces, apostrophes, etc. do not count towards the total number of letters in your paragraph.
Paragraph:
The Centers for Disease Control and Prevention have advised people to “put distance” between themselves and others if the virus is spreading in their community. Thousands of Americans are following this directive, and numerous workplaces have directed their employees to work from home.
From the given paragraph, we count the occurrences of each vowel as well as the total number of letters.
Total number of letters = 239
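The counting step can be checked with a short script. This is only a sketch: it assumes the paragraph is typed exactly as shown above, treats uppercase and lowercase letters alike, and uses Python's `str.isalpha()` to drop spaces, punctuation and quotation marks. The variable names are illustrative and not part of the assignment.

```python
from collections import Counter

# Sample paragraph copied from the text above (curly quotes written as escapes).
paragraph = (
    "The Centers for Disease Control and Prevention have advised people to "
    "\u201cput distance\u201d between themselves and others if the virus is "
    "spreading in their community. Thousands of Americans are following this "
    "directive, and numerous workplaces have directed their employees to work "
    "from home."
)

# Keep letters only: spaces, punctuation and quotation marks are not counted.
letters = [ch.lower() for ch in paragraph if ch.isalpha()]
total_letters = len(letters)

# Tally the five vowels.
counts = Counter(letters)
vowel_counts = {v.upper(): counts[v] for v in "aeiou"}

print(total_letters)  # 239
print(vowel_counts)   # {'A': 14, 'E': 36, 'I': 18, 'O': 19, 'U': 6}
```

The five vowel counts add up to 93 of the 239 letters.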
So, the observed and expected frequencies are as follows.
Vowel | Observed frequency (Oi) | Expected frequency (Ei) | (Oi-Ei)^2/Ei |
A | 14 | 239*8.5/100=20.3150 | 1.963043 |
E | 36 | 239*11.16/100=26.6724 | 3.261953 |
I | 18 | 239*7.54/100=18.0206 | 0.000024 |
O | 19 | 239*7.16/100=17.1124 | 0.208214 |
U | 6 | 239*3.63/100= 8.6757 | 0.825221 |
Total | 93 | 90.7961 | 6.258455 |
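The arithmetic in the table can be reproduced as follows. This is a minimal sketch that follows the table's own convention: each expected frequency is the total letter count (239) times the stated percentage, so the expected column sums to 90.7961 rather than to the observed total of 93. The dictionary names are illustrative.

```python
# Observed vowel counts and stated letter percentages, taken from the table.
observed = {"A": 14, "E": 36, "I": 18, "O": 19, "U": 6}
percent = {"A": 8.5, "E": 11.16, "I": 7.54, "O": 7.16, "U": 3.63}
total_letters = 239

# Expected count for each vowel: total letters times its stated percentage.
expected = {v: total_letters * percent[v] / 100 for v in observed}

# Per-vowel contribution (O - E)^2 / E; their sum is the chi-square statistic.
contributions = {
    v: (observed[v] - expected[v]) ** 2 / expected[v] for v in observed
}
chi_square = sum(contributions.values())

for v in observed:
    print(f"{v}: E = {expected[v]:.4f}, contribution = {contributions[v]:.6f}")
print(f"chi-square = {chi_square:.6f}")
```

The largest contribution comes from E, which appears 36 times against an expected count of about 26.67.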
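We perform a chi-square test for goodness of fit.
We test the null hypothesis
H0: the vowels occur in the stated proportions (A = 8.5%, E = 11.16%, I = 7.54%, O = 7.16%, U = 3.63% of all letters)
against the alternative hypothesis
H1: at least one vowel occurs in a proportion different from the stated one.
Our test statistic is given by
χ² = Σ (Oi-Ei)^2/Ei, summed over the five vowels.
Here,
χ² = 6.258455
Number of categories n = 5
Degrees of freedom = n - 1 = 4
Corresponding p-value = P(χ² with 4 degrees of freedom > 6.258455) ≈ 0.1807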
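The p-value can be computed without a statistics library. The sketch below assumes five vowel categories and therefore 4 degrees of freedom, which is consistent with the 82% figure quoted in the conclusion. For an even number of degrees of freedom the upper-tail probability of the chi-square distribution has a closed form, exp(-x/2) times the sum of (x/2)^j / j! for j from 0 to df/2 - 1. The helper name `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms


chi_square = 6.258455  # test statistic from the table
df = 5 - 1             # assumption: five vowel categories minus one
p_value = chi2_sf_even_df(chi_square, df)
print(f"p-value = {p_value:.4f}")  # p-value = 0.1807
```

A p-value of about 0.18 corresponds to the "82%" mentioned in the conclusion, since 1 - 0.18 = 0.82.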
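The p-value (about 0.18) is larger than the usual significance level of 0.05, so we fail to reject the null hypothesis.
Hence, based on our sample paragraph, the null hypothesis cannot be rejected at any significance level below about 0.18 (that is, at any confidence level of about 82% or higher), and we conclude that the data are consistent with the given percentages being a good fit for the population.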