Consider the following sets of data. Explain (without carrying out a formal Goodness-of-fit test) whether you think the data follows the distribution it is expected to follow. You might choose to find the expected frequencies and compare them with the observed frequencies to make a conclusion. (3 points)
Data Set I: Grades Distribution
Grades | Claimed Distribution | Survey Results
A | 10% | 20
B | 30% | 65
C | 35% | 75
D | 10% | 17
F | 5% | 8
Data Set II: Ice-Cream Flavors Preferred
Flavors | Claimed Distribution | Survey Results
Vanilla | 10% | 80
Chocolate | 30% | 50
Strawberry | 35% | 60
Coffee | 10% | 25
Rainbow | 5% | 1
Data Set III: Type of Credit Card
Credit Card | Claimed Distribution | Survey Results
Visa | 40% | 120
Discover | 20% | 65
MasterCard | 25% | 75
American Express | 10% | 27
Other | 5% | 10
Data Set IV: Type of Sport watched
Sports Watched | Claimed Distribution | Survey Results
Baseball | 20% | 120
Hockey | 5% | 45
Soccer | 25% | 75
Football | 20% | 47
Basketball | 30% | 100
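The expected frequency for each category is found by multiplying its claimed percentage by the total number of survey responses.
E.g., Data Set I has 20 + 65 + 75 + 17 + 8 = 185 responses in total, so for grade A:
Expected (A) = 10% * 185 = 18.5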
Grades | Claimed Distribution | Survey Results | Expected
A | 10% | 20 | 18.5
B | 30% | 65 | 55.5
C | 35% | 75 | 64.75
D | 10% | 17 | 18.5
F | 5% | 8 | 9.25
Total | | 185 |
Here the expected and observed frequencies are close for 3 of the 5 grades (A, D and F each differ by about 1.5 or less). The remaining two, B and C, each differ by about 10 (65 vs. 55.5 and 75 vs. 64.75). So we can say that the survey matches the claimed distribution to a moderate degree.
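The arithmetic for Data Set I can be reproduced with a short script. This is only a sketch of the "percentage times total" rule described above; the function name `expected_frequencies` and the dictionary layout are illustrative choices, not part of the original question.

```python
def expected_frequencies(claimed_percent, observed):
    """Expected count per category = claimed percentage * survey total / 100.

    Hypothetical helper for illustration; percentages are whole numbers (10 means 10%).
    """
    total = sum(observed.values())
    return {k: claimed_percent[k] * total / 100 for k in observed}


# Data Set I, copied from the table in the question
claimed = {"A": 10, "B": 30, "C": 35, "D": 10, "F": 5}
observed = {"A": 20, "B": 65, "C": 75, "D": 17, "F": 8}

expected = expected_frequencies(claimed, observed)

# Print observed vs. expected and the signed difference for each grade
for grade in observed:
    diff = observed[grade] - expected[grade]
    print(f"{grade}: observed={observed[grade]}, "
          f"expected={expected[grade]:.2f}, difference={diff:+.2f}")
```

Swapping in the percentages and counts of the other data sets gives their expected columns in the same way.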
Data Set II: Ice-Cream Flavors Preferred
Flavors | Claimed Distribution | Survey Results | Expected
Vanilla | 10% | 80 | 21.6
Chocolate | 30% | 50 | 64.8
Strawberry | 35% | 60 | 75.6
Coffee | 10% | 25 | 21.6
Rainbow | 5% | 1 | 10.8
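Here the expected and observed frequencies are far apart for most of the flavors: Vanilla is the most extreme (80 observed vs. 21.6 expected), Chocolate and Strawberry are each about 15 below expectation, and Rainbow has 1 response where about 11 were expected. Only Coffee is close (25 vs. 21.6). So we can say that the survey does not match the claimed distribution.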
Data Set III: Type of Credit Card
Credit Card | Claimed Distribution | Survey Results | Expected
Visa | 40% | 120 | 118.80
Discover | 20% | 65 | 59.40
MasterCard | 25% | 75 | 74.25
American Express | 10% | 27 | 29.70
Other | 5% | 10 | 14.85
Here the expected and observed frequencies are close for almost all the categories. The largest gaps are small (Discover: 65 vs. 59.40, Other: 10 vs. 14.85). So we can say that the survey matches the claimed distribution closely.
Data Set IV: Type of Sport watched
Sports Watched | Claimed Distribution | Survey Results | Expected
Baseball | 20% | 120 | 77.4
Hockey | 5% | 45 | 19.35
Soccer | 25% | 75 | 96.75
Football | 20% | 47 | 77.4
Basketball | 30% | 100 | 116.1
Here the expected and observed frequencies are far apart for every sport: the differences range from about 16 (Basketball) to about 43 (Baseball). So we can say that the survey does not match the claimed distribution.
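To put the four comparisons side by side, one informal option is to add up the absolute differences between observed and expected counts and divide by the number of responses. This is not a formal goodness-of-fit test, just a rough summary under that assumed measure; the name `mismatch_share` is made up for this sketch. The percentages are used exactly as given, even though the claimed percentages for Data Sets I and II add up to 90% rather than 100%.

```python
def mismatch_share(claimed_percent, observed):
    """Sum of |observed - expected| divided by the survey total.

    Informal summary only (an assumption of this sketch), not a formal test.
    """
    total = sum(observed)
    expected = [p * total / 100 for p in claimed_percent]
    return sum(abs(o - e) for o, e in zip(observed, expected)) / total


# (claimed percentages, survey results) copied from the four tables
data_sets = {
    "I (grades)": ([10, 30, 35, 10, 5], [20, 65, 75, 17, 8]),
    "II (ice cream)": ([10, 30, 35, 10, 5], [80, 50, 60, 25, 1]),
    "III (credit cards)": ([40, 20, 25, 10, 5], [120, 65, 75, 27, 10]),
    "IV (sports)": ([20, 5, 25, 20, 30], [120, 45, 75, 47, 100]),
}

scores = {name: mismatch_share(p, o) for name, (p, o) in data_sets.items()}

# List the data sets from best to worst agreement with the claimed distribution
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"Data Set {name}: {score:.1%} of responses off from expected")
```

Under this measure the ordering agrees with the conclusions above: Data Set III fits best, Data Set I is next, and Data Sets IV and II fit poorly.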