A fisherman is interested in whether the distribution of fish caught in Green Valley Lake is the same as the distribution of fish caught in Echo Lake. Of the 385 randomly selected fish caught in Green Valley Lake, 206 were rainbow trout, 67 were other trout, 45 were bass, and 67 were catfish. Of the 519 randomly selected fish caught in Echo Lake, 248 were rainbow trout, 108 were other trout, 81 were bass, and 82 were catfish. Conduct the appropriate hypothesis test using an α = 0.10 level of significance.

What is the correct statistical test to use?
- Paired t-test
- Homogeneity
- Independence
- Goodness-of-Fit

What are the null and alternative hypotheses?

H0:
- The distribution of fish is not the same for Green Valley Lake and Echo Lake.
- The distribution of fish is the same for Green Valley Lake and Echo Lake.
- Fish breed and lake are independent.
- Fish breed and lake are dependent.

H1:
- Fish breed and lake are dependent.
- The distribution of fish is the same for Green Valley Lake and Echo Lake.
- The distribution of fish is not the same for Green Valley Lake and Echo Lake.
- Fish breed and lake are independent.

The test statistic for this data = ___ (Please show your answer to three decimal places.)

The p-value for this sample = ___ (Please show your answer to four decimal places.)

The p-value is ___ α

Based on this, we should:
- reject the null
- fail to reject the null
- accept the null

Thus, the final conclusion is...
- There is insufficient evidence to conclude that the distribution of fish is not the same for Green Valley Lake and Echo Lake.
- There is insufficient evidence to conclude that fish breed and lake are independent.
- There is sufficient evidence to conclude that the distribution of fish is not the same for Green Valley Lake and Echo Lake.
- There is sufficient evidence to conclude that the distribution of fish is the same for Green Valley Lake and Echo Lake.
- There is sufficient evidence to conclude that fish breed and lake are dependent.
A separate random sample was drawn from each lake, and we need to determine whether the counts in each fish category are distributed identically across the two populations. The correct statistical test is therefore the chi-square test of homogeneity.
The appropriate hypotheses are:
H0: The distribution of fish is the same for Green Valley Lake and Echo Lake.
H1: The distribution of fish is not the same for Green Valley Lake and Echo Lake.
The contingency table of observed counts is:
Lake | Rainbow trout | Other trout | Bass | Catfish | Total |
Green Valley Lake | 206 | 67 | 45 | 67 | 385 |
Echo Lake | 248 | 108 | 81 | 82 | 519 |
Total | 454 | 175 | 126 | 149 | 904 |
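The test statistic is a chi-square random variable ($\chi^2$) defined by the following equation:
$$\chi^2 = \sum_{r,c} \frac{(O_{r,c} - E_{r,c})^2}{E_{r,c}}$$
where $O_{r,c}$ is the observed frequency count and $E_{r,c}$ is the expected frequency count for row $r$ and column $c$. The expected count is
$$E_{r,c} = \frac{n_r \, n_c}{n}$$
where $n_r$ is the total for row $r$, $n_c$ is the total for column $c$, and $n$ is the grand total.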
$E_{1,1} = (454 \times 385) / 904 = 193.3518$
$E_{1,2} = (175 \times 385) / 904 = 74.52987$
$E_{1,3} = (126 \times 385) / 904 = 53.6615$
$E_{1,4} = (149 \times 385) / 904 = 63.45686$
$E_{2,1} = (454 \times 519) / 904 = 260.6482$
$E_{2,2} = (175 \times 519) / 904 = 100.4701$
$E_{2,3} = (126 \times 519) / 904 = 72.3385$
$E_{2,4} = (149 \times 519) / 904 = 85.54314$
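The expected counts above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the names `observed` and `expected_counts` are illustrative and do not come from the original solution.

```python
# Observed counts, one row per lake, in the column order
# rainbow trout, other trout, bass, catfish.
observed = [
    [206, 67, 45, 67],   # Green Valley Lake
    [248, 108, 81, 82],  # Echo Lake
]


def expected_counts(table):
    """Return E[r][c] = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(e, 4) for e in row])
# [193.3518, 74.5299, 53.6615, 63.4569]
# [260.6482, 100.4701, 72.3385, 85.5431]
```

Every expected count is well above 5, so the usual rule of thumb for the chi-square approximation is satisfied.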
$$\begin{aligned}
\chi^2 &= \frac{(206 - 193.3518)^2}{193.3518} + \frac{(67 - 74.52987)^2}{74.52987} + \frac{(45 - 53.6615)^2}{53.6615} + \frac{(67 - 63.45686)^2}{63.45686} \\
&\quad + \frac{(248 - 260.6482)^2}{260.6482} + \frac{(108 - 100.4701)^2}{100.4701} + \frac{(81 - 72.3385)^2}{72.3385} + \frac{(82 - 85.54314)^2}{85.54314} \\
&= 5.546
\end{aligned}$$
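As a check on the arithmetic, the sum can be evaluated directly from the observed counts. The sketch below uses only the standard library; `chi_square_statistic` is a hypothetical helper name, not part of the original answer.

```python
# Observed counts: rows are Green Valley Lake and Echo Lake;
# columns are rainbow trout, other trout, bass, catfish.
observed = [
    [206, 67, 45, 67],
    [248, 108, 81, 82],
]


def chi_square_statistic(table):
    """Sum (O - E)^2 / E over all cells, with E = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, obs in enumerate(row):
            exp = row_totals[r] * col_totals[c] / n
            stat += (obs - exp) ** 2 / exp
    return stat


chi2_stat = chi_square_statistic(observed)
print(round(chi2_stat, 3))  # 5.546
```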
Degrees of freedom = (number of rows - 1) × (number of columns - 1) = (2 - 1) × (4 - 1) = 3
$p\text{-value} = P(\chi^2 > 5.546,\ df = 3) = 0.1359$
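The p-value can also be checked without a statistics package. The sketch below assumes the closed form of the chi-square upper-tail probability that holds for exactly 3 degrees of freedom, $P(\chi^2_3 > x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\, e^{-x/2}$; it does not generalize to other degrees of freedom. The function name `chi2_sf_df3` is illustrative only.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df.

    Assumption: this closed form is valid for df = 3 only.
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


rows, cols = 2, 4                 # two lakes, four fish categories
df = (rows - 1) * (cols - 1)      # 3

chi2_stat = 5.546                 # test statistic from the solution
alpha = 0.10                      # significance level from the question

p_value = chi2_sf_df3(chi2_stat)
decision = "reject H0" if p_value <= alpha else "fail to reject H0"

print(df, round(p_value, 4), decision)
# 3 0.1359 fail to reject H0
```

For tables with other dimensions, a library routine for the chi-square distribution would be the more practical choice.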
The p-value (0.1359) is greater than α (0.10).
Based on this, we should fail to reject the null.
Thus, the final conclusion is: there is insufficient evidence to conclude that the distribution of fish is not the same for Green Valley Lake and Echo Lake.