A radio station trying to determine what kind of music to play takes a simple random sample of 50 students at each of three locations: a local middle school, a high school, and a college. The students are asked to choose which of three different music genres they most enjoy hearing on the radio. Here are the results: (12 points).

Age Level | Hip-hop | Alternative | Post-rock | Total |
Middle-school | 28 | 18 | 4 | 50 |
High-School | 22 | 22 | 6 | 50 |
College | 16 | 20 | 14 | 50 |
Total | 66 | 60 | 24 | 150 |

(a) In the table below, provide the appropriate conditional distributions based on the data collected for comparing the music-listening preferences of the three age levels, based on the data above. (2 points)

Age Level | Hip-hop | Alternative | Post-rock |
Middle-school | | | |
High-School | | | |
College | | | |

(b) Perform the appropriate statistical test to determine if there is a difference in the music preference of these three age groups, remember to do all 4 parts in a significance test. (10 points).

(c) If you chose a chi-square test for homogeneity in part (b), explain how the data could have been obtained to make a chi-square test for independence appropriate. If you chose a test for independence, explain how the data could have been obtained to make a test for homogeneity appropriate.
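(a) Based on the given data, the observed frequencies can be cross-tabulated as follows (rows are age levels, columns are music genres):
Age level | Hip-hop | Alternative | Post-rock | Total |
Local middle school | 28 | 18 | 4 | 50 |
High school | 22 | 22 | 6 | 50 |
College | 16 | 20 | 14 | 50 |
Total | 66 | 60 | 24 | 150 |
Let
H = Students who prefer Hip-hop music
A = Students who prefer Alternative music
P = Students who prefer Post-rock music
L = Students from the local middle school
HS = Students from the high school
C = Students from the college
By definition of conditional probability,
$P(H \mid L) = \frac{n(H \cap L)}{n(L)} = \frac{28}{50} = 56\%$
About 56% of the students at the local middle school prefer Hip-hop.
$P(P \mid C) = \frac{n(P \cap C)}{n(C)} = \frac{14}{50} = 28\%$
About 28% of the college students prefer Post-rock.
$P(HS \mid A) = \frac{n(HS \cap A)}{n(A)} = \frac{22}{60} \approx 36.67\%$
About 36.67% of the students who prefer Alternative music are from the high school.
Similarly, the other conditional probabilities can be computed using the cell frequencies and the row and column (marginal) totals. Because the comparison is between age levels, the appropriate conditional distributions are those of genre within each age level (each cell divided by its row total of 50):
Age level | Hip-hop | Alternative | Post-rock |
Local middle school | 56% | 36% | 8% |
High school | 44% | 44% | 12% |
College | 32% | 40% | 28% |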
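As a quick check on the arithmetic, the conditional distribution of genre within each age level can be computed from the counts in the question's table. This is only a minimal sketch: the names `observed`, `conditional_distribution`, and `row_distributions` are made up for illustration, and the counts are entered with age levels as rows and genres as columns.

```python
# Observed counts from the survey: one row per age level,
# columns in the order Hip-hop, Alternative, Post-rock.
genres = ["Hip-hop", "Alternative", "Post-rock"]
observed = {
    "Middle school": [28, 18, 4],
    "High school": [22, 22, 6],
    "College": [16, 20, 14],
}


def conditional_distribution(counts):
    """Return each count as a proportion of its row total."""
    total = sum(counts)
    return [count / total for count in counts]


# Distribution of genre preference given the age level.
row_distributions = {
    level: conditional_distribution(counts)
    for level, counts in observed.items()
}

for level, dist in row_distributions.items():
    cells = ", ".join(f"{g}: {p:.0%}" for g, p in zip(genres, dist))
    print(f"{level}: {cells}")
```

Each row of the result sums to 1, which is a convenient way to confirm that the conditioning was done on the age level rather than on the genre.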
(b)
To test:
$H_0$: There is no association between music genre and age level
vs
$H_a$: There is a significant association between music genre and age level
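The appropriate statistical test for the above hypotheses is a chi-square test of association, with the test statistic given by:
$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
where $O_i$ = observed frequencies and $E_i$ = expected frequencies.
The chi-square test is an upper-tailed test, with rejection region given by
$\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$
For (r-1)(c-1) = (3-1)(3-1) = 4 degrees of freedom at $\alpha = 0.05$ (the problem does not state a significance level, so the conventional 5% level is assumed): $\chi^2_{0.05,\,4} = 9.488$
We may reject the null if $\chi^2 > 9.488$.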
In order to compute the expected frequencies $E_i$:
Under the null hypothesis, the distribution of music genre preference is the same for all three age levels, so the expected count in each cell is (row total)(column total) / (grand total).
Computing the expected frequencies:
Expected | Hip-hop | Alternative | Post-rock |
Local middle school | (50)(66) / 150 = 22 | (50)(60) / 150 = 20 | (50)(24) / 150 = 8 |
High school | (50)(66) / 150 = 22 | (50)(60) / 150 = 20 | (50)(24) / 150 = 8 |
College | (50)(66) / 150 = 22 | (50)(60) / 150 = 20 | (50)(24) / 150 = 8 |
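The expected counts follow the rule (row total)(column total) / (grand total), which can be sketched in a few lines. The function name `expected_counts` is hypothetical, and the table is entered as it appears in the question, with age levels as rows and genres as columns.

```python
# Observed counts; columns are Hip-hop, Alternative, Post-rock.
observed = [
    [28, 18, 4],   # Middle school
    [22, 22, 6],   # High school
    [16, 20, 14],  # College
]


def expected_counts(table):
    """Expected cell counts under the null: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [r * c / grand_total for c in col_totals]
        for r in row_totals
    ]


expected = expected_counts(observed)
for row in expected:
    print(row)
```

Because every age level has the same sample size of 50, all three rows of expected counts are identical.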
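Computing the test statistic:
$\chi^2 = \frac{(28-22)^2}{22} + \frac{(18-20)^2}{20} + \frac{(4-8)^2}{8} + \frac{(22-22)^2}{22} + \frac{(22-20)^2}{20} + \frac{(6-8)^2}{8} + \frac{(16-22)^2}{22} + \frac{(20-20)^2}{20} + \frac{(14-8)^2}{8} = 10.673$
Since $\chi^2 = 10.673$ exceeds the critical value of 9.488 for 4 degrees of freedom, it lies in the rejection region, and we may reject the null at $\alpha = 0.05$.
We may conclude that the data provide sufficient evidence to support the claim that there is a significant association between the age level of the student and the music genre preferred.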
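The test statistic and an approximate p-value can be verified with the standard library alone. This is a sketch under stated assumptions: the helper names `chi_square_statistic` and `chi2_sf_even_dof` are invented here, and the p-value uses the closed-form upper-tail probability that holds only when the degrees of freedom are even (true here, since there are 4).

```python
import math

# Observed counts; rows are age levels, columns are genres.
observed = [
    [28, 18, 4],   # Middle school
    [22, 22, 6],   # High school
    [16, 20, 14],  # College
]


def chi_square_statistic(table):
    """Return the chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - exp) ** 2 / exp
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def chi2_sf_even_dof(x, dof):
    """Upper-tail chi-square probability; closed form valid for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form requires a positive even dof")
    half = x / 2
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(dof // 2)
    )


statistic, dof = chi_square_statistic(observed)
p_value = chi2_sf_even_dof(statistic, dof)
print(f"chi-square = {statistic:.3f}, dof = {dof}, p-value = {p_value:.4f}")
```

The p-value comes out at roughly 0.03, which is below 0.05 and therefore agrees with the decision to reject the null at the 5% level.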
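(c) A chi-square test for homogeneity is appropriate when a separate random sample is drawn from each population, as with the 50 students sampled at each of the three locations; a test for independence instead uses a single random sample whose members are classified by both age level and music genre. If we were to use the chi-square test for homogeneity, we would be testing equality of proportions across the three age levels:
Suppose we would like to test whether the proportion of students who prefer Hip-hop is equal at the local middle school, the high school, and the college. If $p_L$, $p_{HS}$, and $p_C$ denote the proportions of students who prefer Hip-hop at the local middle school, the high school, and the college, respectively,
We have to test:
$H_0: p_L = p_{HS} = p_C$ vs $H_a$: Not all proportions are equal
where the sample proportions are obtained as:
$\hat{p}_L = \frac{28}{50} = 0.56$, $\hat{p}_{HS} = \frac{22}{50} = 0.44$, $\hat{p}_C = \frac{16}{50} = 0.32$
Similarly, we may test for Alternative and Post-rock music.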