Q. Explain the flaws in the following analysis or conclusion. (Note that this is not about the chi-squared test per se; rather, it is about what a statistical analysis, hypothesis testing in this case, can and cannot tell us.)

Background: The Scholarship Committee comprises 3 faculty members, one of whom is Professor X. Professor X also wrote recommendation letters for 5 students who had taken his class earlier, and four of them were awarded scholarships (out of a total of six scholarships awarded to graduate students). A student who applied but was not awarded a scholarship accused Professor X of favoritism, claiming that he/she was denied a scholarship despite having a very high GPA because he/she had not taken a class under Professor X and Professor X had declined to write him/her a recommendation letter. The student then did the following statistical analysis as evidence of favoritism.

The student runs a chi-squared test, and the result indicates that taking a class under Professor X and getting a scholarship are not independent.

H0: whether a student gets a scholarship is independent of whether the student has taken a class under Professor X
H1: not independent

# Let the total number of students who applied for a scholarship be 70, of whom only 5 got a scholarship. Conservatively estimate that only 3 of the 5 took a class with Professor X; the other two selected did not.
# The chi-squared test shows a p-value < 0.05, so we reject H0 at the .05 level and conclude that who got a scholarship was affected by whether they had a class under Professor X.
# The student additionally tests what happens if the total number of applicants were 60 or 80, and finds that in either case H0 is rejected: they are not independent.
SOLUTION
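Putting the data that the student accumulated into a contingency table (X marks a count the student did not report):

                           Took class    Did not take class    Total
Got scholarship                 3                 2               5
Did not get scholarship         X                 X              65
Total                           ?                 ?              70

There were 3 students who took a class under Prof. X and got a scholarship, and 5 students in total who got a scholarship. There were 70 applicants, which leaves 65 who did not get a scholarship.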
The first flaw is evident from the table: the student does not account for how many of the students who did not get a scholarship took a class under Prof. X and how many did not. A test of independence needs all four cell counts, so the analysis rests on incomplete data.
Even if the student had supplied those counts, the major flaw in the conclusion is that the chi-squared test of independence requires every expected frequency count in the cells to be at least 5. The expected count for any cell is (row total × column total) / overall total.
So, let us say that the value of the first X, i.e. the number of students who did not get a scholarship but took a class under Prof. X, is 47.
The corresponding column total is then 3 + 47 = 50.
Hence, the expected frequency count for the first cell (got a scholarship and took the class) is 5 × 50 / 70 ≈ 3.57, which is less than 5.
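As a quick check of this arithmetic, here is a minimal Python sketch that computes all four expected counts for the hypothetical table. The 47 is the assumed value from the text (so 18 = 65 - 47 follows), not real data, and the function name expected_counts is only illustrative.

```python
def expected_counts(table):
    """Return expected cell counts under independence for a 2-D table.

    Each expected count is row total * column total / overall total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


# Rows: got scholarship, did not get scholarship.
# Columns: took class under Prof. X, did not take class.
# ASSUMPTION: 47 is the made-up value from the text; 18 = 65 - 47.
observed = [[3, 2], [47, 18]]

expected = expected_counts(observed)
for row in expected:
    print([round(cell, 2) for cell in row])
```

Under this assumed table, both cells in the "got scholarship" row have expected counts below 5, so the usual rule of thumb fails for more than just the first cell.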
The maximum value this column total can have is 68, because we know that 2 students definitely did not take the class. Even in that case, the expected frequency of the first cell is 5 × 68 / 70 ≈ 4.86, which is still less than 5.
Hence, the criterion for running the chi-squared test can never be met, and the reported p-value cannot be relied on as evidence of a statistically significant association.
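Changing the size of the applicant pool does not change our decision regarding the flaws. With only 5 scholarships awarded, the expected count of the first cell is 5 × (column total) / (total applicants), which stays below 5 whether the pool is 60, 70, or 80, so the expected frequency criterion is never met because the observed counts are so low.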
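The point about pool size can be checked with a short Python sketch. It assumes, as in the student's setup, 5 scholarships awarded and 2 awardees known not to have taken the class; the function name max_expected_first_cell and its parameters are hypothetical, chosen only for illustration.

```python
def max_expected_first_cell(total_applicants, awarded=5, awarded_without_class=2):
    """Largest possible expected count for the (got scholarship, took class) cell.

    The took-class column total can be at most total_applicants minus the
    awardees known not to have taken the class.
    """
    max_col_total = total_applicants - awarded_without_class
    return awarded * max_col_total / total_applicants


# Pool sizes the student tried: 60, 70, and 80 applicants.
for n in (60, 70, 80):
    best_case = max_expected_first_cell(n)
    print(n, round(best_case, 3), best_case < 5)
```

Even in the most favorable case for each pool size, the expected count stays below 5, so the rule of thumb fails for all three pools.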