You are interested in identifying SNPs associated with diabetes. You conduct a case-control study using 500,000 SNPs. The allele frequencies of one of the SNPs between cases and controls are listed below:
Allele | Cases | Controls
G | 2,300 | 4,000
A | 3,500 | 3,360
Using a chi-square test, determine if there is a significant association of this SNP with diabetes. Show your work. Include the chi-square statistic, the degrees of freedom, and the p-value.
Given your p-value, does this SNP have a significant association with diabetes?
To conduct the chi-square test, we first compute the expected frequencies under the null hypothesis that allele and disease status are independent.
In general, the expected frequency for a cell in the ith row and the jth column is equal to
$$E_{i,j} = \frac{T_i \, T_j}{T}$$
where $E_{i,j}$ is the expected frequency for cell i,j, $T_i$ is the total for the ith row, $T_j$ is the total for the jth column, and $T$ is the total number of observations.
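The observed counts are given in the table below, with the expected frequencies in parentheses:
Allele | Cases | Controls | Total
G | 2,300 (2,776.60) | 4,000 (3,523.40) | 6,300
A | 3,500 (3,023.40) | 3,360 (3,836.60) | 6,860
Total | 5,800 | 7,360 | 13,160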
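As a check on the expected frequencies, here is a minimal Python sketch that applies the row-total times column-total over grand-total rule to the counts from the question. The names `observed` and `expected_counts` are illustrative choices, not part of the original answer.

```python
# Observed allele counts from the question: rows are alleles, columns are groups.
observed = {
    "G": {"Cases": 2300, "Controls": 4000},
    "A": {"Cases": 3500, "Controls": 3360},
}


def expected_counts(table):
    """Return E[i][j] = (row total * column total) / grand total."""
    row_totals = {row: sum(cols.values()) for row, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for col, count in cols.items():
            col_totals[col] = col_totals.get(col, 0) + count
    grand_total = sum(row_totals.values())
    return {
        row: {col: row_totals[row] * col_totals[col] / grand_total for col in cols}
        for row, cols in table.items()
    }


expected = expected_counts(observed)
for allele, cols in expected.items():
    for group, value in cols.items():
        print(f"{allele} / {group}: expected {value:.2f}")
```

The expected counts in each row and each column add up to the same totals as the observed counts, which is a quick way to catch arithmetic slips.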
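The chi-square statistic is given by:
$$\chi^2 = \sum_{i,j} \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}}$$
where $O_{i,j}$ is the observed count and $E_{i,j}$ is the expected frequency for cell i,j. Summing over the four cells (G cases, G controls, A cases, A controls):
χ² = 81.81 + 64.47 + 75.13 + 59.20
≈ 280.606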
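The sum can be reproduced with a short Python sketch that computes the Pearson chi-square statistic directly from the observed counts, with no continuity correction. The function name `chi_square_statistic` and the list layout are assumptions made for illustration.

```python
# Observed counts: rows are alleles (G, A), columns are (Cases, Controls).
observed = [
    [2300, 4000],
    [3500, 3360],
]


def chi_square_statistic(table):
    """Pearson chi-square: sum over cells of (O - E)^2 / E."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / total
            statistic += (obs - exp) ** 2 / exp
    return statistic


chi2_stat = chi_square_statistic(observed)
print(f"chi-square statistic: {chi2_stat:.3f}")
```

If SciPy is available, `scipy.stats.chi2_contingency(observed, correction=False)` should give the same statistic; the default `correction=True` applies Yates' continuity correction to 2x2 tables and would give a slightly smaller value.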
The p-value for the chi-square test is P(χ² > x), the probability of observing a value at least as extreme as the test statistic x under a chi-square distribution with (r-1)(c-1) degrees of freedom, where r and c are the numbers of rows and columns in the table.
Here χ² = 280.606 and df = (2-1)(2-1) = 1, so
P(χ² > 280.606) < 0.00001
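For one degree of freedom, the upper-tail probability can be evaluated with the standard library alone, because a chi-square variable with df = 1 is the square of a standard normal, so P(χ² > x) = erfc(√(x/2)). The sketch below relies on that identity and therefore only applies to df = 1; the helper name `chi2_sf_df1` is hypothetical.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability P(chi2 > x) for 1 degree of freedom only."""
    return math.erfc(math.sqrt(x / 2.0))


chi2_stat = 280.606            # test statistic from the worked answer
df = (2 - 1) * (2 - 1)         # (rows - 1) * (columns - 1)
p_value = chi2_sf_df1(chi2_stat)

print(f"df = {df}, p-value = {p_value:.3e}")
print("significant at 0.05:", p_value < 0.05)
```

For tables with more than one degree of freedom, a general routine such as `scipy.stats.chi2.sf(x, df)` would be needed instead.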
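Since the p-value is less than 0.05, the null hypothesis of no association is rejected at the 5% significance level. Because 500,000 SNPs are tested, a stricter Bonferroni-corrected threshold of 0.05 / 500,000 = 10⁻⁷ is also worth checking; the p-value for χ² = 280.606 with 1 degree of freedom is far below that threshold as well.
Thus we can conclude that this SNP has a significant association with diabetes.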