In: Statistics and Probability
Question 4
Using the dataset below, estimate whether there is a difference among the graduation rates - identified as percentages - of five high schools over a 10-year period. Explain your results.
Year | HS 1 | HS 2 | HS 3 | HS 4 | HS 5
2003 | 67 | 82 | 94 | 65 | 88
2004 | 68 | 87 | 78 | 65 | 87
2005 | 65 | 83 | 81 | 45 | 86
2006 | 68 | 73 | 76 | 57 | 88
2007 | 67 | 77 | 75 | 68 | 89
2008 | 71 | 74 | 81 | 76 | 87
2009 | 78 | 76 | 79 | 77 | 81
2010 | 76 | 78 | 89 | 72 | 78
2011 | 72 | 76 | 76 | 69 | 89
2012 | 77 | 86 | 77 | 58 | 87
This is a simple hypothesis-testing problem comparing the means of 5 different samples.
The hypotheses are
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$ (all five schools have the same mean graduation rate)
$H_1$: at least one school's mean graduation rate differs from the others.
We shall use Tukey's HSD test to analyse the hypothesis.
Before that, we conduct a one-way ANOVA to find out whether the overall F statistic is significant.
This tells us how to proceed: after the ANOVA we move on to Tukey's test to compare the sample means pairwise.
This is done as follows.
The p-value corresponding to the F-statistic of the one-way ANOVA is lower than 0.01, which strongly suggests that one or more pairs of treatments are significantly different.
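The one-way ANOVA step can be reproduced directly from the table. The following is a minimal sketch in plain Python, not the calculator output used for this answer; the names `rates` and `one_way_anova` are chosen here for illustration only. It computes the between-group and within-group mean squares and the F statistic on (4, 45) degrees of freedom.

```python
# Graduation rates (%) per school, 2003-2012, copied from the table above.
rates = {
    "HS 1": [67, 68, 65, 68, 67, 71, 78, 76, 72, 77],
    "HS 2": [82, 87, 83, 73, 77, 74, 76, 78, 76, 86],
    "HS 3": [94, 78, 81, 76, 75, 81, 79, 89, 76, 77],
    "HS 4": [65, 65, 45, 57, 68, 76, 77, 72, 69, 58],
    "HS 5": [88, 87, 86, 88, 89, 87, 81, 78, 89, 87],
}


def one_way_anova(groups):
    """Return the main one-way ANOVA table entries for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group (error) sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within  # this is the Mean Square Error
    return {
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "mse": ms_within,
        "F": ms_between / ms_within,
    }


anova = one_way_anova(list(rates.values()))
print(anova)
```

Turning F into a p-value needs the F distribution, which the standard library does not provide; if SciPy is available, `scipy.stats.f.sf(F, 4, 45)` or `scipy.stats.f_oneway` would be one way to get it.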
So the overall test is significant.
We now look at the finer detail: given that there is a significant difference, which pairs contribute to it?
This is where Tukey's HSD test comes in.
We apply Tukey's HSD test to each of the 10 pairs to identify which of them exhibit a statistically significant difference.
To do so, we first establish the critical value of the Tukey-Kramer HSD Q statistic based on k=5 treatments
and ν=45 degrees of freedom for the error term,
at significance levels α = 0.01 and α = 0.05.
I used online statistical calculators to get these two values.
The critical values of Q, for α of 0.01 and 0.05, are
Q(α=0.01, k=5, ν=45) = 4.8928 and
Q(α=0.05, k=5, ν=45) = 4.0186, respectively.
Next, we compute a Tukey test statistic from our sample columns to compare with the appropriate critical value.
For each pair of columns being compared we calculate a quantity, loosely called here the Tukey-Kramer HSD Q-statistic (or simply the Tukey HSD Q-statistic), given by
$$Q_{i,j} = \frac{|\bar{x}_i - \bar{x}_j|}{s_{i,j}}, \qquad s_{i,j} = \frac{\hat{\sigma}_\epsilon}{\sqrt{H_{i,j}}}$$
where $\bar{x}_i$ and $\bar{x}_j$ are the means of columns i and j, and $H_{i,j}$ is the harmonic mean of the number of observations in the two columns.
Note that the sample sizes in the columns are equal here, so their harmonic mean is simply the common sample size, 10.
Also note that the quantity $\hat{\sigma}_\epsilon$ = 6.2084 is the square root of the Mean Square Error = 38.5444 determined in the precursor one-way ANOVA procedure.
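As a worked illustration, take HS 4 and HS 5, whose column means computed from the table are 65.2 and 86.0. Assuming the usual Tukey-Kramer form, in which the absolute difference in means is divided by the root mean square error over the square root of the harmonic mean sample size, the arithmetic is roughly:

```latex
s_{4,5} = \frac{\hat{\sigma}_\epsilon}{\sqrt{H_{4,5}}} = \frac{6.2084}{\sqrt{10}} \approx 1.9633
\qquad
Q_{4,5} = \frac{|\bar{x}_4 - \bar{x}_5|}{s_{4,5}} = \frac{|65.2 - 86.0|}{1.9633} \approx 10.59
```

Because all five columns have 10 observations, the same denominator of about 1.9633 applies to every pair.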
We now have everything we need.
We only need to find the Tukey HSD Q-statistic for each of the 10 pairs and compare it with the critical Q values found above for the two significance levels.
We also find the corresponding p-value from the p-value table for each pairwise Q statistic.
I attach below color-coded results (red for insignificant, green for significant) of evaluating whether $Q_{i,j} > Q_{critical}$ for all relevant pairs of treatments. The corresponding p-values (observed vs. critical) are also attached.
The results will be
.
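The pairwise comparison described above can be recomputed from the table with a short script. This is only a sketch under stated assumptions: it takes the Mean Square Error (38.5444) and the two critical Q values (4.0186 and 4.8928) as quoted in the text rather than deriving them, and the helper names `tukey_q` and `classify` are made up for this illustration. It does not reproduce the per-pair p-values.

```python
from itertools import combinations
from math import sqrt

# Graduation rates (%) per school, 2003-2012, copied from the table above.
rates = {
    "HS 1": [67, 68, 65, 68, 67, 71, 78, 76, 72, 77],
    "HS 2": [82, 87, 83, 73, 77, 74, 76, 78, 76, 86],
    "HS 3": [94, 78, 81, 76, 75, 81, 79, 89, 76, 77],
    "HS 4": [65, 65, 45, 57, 68, 76, 77, 72, 69, 58],
    "HS 5": [88, 87, 86, 88, 89, 87, 81, 78, 89, 87],
}

MSE = 38.5444                              # Mean Square Error quoted in the text
Q_CRITICAL = {0.05: 4.0186, 0.01: 4.8928}  # critical Q values quoted in the text


def tukey_q(a, b, mse):
    """Tukey-Kramer Q statistic for two samples, given the ANOVA MSE."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    # Harmonic mean of the two sample sizes (the common size when they match).
    harmonic_n = 2 / (1 / len(a) + 1 / len(b))
    return abs(mean_a - mean_b) / sqrt(mse / harmonic_n)


def classify(q):
    """Label a Q statistic against the two critical values."""
    if q > Q_CRITICAL[0.01]:
        return "significant at 0.01"
    if q > Q_CRITICAL[0.05]:
        return "significant at 0.05"
    return "not significant"


# Q statistic for each of the 10 unordered pairs of schools.
results = {
    (a, b): tukey_q(rates[a], rates[b], MSE)
    for a, b in combinations(rates, 2)
}

for (a, b), q in results.items():
    print(f"{a} vs {b}: Q = {q:.4f} -> {classify(q)}")
```

With these inputs the sketch flags five pairs as significant at the 0.01 level (HS 1 vs HS 3, HS 1 vs HS 5, HS 2 vs HS 4, HS 3 vs HS 4, HS 4 vs HS 5), one more at the 0.05 level only (HS 1 vs HS 2), and the remaining four as not significant.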