Use student-t to derive the following paired test
set.seed(20)
girls_height<- rnorm(50, mean = 160, sd = 30)
boys_height<- rnorm(50, mean = 172, sd = 30)
#-------------------------------------------------------------
# State the hypothesis for the paired-test of the mean of
# girls' height and boys' height.
# Check the three assumptions before running the student t test
# for the samples. You need to write and run the code, and
# report the results.
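The commands given in the question generate the required variables. The null hypothesis of the paired t-test of the means of the two groups is $H_0: \mu_D = 0$ and the alternative is $H_1: \mu_D \neq 0$, where $\mu_D$ is the true (population) mean difference.
The assumptions and their verification are given below (textbooks differ on the exact list of assumptions; four major ones are considered here). The output checks, in order, the correlation between the two samples (cor.test), the normality of each sample (jarque.bera.test, from the tseries package), and the presence of outliers (grubbs.test, from the outliers package).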
> cor.test(girls_height,boys_height)
Pearson's product-moment correlation
data: girls_height and boys_height
t = 1.2589, df = 48, p-value = 0.2142
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.1047890 0.4354522
sample estimates:
cor
0.1787731
> jarque.bera.test(girls_height)
Jarque Bera Test
data: girls_height
X-squared = 0.68004, df = 2, p-value = 0.7118
> jarque.bera.test(boys_height)
Jarque Bera Test
data: boys_height
X-squared = 0.24858, df = 2, p-value = 0.8831
> grubbs.test(girls_height)
Grubbs test for one outlier
data: girls_height
G = 2.61871, U = 0.85719, p-value = 0.1704
alternative hypothesis: lowest value 73.3084715675764 is an
outlier
> grubbs.test(boys_height)
Grubbs test for one outlier
data: boys_height
G = 2.33549, U = 0.88641, p-value = 0.4174
alternative hypothesis: lowest value 112.95922652111 is an
outlier
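The checks above can also be collected into one script. This is a minimal sketch, assuming that jarque.bera.test comes from the tseries package and grubbs.test from the outliers package (both calls are skipped if the package is not installed). The Shapiro-Wilk test on the paired differences is a supplementary check that is not part of the original answer; it is included because the paired t-test works on the differences. The names cor_res, d and sw_res are illustrative.

```r
# Recreate the data exactly as in the question.
set.seed(20)
girls_height <- rnorm(50, mean = 160, sd = 30)
boys_height  <- rnorm(50, mean = 172, sd = 30)

# Correlation between the two samples (base R, stats package).
cor_res <- cor.test(girls_height, boys_height)
print(cor_res)

# Normality of each sample: Jarque-Bera test.
# Assumption: the 'tseries' package is the source of jarque.bera.test.
if (requireNamespace("tseries", quietly = TRUE)) {
  print(tseries::jarque.bera.test(girls_height))
  print(tseries::jarque.bera.test(boys_height))
}

# Outliers: Grubbs test.
# Assumption: the 'outliers' package is the source of grubbs.test.
if (requireNamespace("outliers", quietly = TRUE)) {
  print(outliers::grubbs.test(girls_height))
  print(outliers::grubbs.test(boys_height))
}

# Supplementary check (not in the original answer): normality of the
# paired differences, using the base R Shapiro-Wilk test.
d <- girls_height - boys_height
sw_res <- shapiro.test(d)
print(sw_res)
```

A p-value above the chosen alpha in each test means the corresponding assumption is not rejected.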
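The cor.test, Jarque-Bera and Grubbs p-values all exceed 0.05: the correlation between the samples is not significant (p = 0.2142), normality is not rejected for either sample (p = 0.7118 and 0.8831), and no outlier is flagged (p = 0.1704 and 0.4174). As all the assumptions are verified, we may proceed to the paired t-test. This can be done manually or via the built-in R command; both are shown below.
The command to test the variables directly is as below.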
----------------------------------------------------
> t.test(girls_height,boys_height,paired = T)
Paired t-test
data: girls_height and boys_height
t = -4.0983, df = 49, p-value = 0.0001559
alternative hypothesis: true difference in means is not equal to
0
95 percent confidence interval:
-32.60696 -11.15059
sample estimates:
mean of the differences
-21.87877
----------------------------------------------------
As can be seen, the mean difference is significantly different from zero: the t-statistic is significant at the 5% and even at the 1% level. The p-value of 0.0001559 is below both 0.05 and 0.01, so we reject the null hypothesis, which states that the true mean difference is equal to zero.
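The same decision can be read from the object returned by t.test instead of from the printed output. This is a minimal sketch, assuming the same seed and data as in the question; the names res, alpha and reject_null are illustrative and not part of the original answer.

```r
# Recreate the data exactly as in the question.
set.seed(20)
girls_height <- rnorm(50, mean = 160, sd = 30)
boys_height  <- rnorm(50, mean = 172, sd = 30)

# t.test returns an object of class "htest"; store it instead of only printing it.
res <- t.test(girls_height, boys_height, paired = TRUE)

alpha   <- 0.01                    # illustrative significance level
t_stat  <- unname(res$statistic)   # t-statistic
df      <- unname(res$parameter)   # degrees of freedom (n - 1)
p_value <- res$p.value             # two-sided p-value
ci      <- res$conf.int            # 95% confidence interval for the mean difference

# Reject H0 when the p-value falls below the chosen alpha.
reject_null <- p_value < alpha

cat("t =", t_stat, " df =", df, " p =", p_value, "\n")
cat("95% CI:", ci[1], "to", ci[2], "\n")
cat("Reject H0 at alpha =", alpha, ":", reject_null, "\n")
```

Storing the result this way lets the decision rule be changed, for example to a different alpha, without re-reading the printed output.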
The manual method would be as below.
----------------------------------------------------
> d <- girls_height - boys_height
> t <- mean(d)/(sd(d)/(50^0.5))
> abs(t) > qt(0.995,50-1)
[1] TRUE
> (pt(t,50-1))*2
[1] 0.0001559402
----------------------------------------------------
The difference variable is $d_i = \text{girls\_height}_i - \text{boys\_height}_i$, and the t-statistic is $t = (\bar{d} - 0)/SE(\bar{d})$. Since the standard error of the mean difference is $SE(\bar{d}) = s_d/\sqrt{n}$, where $s_d$ = sd(d) is the sample (not population) standard deviation and $n = 50$, we have $t = \bar{d}/(s_d/\sqrt{n})$. The computed t-statistic is equal to the one reported by the direct command. As the absolute value of the t-statistic is greater than the critical t at alpha = 0.01, we reject the null that the true mean difference is zero. The critical t is $t_{0.995,\,49}$ = qt(0.995, 49), approximately 2.68. The p-value, computed as 2*pt(t, 49) because t is negative here, is the same as before and is less than 0.01.
Hence, the conclusion is that the mean heights of girls and boys in the given data differ significantly.
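The manual steps can be written as a script that works for either sign of the t-statistic and cross-checks the result against t.test. This is a minimal sketch, assuming the same seed and data as in the question; the names se, t_stat, p_value and ci are illustrative, and the confidence interval is an addition to the manual method shown above.

```r
# Recreate the data exactly as in the question.
set.seed(20)
girls_height <- rnorm(50, mean = 160, sd = 30)
boys_height  <- rnorm(50, mean = 172, sd = 30)

d  <- girls_height - boys_height   # paired differences
n  <- length(d)
se <- sd(d) / sqrt(n)              # standard error of the mean difference
t_stat <- mean(d) / se             # test statistic under H0: mean difference = 0

# Two-sided p-value, written so that it is valid for positive or negative t_stat.
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# Critical value for a two-sided test at alpha = 0.01.
t_crit <- qt(0.995, df = n - 1)

# 95% confidence interval for the mean difference.
ci <- mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * se

# Cross-check against the built-in paired test.
res <- t.test(girls_height, boys_height, paired = TRUE)
print(all.equal(t_stat, unname(res$statistic)))
print(all.equal(p_value, res$p.value))
print(all.equal(ci, as.numeric(res$conf.int)))
```

Using -abs(t_stat) inside pt avoids a p-value above 1, which 2*pt(t, df) would give if the t-statistic were positive.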