Here is the R code for running a t-test:
t.test(numeric vector of data values, another optional numeric vector of data values,
alternative = c("two.sided", "less", "greater"),
mu = H0, paired = c(TRUE, FALSE), var.equal = c(TRUE, FALSE), conf.level = 1 - α)
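As a minimal sketch of how these arguments fit together, the call below runs a one-sided, unpaired, pooled-variance test. The vectors `x` and `y` and the value of `alpha` are made-up illustration values, not data from this problem.

```r
# Hypothetical illustration data (not from the problem)
x <- c(5.1, 4.9, 5.6, 5.8, 6.0)
y <- c(6.2, 6.8, 5.9, 7.1, 6.5)
alpha <- 0.05  # assumed significance level for this sketch

res <- t.test(x, y,
              alternative = "less",    # Ha: mean(x) - mean(y) < 0
              mu = 0,                  # hypothesized difference under H0
              paired = FALSE,          # independent samples
              var.equal = TRUE,        # pooled-variance (classic two-sample) test
              conf.level = 1 - alpha)  # confidence level of the reported interval

res$p.value   # p-value of the test
res$conf.int  # one-sided confidence interval for the difference in means
```

Each of `alternative`, `paired`, and `var.equal` takes one value from the choices listed in the template, not the whole vector of choices.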
2)
You want to determine if the average height of men in California is greater than the average height of men in Nebraska. You take a random sample of 30 men in California and 30 men in Nebraska. The data below represents the heights of the men in inches. Write the R code that does the following:
H0: Difference in means in populations is zero.
Ha: Difference in means in the populations is not zero.
NE_heights<-c( 73.5, 68.5, 70, 63, 64, 65, 64, 70, 61, 61.25, 69, 73, 69, 66, 69.5, 68,
64, 64, 72.5, 69, 67, 63, 66.5, 70.5, 64, 67, 71, 74, 68, 65)
CA_heights <- c( 72, 73.5, 74, 75, 66, 78, 70, 73, 74, 68, 71, 68, 67, 66, 73, 72, 82, 71, 64, 72, 65, 66, 69, 83, 67, 74, 76, 65, 74, 79)
a.) Makes two boxplots, an orange one for the CA_heights data and a red one for the NE_heights data, with the main title "Men’s heights California vs Nebraska" and with the CA_heights data labeled "CA heights" and the NE_heights data labeled "NE heights".
b.) Computes the sample size, mean, and standard deviation of both the CA_heights and NE_heights data.
c.) Performs an unpaired "less than" t-test with α = 0.02 to decide whether there is a statistically significant difference between men’s heights in California and Nebraska.
d.) Paste your R code into Run R Script and run the script.
e.) Paste the R output at the bottom of your R code.
f.) Looking at the p-value in the R output, decide if there is evidence to suggest that there is a statistically significant difference between men’s heights in California and Nebraska.
Write the p-value and your conclusion at the top of your R code.
> NE_heights<-c( 73.5, 68.5, 70, 63, 64, 65, 64, 70, 61,
61.25, 69, 73, 69, 66, 69.5,
68,64,64,72.5,69,67,63,66.5,70.5,64,67,71,74,68,65)
> CA_heights <- c( 72, 73.5, 74, 75, 66, 78, 70, 73, 74, 68,
71, 68, 67, 66, 73, 72, 82, 71, 64, 72, 65, 66, 69, 83, 67, 74, 76,
65, 74, 79)
> dataframe=data.frame(NE_heights,CA_heights)
> colnames(dataframe)=c("NE_heights","CA_heights")
> boxplot(dataframe,horizontal=FALSE,las=1,notch=FALSE,outline=TRUE,outcol="#000000",outpch=19,
col="#000000",xlab="",ylab="",
main="",sub="",col.lab="#000000",col.main="#000000",col.sub="#000000",col.axis="#000000",cex.lab=1,cex.main=1,cex.sub=1,cex.axis=1)
(unable to paste the box plot)
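The boxplot call in the transcript leaves the title empty and colors everything black, so it does not yet show the colors, title, and labels requested in part a. One possible sketch that follows part a, using only base R `boxplot` arguments (`col`, `names`, `main`), is:

```r
NE_heights <- c(73.5, 68.5, 70, 63, 64, 65, 64, 70, 61, 61.25, 69, 73, 69, 66, 69.5,
                68, 64, 64, 72.5, 69, 67, 63, 66.5, 70.5, 64, 67, 71, 74, 68, 65)
CA_heights <- c(72, 73.5, 74, 75, 66, 78, 70, 73, 74, 68, 71, 68, 67, 66, 73,
                72, 82, 71, 64, 72, 65, 66, 69, 83, 67, 74, 76, 65, 74, 79)

# Side-by-side boxplots: CA first (orange), NE second (red)
bp <- boxplot(CA_heights, NE_heights,
              col   = c("orange", "red"),
              names = c("CA heights", "NE heights"),
              main  = "Men's heights California vs Nebraska")
```

Passing the two vectors directly, rather than a data frame, keeps the box order and the `col` and `names` vectors aligned.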
> sample_size_n1=length(NE_heights)
> sample_size_n1
[1] 30
> sample_size_n2=length(CA_heights)
> sample_size_n2
[1] 30
> m1=mean(NE_heights)
> m1
[1] 67.34167
> m2=mean(CA_heights)
> m2
[1] 71.58333
> std_dev_1=sqrt(var(NE_heights))
> std_dev_1
[1] 3.610619
> std_dev_2=sqrt(var(CA_heights))
> std_dev_2
[1] 4.958593
> t.test(NE_heights,CA_heights,var.equal=T,alternative="less",level=0.98)
Two Sample t-test
data: NE_heights and CA_heights
t = -3.7876, df = 58, p-value = 0.0001818
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf -2.369721
sample estimates:
mean of x mean of y
67.34167 71.58333
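Note that the call above passes `level=0.98`, which is not a named argument of `t.test`. It appears to be absorbed by `...` and ignored, which would explain why the output reports a 95 percent interval. A sketch of the same test with the confidence level set through `conf.level` is given below. The test statistic and p-value do not depend on this setting; only the reported interval changes.

```r
NE_heights <- c(73.5, 68.5, 70, 63, 64, 65, 64, 70, 61, 61.25, 69, 73, 69, 66, 69.5,
                68, 64, 64, 72.5, 69, 67, 63, 66.5, 70.5, 64, 67, 71, 74, 68, 65)
CA_heights <- c(72, 73.5, 74, 75, 66, 78, 70, 73, 74, 68, 71, 68, 67, 66, 73,
                72, 82, 71, 64, 72, 65, 66, 69, 83, 67, 74, 76, 65, 74, 79)

alpha <- 0.02  # significance level from part c

res <- t.test(NE_heights, CA_heights,
              alternative = "less",    # Ha: mean(NE) - mean(CA) < 0
              paired = FALSE,          # unpaired (independent) samples
              var.equal = TRUE,        # pooled variance, as in the transcript
              conf.level = 1 - alpha)  # 98 percent one-sided interval

res$statistic  # t statistic
res$p.value    # p-value
res$conf.int   # interval at the 98 percent level
```

Setting `var.equal = FALSE` instead would run Welch's test, which does not assume equal population variances. The transcript uses the pooled version, so that is what is shown here.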
Conclusion: The p-value is 0.0001818, which is less than the 0.02 level of significance, so we reject the null hypothesis at the 2% level of significance. There is sufficient evidence that the mean height of men in Nebraska is less than the mean height of men in California, i.e. the difference between the mean heights of the two populations is statistically significant.
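The decision rule used in the conclusion can also be written as a short check. This is a minimal sketch in which the p-value is copied by hand from the output above rather than recomputed.

```r
alpha   <- 0.02       # significance level from part c
p_value <- 0.0001818  # p-value copied from the t-test output

# Reject H0 when the p-value falls below the significance level
if (p_value < alpha) {
  decision <- "reject H0"
} else {
  decision <- "fail to reject H0"
}
decision
```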