2.) You want to determine if the average height of men in California is greater than the average height of men in Nebraska. You take a random sample of 30 men in California and 30 men in Nebraska. The data below represents the heights of the men in inches. Write the R code that does the following:
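H0: The difference in population means (Nebraska minus California) is zero.
Ha: The difference in population means (Nebraska minus California) is less than zero, that is, men in California are taller on average.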
NE_heights <- c(73.5, 68.5, 70, 63, 64, 65, 64, 70, 61, 61.25, 69, 73, 69, 66, 69.5, 68,
  64, 64, 72.5, 69, 67, 63, 66.5, 70.5, 64, 67, 71, 74, 68, 65)
CA_heights <- c(72, 73.5, 74, 75, 66, 78, 70, 73, 74, 68, 71, 68, 67, 66, 73,
  72, 82, 71, 64, 72, 65, 66, 69, 83, 67, 74, 76, 65, 74, 79)
a.) Makes two boxplots, an orange one for the CA_heights data and a red one for the NE_heights data, with the main title "Men’s heights California vs Nebraska" and the boxes labeled "CA heights" and "NE heights".
b.) Computes the sample size, mean, and standard deviation of both the CA_heights and NE_heights data.
c.) Performs an unpaired, one-sided ("less") t-test at α = 0.02 to decide whether there is a statistically significant difference between men’s heights in California and Nebraska.
d.) Paste your R code into Run R Script and run the script.
e.) Paste the R output below your R code.
f.) Looking at the p-value in the R output, decide if there is evidence to suggest that there is a statistically significant difference between men’s heights in California and Nebraska.
Write the p-value and your conclusion at the top of your R code.
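For reference, the pooled (equal-variance) two-sample t-test in part c) needs only the sample sizes, means, and standard deviations from part b). The sketch below rebuilds the t statistic, degrees of freedom, and one-sided p-value from those summaries and cross-checks them against t.test(). It assumes equal population variances, as the answer does with var.equal = TRUE, and the helper name pooled_t_from_summary is hypothetical.

```r
# Hypothetical helper: pooled (equal-variance) two-sample t-test from summary statistics,
# for the one-sided alternative Ha: mu1 - mu2 < 0
pooled_t_from_summary <- function(n1, m1, s1, n2, m2, s2) {
  df <- n1 + n2 - 2
  # Pooled variance: sample variances weighted by their degrees of freedom
  sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df
  # Standard error of the difference in sample means
  se <- sqrt(sp2 * (1 / n1 + 1 / n2))
  t_stat <- (m1 - m2) / se
  # "less" alternative: lower-tail probability of the t distribution
  p_value <- pt(t_stat, df = df)
  c(t = t_stat, df = df, p_value = p_value)
}

NE_heights <- c(73.5, 68.5, 70, 63, 64, 65, 64, 70, 61, 61.25, 69, 73, 69, 66, 69.5, 68,
                64, 64, 72.5, 69, 67, 63, 66.5, 70.5, 64, 67, 71, 74, 68, 65)
CA_heights <- c(72, 73.5, 74, 75, 66, 78, 70, 73, 74, 68, 71, 68, 67, 66, 73,
                72, 82, 71, 64, 72, 65, 66, 69, 83, 67, 74, 76, 65, 74, 79)

# Group 1 = Nebraska, group 2 = California, matching t.test(NE_heights, CA_heights, ...)
by_hand <- pooled_t_from_summary(
  length(NE_heights), mean(NE_heights), sd(NE_heights),
  length(CA_heights), mean(CA_heights), sd(CA_heights)
)
print(by_hand)

# Cross-check against R's built-in test
builtin <- t.test(NE_heights, CA_heights, alternative = "less", var.equal = TRUE)
print(all.equal(unname(by_hand["t"]), unname(builtin$statistic)))
```

With this data the hand computation should agree with t.test() to floating-point precision, giving t of about -3.79 on 58 degrees of freedom.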
> NE_heights <- c(73.5, 68.5, 70, 63, 64, 65, 64, 70, 61,
    61.25, 69, 73, 69, 66, 69.5,
    68, 64, 64, 72.5, 69, 67, 63, 66.5, 70.5, 64, 67, 71, 74, 68, 65)
> CA_heights <- c(72, 73.5, 74, 75, 66, 78, 70, 73, 74, 68,
    71, 68, 67, 66, 73, 72, 82, 71, 64, 72, 65, 66, 69, 83, 67, 74, 76,
    65, 74, 79)
> # a.) Boxplots: orange for California, red for Nebraska
> boxplot(CA_heights, NE_heights,
    names = c("CA heights", "NE heights"),
    col = c("orange", "red"),
    main = "Men’s heights California vs Nebraska",
    xlab = "State", ylab = "Height (inches)", las = 1)
> sample_size_n1=length(NE_heights)
> sample_size_n1
[1] 30
> sample_size_n2=length(CA_heights)
> sample_size_n2
[1] 30
> m1=mean(NE_heights)
> m1
[1] 67.34167
> m2=mean(CA_heights)
> m2
[1] 71.58333
> std_dev_1=sd(NE_heights)
> std_dev_1
[1] 3.610619
> std_dev_2=sd(CA_heights)
> std_dev_2
[1] 4.958593
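> # c.) alpha = 0.02 is compared with the p-value; conf.level (default 0.95) only sets the interval
> t.test(NE_heights, CA_heights, alternative = "less", var.equal = TRUE)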
Two Sample t-test
data: NE_heights and CA_heights
t = -3.7876, df = 58, p-value = 0.0001818
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf -2.369721
sample estimates:
mean of x mean of y
67.34167 71.58333
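The test above pools the variances even though the sample standard deviations differ (about 3.61 vs 4.96). A quick sensitivity check, sketched below, is to rerun it as Welch's unequal-variance test, which is t.test()'s default. With equal group sizes (30 and 30) the Welch t statistic is algebraically identical to the pooled one and only the degrees of freedom change (to roughly 53 here), so the decision should not hinge on the equal-variance assumption. The object names pooled, welch, and comparison are illustrative.

```r
NE_heights <- c(73.5, 68.5, 70, 63, 64, 65, 64, 70, 61, 61.25, 69, 73, 69, 66, 69.5, 68,
                64, 64, 72.5, 69, 67, 63, 66.5, 70.5, 64, 67, 71, 74, 68, 65)
CA_heights <- c(72, 73.5, 74, 75, 66, 78, 70, 73, 74, 68, 71, 68, 67, 66, 73,
                72, 82, 71, 64, 72, 65, 66, 69, 83, 67, 74, 76, 65, 74, 79)

# Pooled test, as in the transcript, and Welch's test (var.equal = FALSE is the default)
pooled <- t.test(NE_heights, CA_heights, alternative = "less", var.equal = TRUE)
welch  <- t.test(NE_heights, CA_heights, alternative = "less")

# Side-by-side comparison of statistic, degrees of freedom, and p-value
comparison <- rbind(
  pooled = c(t = unname(pooled$statistic), df = unname(pooled$parameter), p = pooled$p.value),
  welch  = c(t = unname(welch$statistic),  df = unname(welch$parameter),  p = welch$p.value)
)
print(comparison)
```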
Conclusion: The p-value is 0.0001818, which is less than the 2% significance level (0.0001818 < 0.02), so we reject the null hypothesis at α = 0.02. There is sufficient evidence of a statistically significant difference between men’s heights in California and Nebraska: on average, men in California are taller than men in Nebraska.
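The decision in part f) can also be scripted so the conclusion prints next to the p-value. A minimal sketch, with alpha set to the 0.02 from part c); the names alpha, res, and reject_h0 are illustrative, not part of the assignment.

```r
NE_heights <- c(73.5, 68.5, 70, 63, 64, 65, 64, 70, 61, 61.25, 69, 73, 69, 66, 69.5, 68,
                64, 64, 72.5, 69, 67, 63, 66.5, 70.5, 64, 67, 71, 74, 68, 65)
CA_heights <- c(72, 73.5, 74, 75, 66, 78, 70, 73, 74, 68, 71, 68, 67, 66, 73,
                72, 82, 71, 64, 72, 65, 66, 69, 83, 67, 74, 76, 65, 74, 79)

alpha <- 0.02  # significance level from part c)
res <- t.test(NE_heights, CA_heights, alternative = "less", var.equal = TRUE)

# Decision rule: reject H0 when the p-value is below alpha
reject_h0 <- res$p.value < alpha
cat("p-value =", format(res$p.value, digits = 4), " alpha =", alpha, "\n")
if (reject_h0) {
  cat("Reject H0: evidence that men in California are taller on average than men in Nebraska.\n")
} else {
  cat("Fail to reject H0: no significant evidence that men in California are taller on average.\n")
}
```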