Can you please tell me what more information I need to provide to solve the problem below?
1. Answer the following using R.
Hint: Use the iris dataset (a built-in data set in R).
a) Create a new data frame called virginica.versicolor (that only contains these two species)
The command I used:
virginica.versicolor <- iris[iris$Species %in% c("versicolor", "virginica"), ]
b) What is your null hypothesis regarding sepal lengths for the two species (virginica.versicolor)? And what is your alternate hypothesis?
c) Describe your hypotheses in terms of your test statistic: what would be the t under the null hypothesis, H0, and what would be the statement about t under your alternate hypothesis Ha?
d) Would you do a one-directional or a non-directional (i.e., two-sided) test? Why?
e) Conduct a Student’s t-test using the formula format as follows:
t.test(Sepal.Length ~ Species, data = virginica.versicolor, var.equal = TRUE)
f) Explain what the three different sections do within the t.test() function.
g) Did your function run a one- or non-directional test?
h) What is your t-value? Based on the results of your t-test, what is your conclusion and why?
Answer:

a) Create the data frame virginica.versicolor
Code:
data(iris)
virginica.versicolor <- iris[iris$Species %in% c("versicolor", "virginica"), ]
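A quick sanity check of the subset, offered as a sketch: iris has 50 flowers per species, so the new data frame should have 100 rows. After subsetting, `Species` still carries the now-empty level `setosa`; base R's `droplevels()` removes it so that tables and plots show only the two species. This step is optional for the t-test itself, which ran fine without it in part e).

```r
# Sketch: build the two-species subset and check it (base R only)
data(iris)

virginica.versicolor <- iris[iris$Species %in% c("versicolor", "virginica"), ]

# Drop the unused "setosa" level so summaries list only the two species
virginica.versicolor$Species <- droplevels(virginica.versicolor$Species)

print(nrow(virginica.versicolor))          # 50 versicolor + 50 virginica
print(table(virginica.versicolor$Species)) # counts per remaining species
```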
b) The null and alternate hypotheses regarding the sepal lengths of the two species are
Ho : sl1 = sl2
Ha : sl1 ≠ sl2
where sl1 and sl2 are the population mean sepal lengths of the virginica and versicolor species, respectively.
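Because the hypotheses are statements about population means, it helps to look at the sample means and standard deviations that estimate them. A minimal sketch; `vv` is just a short hypothetical name for the two-species subset, and `droplevels()` keeps `tapply()` from reporting an empty `setosa` group.

```r
# Sketch: per-species sample statistics that estimate the two population means
data(iris)
vv <- droplevels(iris[iris$Species %in% c("versicolor", "virginica"), ])

group.means <- tapply(vv$Sepal.Length, vv$Species, mean)  # sample mean per species
group.sds   <- tapply(vv$Sepal.Length, vv$Species, sd)    # sample SD per species

print(group.means)
print(group.sds)
```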
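c) Under Ho the two population means are equal, so the test statistic t is expected to be close to 0: it follows a t distribution centred at 0 with n1 + n2 - 2 = 98 degrees of freedom. Under Ha, t ≠ 0, i.e. |t| is large. We reject Ho in favor of Ha if
|t| > t-critical
where t-critical is the 97.5th percentile of that t distribution for a two-sided test at the 5% level.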
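To make the test statistic concrete, here is a sketch that computes the pooled (Student's) t by hand and checks it against `t.test()`. The group order versicolor minus virginica is chosen to match how the formula method orders the factor levels; `vv`, `sp2`, `t.manual`, `deg.free` and `t.crit` are hypothetical names used only for illustration.

```r
# Sketch: pooled two-sample t statistic and critical value, computed by hand
data(iris)
vv <- droplevels(iris[iris$Species %in% c("versicolor", "virginica"), ])

x1 <- vv$Sepal.Length[vv$Species == "versicolor"]
x2 <- vv$Sepal.Length[vv$Species == "virginica"]
n1 <- length(x1)
n2 <- length(x2)

# Pooled variance: sample variances weighted by their degrees of freedom
sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)

# t = difference in sample means / standard error of that difference
t.manual <- (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
deg.free <- n1 + n2 - 2

# Two-sided critical value at the 5% level
t.crit <- qt(0.975, deg.free)

print(c(t = t.manual, df = deg.free, t.crit = t.crit))
```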
d) We will use a non-directional (two-sided) test, because we are testing whether the sepal lengths of the two species differ at all, not whether one species' sepal length is specifically greater or smaller than the other's.
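For comparison, a sketch of what a directional test would look like. It would only be justified if the direction had been fixed before seeing the data. With the formula interface the difference is taken as the first level minus the second (versicolor minus virginica), so `alternative = "less"` asks whether versicolor's mean is smaller. When t falls on the predicted side, the one-sided p-value is exactly half the two-sided one.

```r
# Sketch: default two-sided test versus a one-sided ("less") alternative
data(iris)
vv <- droplevels(iris[iris$Species %in% c("versicolor", "virginica"), ])

two.sided <- t.test(Sepal.Length ~ Species, data = vv, var.equal = TRUE)

# "less": true mean(versicolor) - mean(virginica) < 0
one.sided <- t.test(Sepal.Length ~ Species, data = vv, var.equal = TRUE,
                    alternative = "less")

print(c(two.sided = two.sided$p.value, one.sided = one.sided$p.value))
```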
e) Conduct the Student's t-test
Code:
t.test(Sepal.Length ~ Species, data = virginica.versicolor, var.equal = TRUE)
Code Output:

Two Sample t-test

data:  Sepal.Length by Species
t = -5.6292, df = 98, p-value = 1.725e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.8818516 -0.4221484
sample estimates:
mean in group versicolor  mean in group virginica
                   5.936                    6.588
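The printed output is a summary of an object of class "htest", and each reported number can be pulled out by name. A sketch (`vv` and `res` are hypothetical names):

```r
# Sketch: extract the reported numbers from the "htest" object
data(iris)
vv <- droplevels(iris[iris$Species %in% c("versicolor", "virginica"), ])

res <- t.test(Sepal.Length ~ Species, data = vv, var.equal = TRUE)

print(res$statistic)  # t value
print(res$parameter)  # degrees of freedom
print(res$p.value)    # two-sided p-value
print(res$conf.int)   # 95% CI for mean(versicolor) - mean(virginica)
print(res$estimate)   # the two group means
```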
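f) The three sections (arguments) of the t.test() call do the following:
Sepal.Length ~ Species : a formula; the numeric response on the left (sepal length) is compared between the two groups defined by the factor on the right (species).
data = virginica.versicolor : the data frame containing the variables named in the formula.
var.equal = TRUE : a logical value indicating whether to treat the two population variances as equal. If TRUE, the pooled variance is used to estimate the variance (the classic Student's t-test); if FALSE (the default), the Welch approximation to the degrees of freedom is used instead.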
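To see what `var.equal` changes, a sketch running both versions on the same data. With equal group sizes (50 and 50) the two t statistics are identical; only the degrees of freedom, and therefore the p-value, differ. `pooled` and `welch` are hypothetical names.

```r
# Sketch: Student's (pooled) versus Welch's t-test on the same data
data(iris)
vv <- droplevels(iris[iris$Species %in% c("versicolor", "virginica"), ])

pooled <- t.test(Sepal.Length ~ Species, data = vv, var.equal = TRUE)  # Student
welch  <- t.test(Sepal.Length ~ Species, data = vv)                    # default: Welch

print(c(pooled.t  = unname(pooled$statistic), welch.t  = unname(welch$statistic),
        pooled.df = unname(pooled$parameter), welch.df = unname(welch$parameter)))
```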
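g) Since we did not specify the alternative argument in t.test(), it ran the default non-directional (two-sided) test, alternative = "two.sided". The output confirms this: "alternative hypothesis: true difference in means is not equal to 0".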
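The alternative actually used is stored in the returned object, so it can be checked directly. A sketch showing that spelling out the default gives the same result (`vv`, `res` and `res.explicit` are hypothetical names):

```r
# Sketch: confirm which alternative t.test() used when none was given
data(iris)
vv <- droplevels(iris[iris$Species %in% c("versicolor", "virginica"), ])

res <- t.test(Sepal.Length ~ Species, data = vv, var.equal = TRUE)
print(res$alternative)  # stored in the "htest" object

# Writing the default out explicitly yields the same p-value
res.explicit <- t.test(Sepal.Length ~ Species, data = vv, var.equal = TRUE,
                       alternative = "two.sided")
print(res.explicit$p.value)
```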
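h) From the output:
t = -5.6292, df = 98, p-value = 1.725e-07
The t-value is -5.6292. It is negative because the difference is computed as versicolor minus virginica (5.936 - 6.588 = -0.652), i.e. versicolor has the smaller mean sepal length.
At the 5% level of significance, the p-value (1.725e-07) is far below 0.05, so we reject the null hypothesis. Consistent with this, the 95% confidence interval for the difference in means (-0.882 to -0.422) does not contain 0.
Conclusion:
There is statistically significant evidence that the mean sepal lengths of the two species, virginica and versicolor, differ.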
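The decision in part h) can be written out as three equivalent checks at the 5% level: p-value below alpha, |t| above the critical value, and a 95% confidence interval that excludes 0. A sketch; `alpha`, `t.crit` and the `reject.by.*` names are hypothetical.

```r
# Sketch: the part h) decision rule, checked three equivalent ways
data(iris)
vv <- droplevels(iris[iris$Species %in% c("versicolor", "virginica"), ])

alpha <- 0.05  # matches the default 95% confidence level of t.test()
res   <- t.test(Sepal.Length ~ Species, data = vv, var.equal = TRUE)

t.crit <- qt(1 - alpha / 2, df = unname(res$parameter))

reject.by.p  <- res$p.value < alpha
reject.by.t  <- unname(abs(res$statistic) > t.crit)
reject.by.ci <- res$conf.int[1] > 0 || res$conf.int[2] < 0  # CI excludes 0

print(c(p = reject.by.p, t = reject.by.t, ci = reject.by.ci))
```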