In the dataset airways (see R code), we have the change in airflow from moderate exercise for 19 subjects under 2 different exposure conditions: regular air (air) and 0.25% sulphur dioxide (so2).
a) Look at the correlation, and use the t-table to test the null hypothesis that air flow change under these two conditions is uncorrelated. Test at significance level 0.05. Show your work.
b) Use a linear model and the summary function in R to test the null hypothesis that air flow changes are uncorrelated among people at the alpha = 0.05 level.
c) Look at the scatterplot of the airway changes. Does a linear model fully explain the relationship between these variables? Why or why not?
R code:
##################
subject = 1:19
air = c(0.82,0.86,1.86,1.64,12.57,1.56,1.28,1.08,4.29,1.37,14.68,3.64,3.89,0.58,9.50,0.93,0.49,31.04,1.66)
so2 = c(0.72,1.05,1.40,2.30,13.49,0.62,2.41,2.32,8.19,6.33,19.88,8.87,9.25,6.59,2.17,9.93,13.44,16.25,19.89)
airways = data.frame(subject,air,so2)
airways
# part a, get the correlation
cor(air,so2)
# part b, make a model and get the summary
fit = lm(air ~ so2, data=airways)
summary(fit)
## part c, to see a scatterplot...
plot(air ~ so2, data=airways)
After running the given R code we get the following output:
First, the data frame.
Next, the correlation between air and so2, which is r = 0.5122518.
Using this correlation, we test the null hypothesis that the air flow changes under the two conditions are uncorrelated, using the t-table.
Degrees of freedom = df = n - 2 = 19 - 2 = 17
Null hypothesis : There is no correlation between air and so2
Alternative hypothesis : There is correlation between air and so2
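Mathematically, with $\rho$ denoting the population correlation between air and so2,
$$H_0: \rho = 0 \quad \text{vs.} \quad H_1: \rho \neq 0$$
The formula for the t test statistic is as follows:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
r = 0.5122518
n = 19
test statistic value = t = 0.5122518 × sqrt(17) / sqrt(1 - 0.5122518^2) = 2.459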
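As a quick check on the arithmetic, the test statistic can be computed directly in R. This is a minimal sketch that re-enters the data from the question; the names r, n, and t_stat are chosen here for illustration only.

```r
# Data from the question
air = c(0.82,0.86,1.86,1.64,12.57,1.56,1.28,1.08,4.29,1.37,14.68,3.64,3.89,0.58,9.50,0.93,0.49,31.04,1.66)
so2 = c(0.72,1.05,1.40,2.30,13.49,0.62,2.41,2.32,8.19,6.33,19.88,8.87,9.25,6.59,2.17,9.93,13.44,16.25,19.89)

r = cor(air, so2)   # sample (Pearson) correlation
n = length(air)     # number of subjects

# t statistic for H0: no correlation, on n - 2 degrees of freedom
t_stat = r * sqrt(n - 2) / sqrt(1 - r^2)

round(c(r = r, n = n, t = t_stat), 4)
```

The rounded t value should agree with the 2.459 reported in the answer.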
Let's find the critical value.
For a two-tailed test:
tc = TINV(0.05, 17) = 2.110 (Excel command)
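The same critical value can be obtained in R instead of Excel. A short sketch, assuming a two-tailed test at alpha = 0.05 with 17 degrees of freedom as above (the variable names are illustrative):

```r
alpha = 0.05
df = 19 - 2                       # n - 2 degrees of freedom

# Two-tailed test: put alpha / 2 in each tail
t_crit = qt(1 - alpha / 2, df)

round(t_crit, 3)
```

Note that qt takes a lower-tail probability, so the two-tailed critical value uses 1 - alpha/2, whereas Excel's TINV takes the two-tailed alpha directly.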
Decision rule:
1) If critical value < |t| (the absolute value of the calculated test statistic), we reject the null hypothesis.
2) If critical value > |t|, we fail to reject the null hypothesis.
Here the critical value = 2.110 < 2.459 = |t|, so the first rule applies.
That is, we reject the null hypothesis.
Conclusion: At the 5% level of significance, there is evidence of a correlation between air and so2.
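The hand calculation in part a) can be cross-checked with R's built-in cor.test function, which runs the same two-sided t test for a Pearson correlation. This is only a verification sketch; the question itself asks for the t-table approach shown above.

```r
# Data from the question
air = c(0.82,0.86,1.86,1.64,12.57,1.56,1.28,1.08,4.29,1.37,14.68,3.64,3.89,0.58,9.50,0.93,0.49,31.04,1.66)
so2 = c(0.72,1.05,1.40,2.30,13.49,0.62,2.41,2.32,8.19,6.33,19.88,8.87,9.25,6.59,2.17,9.93,13.44,16.25,19.89)

# Pearson correlation test; two-sided alternative is the default
ct = cor.test(air, so2)

ct$statistic   # t statistic
ct$parameter   # degrees of freedom
ct$p.value     # two-sided p-value

# Decision at the 5% level
reject = ct$p.value < 0.05
reject
```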
The next output is as follows:
b) Use a linear model and the summary function in R to test the null hypothesis that air flow changes are uncorrelated among people at the alpha = 0.05 level.
From the summary output, the p-value corresponding to the F statistic is 0.02494. In a simple linear regression with one predictor, this is the same as the p-value for the slope of so2, and testing that the slope is zero is equivalent to testing that the correlation is zero.
Decision rule: 1) If p-value < level of significance (alpha), we reject the null hypothesis.
2) If p-value > level of significance (alpha), we fail to reject the null hypothesis.
Here p-value = 0.02494 < 0.05, so the first rule applies.
That is, we reject the null hypothesis in favor of the alternative hypothesis.
Conclusion: At the 5% level of significance, there is evidence of a correlation between air and so2.
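The numbers used in part b) can also be pulled out of the summary object rather than read off the printed output. The sketch below refits the same model and extracts the slope's t statistic and p-value along with the F statistic; the variable names (s, slope_t, slope_p, f, f_p) are illustrative.

```r
# Data from the question
air = c(0.82,0.86,1.86,1.64,12.57,1.56,1.28,1.08,4.29,1.37,14.68,3.64,3.89,0.58,9.50,0.93,0.49,31.04,1.66)
so2 = c(0.72,1.05,1.40,2.30,13.49,0.62,2.41,2.32,8.19,6.33,19.88,8.87,9.25,6.59,2.17,9.93,13.44,16.25,19.89)
airways = data.frame(air, so2)

fit = lm(air ~ so2, data = airways)
s = summary(fit)

# t statistic and p-value for the slope of so2
slope_t = s$coefficients["so2", "t value"]
slope_p = s$coefficients["so2", "Pr(>|t|)"]

# Overall F statistic and its p-value
f = s$fstatistic
f_p = pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

c(slope_t = slope_t, slope_p = slope_p, F = unname(f["value"]), F_p = unname(f_p))
```

With a single predictor, F equals the square of the slope's t statistic, so the two p-values coincide and the slope's t statistic matches the one from part a).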
c) Look at the scatterplot of the airway changes. Does a linear model fully explain the relationship between these variables? Why or why not?
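Look at the following R scatterplot output:
No, the linear model does not fully explain the relationship between these variables, because the points are widely scattered around the fitted line rather than lying close to it. Since the correlation is only about 0.51, the model explains only about r^2 ≈ 0.26 (26%) of the variation in air.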
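To make the judgment in part c) easier to see, one option is to draw the fitted line on the scatterplot and look at the residuals. This is a hedged sketch, not part of the original R code: the residual plot and the r_squared variable are additions for illustration.

```r
# Data from the question
air = c(0.82,0.86,1.86,1.64,12.57,1.56,1.28,1.08,4.29,1.37,14.68,3.64,3.89,0.58,9.50,0.93,0.49,31.04,1.66)
so2 = c(0.72,1.05,1.40,2.30,13.49,0.62,2.41,2.32,8.19,6.33,19.88,8.87,9.25,6.59,2.17,9.93,13.44,16.25,19.89)
airways = data.frame(air, so2)

fit = lm(air ~ so2, data = airways)

# Scatterplot with the fitted regression line
plot(air ~ so2, data = airways)
abline(fit)

# Proportion of variation in air explained by the model
r_squared = summary(fit)$r.squared
round(r_squared, 3)

# Residuals against fitted values; a good linear fit shows
# no pattern and roughly constant spread around zero
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

A low R-squared together with large or unevenly spread residuals supports the conclusion that the straight line captures only part of the relationship.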