Question

In the dataset airways (see R code), we have the change in airflow from moderate exercise for 19 subjects under 2 different exposure conditions: regular air (air) and 0.25% sulfur dioxide (so2).

a) Look at the correlation, and use the t-table to test the null hypothesis that air flow change under these two conditions is uncorrelated. Test at significance level 0.05. Show your work.

b) Use a linear model and the summary function in R to test the null hypothesis that air flow changes are uncorrelated among people at alpha = 0.05.

c) Look at the scatterplot of the airway changes. Does a linear model fully explain the relationship between these variables? Why or why not?

R code:

##################

subject = 1:19

air = c(0.82,0.86,1.86,1.64,12.57,1.56,1.28,1.08,4.29,1.37,14.68,3.64,3.89,0.58,9.50,0.93,0.49,31.04,1.66)

so2 = c(0.72,1.05,1.40,2.30,13.49,0.62,2.41,2.32,8.19,6.33,19.88,8.87,9.25,6.59,2.17,9.93,13.44,16.25,19.89)

airways = data.frame(subject,air,so2)

airways

# part a, get the correlation

cor(air,so2)

# part b, make a model and get the summary

fit = lm(air ~ so2, data=airways)

summary(fit)

## part c, to see a scatterplot...

plot(air ~ so2, data=airways)

Solutions


a) Running the given R code prints the data frame and the correlation between air and so2:

r = cor(air, so2) = 0.5122518

Using this correlation and the t-table, we test the null hypothesis that the airflow changes under the two conditions are uncorrelated.

Degrees of freedom = df = n - 2 = 19 - 2 = 17

Null hypothesis H0: ρ = 0 (there is no correlation between air and so2)

Alternative hypothesis H1: ρ ≠ 0 (there is a correlation between air and so2)

The test statistic is

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

With r = 0.5122518 and n = 19:

$$t = \frac{0.5122518\sqrt{17}}{\sqrt{1-0.5122518^2}} \approx 2.459$$
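The arithmetic above can be reproduced in R as a quick check. This is a minimal sketch that reuses the air and so2 vectors from the question; the names n, r and t_stat are illustrative choices, not part of the original code.

```r
# Data copied from the question (change in airflow, 19 subjects)
air = c(0.82,0.86,1.86,1.64,12.57,1.56,1.28,1.08,4.29,1.37,14.68,3.64,3.89,0.58,9.50,0.93,0.49,31.04,1.66)
so2 = c(0.72,1.05,1.40,2.30,13.49,0.62,2.41,2.32,8.19,6.33,19.88,8.87,9.25,6.59,2.17,9.93,13.44,16.25,19.89)

n = length(air)       # number of subjects
r = cor(air, so2)     # Pearson correlation coefficient

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat = r * sqrt(n - 2) / sqrt(1 - r^2)
t_stat
```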

Critical value for a two-tailed test at alpha = 0.05 with 17 degrees of freedom:

t_c = TINV(0.05, 17) = 2.110 (Excel command)
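In R, the same two-tailed critical value comes from the t quantile function qt(), evaluated at 1 - alpha/2 because the rejection region is split between both tails. A small sketch; alpha, dof and t_crit are illustrative names.

```r
alpha = 0.05   # significance level
dof = 17       # degrees of freedom, n - 2

# Upper alpha/2 quantile of the t distribution (R counterpart of Excel TINV(0.05, 17))
t_crit = qt(1 - alpha / 2, dof)
t_crit
```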

Decision rule:

1) If |t| > critical value, we reject the null hypothesis.

2) If |t| ≤ critical value, we fail to reject the null hypothesis.

Here |t| = 2.459 > 2.110 = critical value, so rule 1 applies and we reject the null hypothesis.

Conclusion: At the 5% level of significance, there is a significant (positive) correlation between air and so2.
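As a cross-check on the hand calculation, R's built-in cor.test() performs this Pearson correlation t-test (H0: rho = 0) and reports the statistic, degrees of freedom and p-value in one call. A sketch assuming the question's data vectors; ct and reject are illustrative names.

```r
# Data copied from the question
air = c(0.82,0.86,1.86,1.64,12.57,1.56,1.28,1.08,4.29,1.37,14.68,3.64,3.89,0.58,9.50,0.93,0.49,31.04,1.66)
so2 = c(0.72,1.05,1.40,2.30,13.49,0.62,2.41,2.32,8.19,6.33,19.88,8.87,9.25,6.59,2.17,9.93,13.44,16.25,19.89)

# Pearson correlation t-test: H0 rho = 0 versus H1 rho != 0
ct = cor.test(air, so2, method = "pearson", alternative = "two.sided")
ct$statistic   # t statistic, to compare with the hand value
ct$parameter   # degrees of freedom, n - 2
ct$p.value     # two-sided p-value

# Same decision rule as above: reject H0 when |t| exceeds the critical value
reject = unname(abs(ct$statistic) > qt(0.975, ct$parameter))
reject
```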


b) Use a linear model and the summary function in R to test the null hypothesis that air flow changes are uncorrelated among people at alpha = 0.05.

From the summary output, the p-value for the F statistic is 0.02494. In a simple linear regression this is the same as the p-value of the t-test on the so2 slope, and that slope t statistic equals the t = 2.459 from part a).

Decision rule:

1) If p-value < level of significance (alpha), we reject the null hypothesis.

2) If p-value ≥ level of significance (alpha), we fail to reject the null hypothesis.

Here p-value = 0.02494 < 0.05, so rule 1 applies: we reject the null hypothesis in favor of the alternative.

Conclusion: At the 5% level of significance, air and so2 are significantly correlated.
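The relevant numbers can also be pulled out of summary(fit) programmatically instead of read off the printout. The sketch below assumes the data frame from the question; coefs, slope_t, slope_p, fstat and f_p are illustrative names. It also checks two facts that hold for any simple linear regression: F equals the square of the slope's t statistic, and the F-test and slope t-test p-values coincide, so part b) tests the same hypothesis as part a).

```r
# Data frame built exactly as in the question
air = c(0.82,0.86,1.86,1.64,12.57,1.56,1.28,1.08,4.29,1.37,14.68,3.64,3.89,0.58,9.50,0.93,0.49,31.04,1.66)
so2 = c(0.72,1.05,1.40,2.30,13.49,0.62,2.41,2.32,8.19,6.33,19.88,8.87,9.25,6.59,2.17,9.93,13.44,16.25,19.89)
airways = data.frame(subject = 1:19, air, so2)

fit = lm(air ~ so2, data = airways)
s = summary(fit)

# Coefficient table columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs = s$coefficients
slope_t = coefs["so2", "t value"]
slope_p = coefs["so2", "Pr(>|t|)"]

# Overall F test: named vector with value, numdf, dendf
fstat = s$fstatistic
f_p = pf(fstat["value"], fstat["numdf"], fstat["dendf"], lower.tail = FALSE)

all.equal(unname(fstat["value"]), slope_t^2)   # F = t^2
all.equal(unname(f_p), slope_p)                # identical p-values
slope_p < 0.05                                 # TRUE means reject H0 at alpha = 0.05
```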

c) Look at the scatterplot of the airway changes. Does a linear model fully explain the relationship between these variables? Why or why not?

[Scatterplot of air (vertical axis) against so2 (horizontal axis), produced by plot(air ~ so2, data=airways)]

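No. The linear model does not fully explain the relationship between these variables, because many points lie far from the fitted line. One subject (subject 18, air = 31.04) sits well above all the others, and several subjects with small air changes have large so2 changes (for example subjects 16, 17 and 19, with air below 2 but so2 between about 10 and 20). Since r ≈ 0.51, R² = r² ≈ 0.26, so the line accounts for only about a quarter of the variation in air.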
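A fitted line and a couple of standard diagnostics make the answer to c) more concrete. This is a sketch rather than a full analysis: it overlays the least-squares line with abline(), uses cooks.distance() to find the subject with the most influence on the fit, and plots residuals against fitted values; cd is an illustrative name.

```r
# Data frame built exactly as in the question
air = c(0.82,0.86,1.86,1.64,12.57,1.56,1.28,1.08,4.29,1.37,14.68,3.64,3.89,0.58,9.50,0.93,0.49,31.04,1.66)
so2 = c(0.72,1.05,1.40,2.30,13.49,0.62,2.41,2.32,8.19,6.33,19.88,8.87,9.25,6.59,2.17,9.93,13.44,16.25,19.89)
airways = data.frame(subject = 1:19, air, so2)
fit = lm(air ~ so2, data = airways)

# Scatterplot with the least-squares line overlaid
plot(air ~ so2, data = airways)
abline(fit)

# Cook's distance: how strongly each subject pulls the fitted line
cd = cooks.distance(fit)
airways[which.max(cd), ]   # the single most influential subject

# Residuals vs fitted values: points far from the dashed zero line are poorly fitted
plot(fitted(fit), resid(fit), xlab = "Fitted air", ylab = "Residual")
abline(h = 0, lty = 2)
```

If one subject dominates the Cook's distances, refitting without it is a simple way to see how much the slope, and hence the tests in parts a) and b), depend on that single point.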

