install.packages("mosaic")
library(mosaic)
Data=(RailTrail)
RailTrail
The data set above, RailTrail, is available in R once the mosaic package is loaded.
(a) Fit a multiple regression model that predicts the variable volume from the variables hightemp, lowtemp, cloudcover, and precip. Interpret and discuss all the necessary statistics from the output.
(b) Test whether cloudcover can be dropped from the regression model given that precipitation, hightemp, and lowtemp are retained. Use the F statistic and level of significance 0.01. State the hypotheses, p-value, and conclusion in terms of the problem. Hint: This can be achieved using ANOVA.
(c) Assess whether both lowtemp and cloudcover can be dropped from the model given that hightemp and precipitation are retained. Discuss your results giving all relevant details to your solution. This includes any graphs or plots.
Solution
a)
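install.packages("mosaic")   # only needed once
library(mosaic)              # also loads mosaicData, which contains RailTrail
d=RailTrail
# fit volume on all four predictors; data=d makes attach() unnecessary
model=lm(volume~hightemp+lowtemp+cloudcover+precip,data=d)
summary(model)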
#output
Call:
lm(formula = volume ~ hightemp + lowtemp + cloudcover + precip, data = d)

Residuals:
     Min       1Q   Median       3Q      Max
-269.447  -37.449    4.186   41.178  299.266

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    35.308     59.796   0.590   0.5564
hightemp        6.571      1.153   5.699  1.7e-07 ***
lowtemp        -1.290      1.387  -0.930   0.3551
cloudcover     -7.501      3.851  -1.948   0.0547 .
precip       -100.616     42.064  -2.392   0.0190 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 93.2 on 85 degrees of freedom
Multiple R-squared:  0.4894, Adjusted R-squared:  0.4654
F-statistic: 20.37 on 4 and 85 DF,  p-value: 8.537e-12
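From the Coefficients table
Intercept: p-value = 0.5564 > 0.01, so the intercept is not significantly different from zero. This says nothing about the linear relationship with the predictors, which are assessed one by one below.
hightemp: estimate 6.571, p-value = 1.7e-07 < 0.01. Holding the other variables fixed, a one-unit increase in hightemp is associated with an average increase of about 6.57 in volume. It is the only predictor significant at the 0.01 level.
lowtemp: estimate -1.290, p-value = 0.3551, not significant.
cloudcover: estimate -7.501, p-value = 0.0547, significant only at the 0.10 level.
precip: estimate -100.616, p-value = 0.0190, significant at the 0.05 level but not at the 0.01 level.
Residual standard error: 93.2 on 85 degrees of freedom, so the observed volume typically differs from the fitted value by about 93.
Multiple R-squared: 0.4894, which indicates that 48.94% of the variation in volume is explained by the four predictors together (Adjusted R-squared: 0.4654).
The p-value for the F statistic, 8.537e-12 < 0.01 level of significance, indicates that the model as a whole is significant, i.e. at least one of the four predictors has a nonzero coefficient.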
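The overall F statistic can be recovered from R-squared, which shows how the two summary measures are tied together. The sketch below uses only the numbers printed in the summary output; the variable names (r2, k, df_resid) are illustrative and not part of the original solution.

```r
# Values copied from the summary(model) output
r2       <- 0.4894  # Multiple R-squared
k        <- 4       # number of predictors
df_resid <- 85      # residual degrees of freedom

# Overall F: explained variation per predictor divided by
# unexplained variation per residual degree of freedom
f_overall <- (r2 / k) / ((1 - r2) / df_resid)

# Upper-tail probability of the F distribution with (k, df_resid) df
p_overall <- pf(f_overall, k, df_resid, lower.tail = FALSE)

print(round(f_overall, 2))  # 20.37, matching the summary output
print(p_overall)
```

Because R-squared is rounded to four decimals in the printout, the recomputed values agree with the summary only up to rounding.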
(b) Test whether cloudcover can be dropped from the regression model given that precipitation, hightemp, and lowtemp are retained. Use the F statistic and level of significance 0.01.
State the hypotheses
H0: the coefficient of cloudcover is 0 (cloudcover can be dropped), given that precip, hightemp, and lowtemp are in the model.
H1: the coefficient of cloudcover is not 0.
F statistic
Only one coefficient is tested, so the partial F statistic is the square of the t value for cloudcover in the coefficient table from part (a): F = (-1.948)^2 = 3.79 on 1 and 85 degrees of freedom.
p-value = 0.0547 (the same as the p-value of the t-test for cloudcover)
conclusion : Since the p-value 0.0547 > 0.01 level of significance, we fail to reject H0. At the 0.01 level there is not enough evidence that cloudcover improves the prediction of volume once precip, hightemp, and lowtemp are in the model, so cloudcover can be dropped.
Note: the F-statistic of 20.37 (p-value 8.537e-12) in the summary output tests all four predictors jointly, so it is not the statistic for this question.
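The ANOVA hint in the question can be followed by fitting the model with and without cloudcover and comparing the two fits. This is a minimal sketch, assuming RailTrail is loaded from mosaicData (installed along with mosaic); the object names full, reduced, and cmp are illustrative.

```r
library(mosaicData)  # provides the RailTrail data set

# Full model from part (a) and the same model without cloudcover
full    <- lm(volume ~ hightemp + lowtemp + cloudcover + precip, data = RailTrail)
reduced <- lm(volume ~ hightemp + lowtemp + precip, data = RailTrail)

# Partial F test; H0: the cloudcover coefficient is 0
cmp <- anova(reduced, full)
print(cmp)

# The second row of the table holds the test of the added term
f_cloud <- cmp$F[2]
p_cloud <- cmp$`Pr(>F)`[2]

# Compare the p-value with the 0.01 significance level
drop_cloudcover <- p_cloud > 0.01
print(drop_cloudcover)
```

With a single term removed, the F value in this table should equal the square of the cloudcover t value from the summary, and the p-value should match the one in the coefficient table.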
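(c) Assess whether both lowtemp and cloudcover can be dropped from the model given that hightemp and precipitation are retained. Discuss your results giving all relevant details to your solution. This includes any graphs or plots.
H0: the coefficients of lowtemp and cloudcover are both 0, given that hightemp and precip are in the model.
H1: at least one of the two coefficients is not 0.
From the coefficient table in part (a), the p-values for lowtemp (0.3551) and cloudcover (0.0547) are both > 0.01, so neither variable is individually significant at the 0.01 level. This suggests that both lowtemp and cloudcover can be dropped from the model.
The individual t-tests examine one variable at a time, so the joint conclusion should be confirmed with a partial F test on 2 and 85 degrees of freedom that compares the model containing only hightemp and precip with the full model, together with residual plots of the reduced model.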
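A joint test of the two variables, plus the plots the question asks for, could be run along the following lines. This is a sketch under the assumption that RailTrail comes from mosaicData; the object names are illustrative, and no output is shown because it is not part of the original solution.

```r
library(mosaicData)  # provides the RailTrail data set

# Full model and the model keeping only hightemp and precip
full    <- lm(volume ~ hightemp + lowtemp + cloudcover + precip, data = RailTrail)
reduced <- lm(volume ~ hightemp + precip, data = RailTrail)

# Joint partial F test of lowtemp and cloudcover (2 numerator df)
cmp <- anova(reduced, full)
print(cmp)

# TRUE means H0 is not rejected at the 0.01 level,
# i.e. both variables can be dropped
p_joint <- cmp$`Pr(>F)`[2]
print(p_joint > 0.01)

# Standard diagnostic plots for the reduced model:
# residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
par(mfrow = c(2, 2))
plot(reduced)
```

The diagnostic plots help check whether the reduced model still satisfies the usual assumptions (linearity, constant variance, roughly normal residuals) before it is adopted.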