Use the CO2 data and, via multiple regression, select the two variables that predict the CO2 level with the best P-values. Make another table with these two variables and answer the questions. Numerical answers are rounded, so choose the answer that matches best:
Hour CO Traffic Wind
1 2.4 50 -0.2
2 1.7 26 0.0
3 1.4 16 0.0
4 1.2 10 0.0
5 1.2 12 0.1
6 2.0 41 -0.1
7 3.4 157 -0.1
8 5.8 276 -0.2
9 6.8 282 0.2
10 6.6 242 1.0
11 6.3 200 2.3
12 5.8 186 3.8
13 5.5 179 4.6
14 5.9 178 5.4
15 6.8 203 5.9
16 7.0 264 5.9
17 7.4 289 5.6
18 7.4 308 4.9
19 6.4 267 3.8
20 5.0 190 2.5
21 3.8 125 1.4
22 3.5 120 0.6
23 3.3 116 0.4
24 3.1 87 0.1
Answer the questions for Assessment:
13. What are the two selected variables?
a. Hour and Traffic
b. Wind and Traffic
c. Hour and Wind
14. Which of the variables has a better P-value and what is this P-value? (Note: numbers are truncated.)
a. Traffic; 0.018
b. Traffic; 6.85E-12
c. Wind; 0.0056
d. Wind; 0.174
15. Based on the table, how would you characterize the Regression fit?
a. Poor
b. Good
c. Excellent
16. What is another name for the coefficient 1.274461 and what is its interpretation based on the data?
a. The X-intercept; when the average weekday traffic density and the perpendicular wind-speed component are zero.
b. The slope of an average summer weekday's CO2 concentration. It is how much the CO2 concentration will increase when both the average weekday traffic density and the perpendicular wind-speed component increase by 1 unit.
c. The Y-intercept; It is how much the CO2 concentration will increase when both the average weekday traffic density and the perpendicular wind-speed component are zero.
d. The Y-intercept; it is the average summer weekday CO2 concentration when the average weekday traffic density and the perpendicular wind-speed component are zero.
e. None of these
I used R software to solve this question.
R code and output:
> d=read.table('co2.csv',header=TRUE, sep=',')
> head(d)
Hour CO Traffic Wind
1 1 2.4 50 -0.2
2 2 1.7 26 0.0
3 3 1.4 16 0.0
4 4 1.2 10 0.0
5 5 1.2 12 0.1
6 6 2.0 41 -0.1
> attach(d)
> fit=lm(CO~Hour+Traffic+Wind)
> summary(fit)
Call:
lm(formula = CO ~ Hour + Traffic + Wind)
Residuals:
     Min       1Q   Median       3Q      Max
-0.67461 -0.31714 -0.04293  0.26546  0.91662
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.305220   0.206791   6.312 3.68e-06 ***
Hour        -0.005669   0.014552  -0.390 0.700976
Traffic      0.018040   0.001205  14.976 2.48e-12 ***
Wind         0.231560   0.050749   4.563 0.000189 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4346 on 20 degrees of freedom
Multiple R-squared: 0.9648, Adjusted R-squared: 0.9595
F-statistic: 182.9 on 3 and 20 DF, p-value: 1.056e-14
Que.13
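b. Wind and Traffic
In the three-variable fit above, Traffic (p = 2.48e-12) and Wind (p = 0.000189) both have p-values below 0.05, while Hour (p = 0.700976) is not significant, so Hour is the variable to drop.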
Que.14
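b. Traffic; 6.85E-12
Traffic has the smaller p-value of the two selected variables. In the three-variable fit above it is 2.48e-12; the question asks for the value from the second table, i.e. the model refitted with only Traffic and Wind, and 6.85E-12 is the only answer choice of that order of magnitude. The output of that refit is not reproduced in the R session above.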
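The two-variable refit that this answer relies on can be sketched as follows. This is a minimal sketch, assuming the data are typed in from the table in the question instead of being read from co2.csv; the names `d2`, `fit2`, `coefs` and `best` are illustrative and do not come from the original session.

```r
# Data re-entered from the table in the question
# (assumption: this matches the contents of co2.csv).
d2 <- data.frame(
  Hour = 1:24,
  CO = c(2.4, 1.7, 1.4, 1.2, 1.2, 2.0, 3.4, 5.8, 6.8, 6.6, 6.3, 5.8,
         5.5, 5.9, 6.8, 7.0, 7.4, 7.4, 6.4, 5.0, 3.8, 3.5, 3.3, 3.1),
  Traffic = c(50, 26, 16, 10, 12, 41, 157, 276, 282, 242, 200, 186,
              179, 178, 203, 264, 289, 308, 267, 190, 125, 120, 116, 87),
  Wind = c(-0.2, 0.0, 0.0, 0.0, 0.1, -0.1, -0.1, -0.2, 0.2, 1.0, 2.3, 3.8,
           4.6, 5.4, 5.9, 5.9, 5.6, 4.9, 3.8, 2.5, 1.4, 0.6, 0.4, 0.1)
)

# Refit using only the two predictors that were significant in the full model.
fit2 <- lm(CO ~ Traffic + Wind, data = d2)

# Coefficient table: estimates, standard errors, t values and p-values.
coefs <- summary(fit2)$coefficients
print(coefs)

# Predictor (intercept excluded) with the smallest p-value.
best <- names(which.min(coefs[-1, "Pr(>|t|)"]))
print(best)
```

Passing `data = d2` to `lm` avoids the need for `attach`, so the column names are resolved inside the data frame only.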
Que.15
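c. Excellent
The test for overall significance gives F-statistic: 182.9 on 3 and 20 DF, p-value: 1.056e-14. The p-value is far below 0.05, so the model is statistically significant, and the multiple R-squared of 0.9648 means the model explains 96.48% of the variation in CO2. These figures come from the three-variable fit shown above.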
Que.16
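d. The Y-intercept; it is the average summer weekday CO2 concentration when the average weekday traffic density and the perpendicular wind-speed component are zero.
The value 1.274461 does not appear in the three-variable output above, whose intercept is 1.305220; it is taken to be the intercept of the second table with only Traffic and Wind. An intercept is a predicted level, not an increase, which rules out option c.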
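The intercept interpretation can be checked numerically: the fitted value at Traffic = 0 and Wind = 0 is the intercept itself. A minimal sketch, assuming the data are typed in from the table in the question; the names `d0`, `fit0`, `pred0` and `intercept` are illustrative and not part of the original session.

```r
# Data re-entered from the table in the question
# (assumption: this matches the contents of co2.csv).
d0 <- data.frame(
  CO = c(2.4, 1.7, 1.4, 1.2, 1.2, 2.0, 3.4, 5.8, 6.8, 6.6, 6.3, 5.8,
         5.5, 5.9, 6.8, 7.0, 7.4, 7.4, 6.4, 5.0, 3.8, 3.5, 3.3, 3.1),
  Traffic = c(50, 26, 16, 10, 12, 41, 157, 276, 282, 242, 200, 186,
              179, 178, 203, 264, 289, 308, 267, 190, 125, 120, 116, 87),
  Wind = c(-0.2, 0.0, 0.0, 0.0, 0.1, -0.1, -0.1, -0.2, 0.2, 1.0, 2.3, 3.8,
           4.6, 5.4, 5.9, 5.9, 5.6, 4.9, 3.8, 2.5, 1.4, 0.6, 0.4, 0.1)
)

# Two-variable model: CO as a linear function of Traffic and Wind.
fit0 <- lm(CO ~ Traffic + Wind, data = d0)

# Predicted CO when both predictors are zero.
pred0 <- unname(predict(fit0, newdata = data.frame(Traffic = 0, Wind = 0)))

# Estimated intercept of the same model.
intercept <- unname(coef(fit0)["(Intercept)"])

# The two numbers should agree up to floating-point rounding.
print(c(prediction = pred0, intercept = intercept))
```

Traffic = 0 lies below the smallest observed traffic value (10), so the intercept is an extrapolation and should be read with that in mind.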