Using all the data below, construct an empirical model using a computational tool (MATLAB, R, or any tool you prefer). Explain your model.
Data Description: These data are from a NIST study involving calibration of ozone monitors. The response variable (y) is the customer's measurement of ozone concentration and the predictor variable (x) is NIST's measurement of ozone concentration.
MATLAB Row Vectors:
xLst = [0.2, 337.4, 118.2, 884.6, 10.1, 226.5, 666.3, 996.3, 448.6, 777.0, 558.2, 0.4, 0.6, 775.5, 666.9, 338.0, 447.5, 11.6, 556.0, 228.1, 995.8, 887.6, 120.2, 0.3, 0.3, 556.8, 339.1, 887.2, 999.0, 779.0, 11.1, 118.3, 229.2, 669.1, 448.9, 0.5];
yLst = [0.1, 338.8, 118.1, 888.0, 9.2, 228.1, 668.5, 998.5, 449.1, 778.9, 559.2, 0.3, 0.1, 778.1, 668.8, 339.3, 448.9, 10.8, 557.7, 228.3, 998.0, 888.8, 119.6, 0.3, 0.6, 557.6, 339.3, 888.0, 998.5, 778.9, 10.2, 117.6, 228.9, 668.4, 449.2, 0.2];
Solution:
Fit a simple linear regression model of y (customer's measurement) on x (NIST's measurement).
The R code is:
# Predictor: NIST's measurement of ozone concentration
x <- c(0.2, 337.4, 118.2, 884.6, 10.1, 226.5, 666.3, 996.3, 448.6,
       777.0, 558.2, 0.4, 0.6, 775.5, 666.9, 338.0, 447.5, 11.6,
       556.0, 228.1, 995.8, 887.6, 120.2, 0.3, 0.3, 556.8, 339.1,
       887.2, 999.0, 779.0, 11.1, 118.3, 229.2, 669.1, 448.9, 0.5)
# Response: customer's measurement of ozone concentration
y <- c(0.1, 338.8, 118.1, 888.0, 9.2, 228.1, 668.5, 998.5, 449.1,
       778.9, 559.2, 0.3, 0.1, 778.1, 668.8, 339.3, 448.9, 10.8,
       557.7, 228.3, 998.0, 888.8, 119.6, 0.3, 0.6, 557.6, 339.3,
       888.0, 998.5, 778.9, 10.2, 117.6, 228.9, 668.4, 449.2, 0.2)
# Fit y = b0 + b1*x by ordinary least squares and print the fit summary
regmod <- lm(y ~ x)
summary(regmod)
Output:
Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max
-2.35238 -0.53270 -0.02963  0.60003  1.78979

Coefficients:
              Estimate Std. Error  t value Pr(>|t|)
(Intercept) -0.2623231  0.2328182   -1.127    0.268
x            1.0021168  0.0004298 2331.606   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8848 on 34 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 5.436e+06 on 1 and 34 DF, p-value: < 2.2e-16
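As a cross-check on the output above, the estimates can be reproduced from the closed-form least-squares formulas, slope = Sxy/Sxx and intercept = mean(y) - slope*mean(x). The following is a minimal sketch, not part of the original answer; the names sxx, sxy, b0, b1, sse, sst and r2 are chosen here for illustration. It also computes R-squared without the rounding that summary() applies.

```r
# Data from the question (NIST ozone-monitor calibration study)
x <- c(0.2, 337.4, 118.2, 884.6, 10.1, 226.5, 666.3, 996.3, 448.6,
       777.0, 558.2, 0.4, 0.6, 775.5, 666.9, 338.0, 447.5, 11.6,
       556.0, 228.1, 995.8, 887.6, 120.2, 0.3, 0.3, 556.8, 339.1,
       887.2, 999.0, 779.0, 11.1, 118.3, 229.2, 669.1, 448.9, 0.5)
y <- c(0.1, 338.8, 118.1, 888.0, 9.2, 228.1, 668.5, 998.5, 449.1,
       778.9, 559.2, 0.3, 0.1, 778.1, 668.8, 339.3, 448.9, 10.8,
       557.7, 228.3, 998.0, 888.8, 119.6, 0.3, 0.6, 557.6, 339.3,
       888.0, 998.5, 778.9, 10.2, 117.6, 228.9, 668.4, 449.2, 0.2)

# Closed-form ordinary least-squares estimates (illustrative names)
sxx <- sum((x - mean(x))^2)                 # sum of squares of x about its mean
sxy <- sum((x - mean(x)) * (y - mean(y)))   # cross-product sum
b1  <- sxy / sxx                            # slope
b0  <- mean(y) - b1 * mean(x)               # intercept

# R-squared = 1 - SSE/SST, kept at full precision
sse <- sum((y - (b0 + b1 * x))^2)           # residual sum of squares
sst <- sum((y - mean(y))^2)                 # total sum of squares
r2  <- 1 - sse / sst

# Compare with the lm() fit used in the answer
regmod <- lm(y ~ x)
print(c(intercept = b0, slope = b1))
print(coef(regmod))
print(format(r2, digits = 10))
```

Printing r2 with extra digits shows how close to 1 the value is before summary() rounds it.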
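Interpretation:
The fitted linear regression model for the given data is
y = -0.2623231 + 1.0021168*x
y-intercept = -0.2623231 (p = 0.268, so not significantly different from 0 at the 5% level)
slope = 1.0021168 (p < 2e-16), so the customer's reading rises by about 1.002 units for each unit increase in NIST's reading
R-squared = 1 at the displayed precision. From the F-statistic, R-squared = F/(F + 34) is about 0.999994, so x explains essentially all, though not literally 100%, of the variance in y.
The residual standard error of 0.8848 is small relative to the roughly 0 to 1000 range of the data, so the straight-line model fits these data very well.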
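To back up the interpretation with more than R-squared, one option is to look at confidence intervals for the coefficients and at a prediction from the fitted line. This is a hedged sketch rather than part of the original answer: the 95% level and the new NIST reading of 500 are assumptions picked for illustration, and confint(), predict() and resid() are standard functions in R's stats package.

```r
# Data from the question (NIST ozone-monitor calibration study)
x <- c(0.2, 337.4, 118.2, 884.6, 10.1, 226.5, 666.3, 996.3, 448.6,
       777.0, 558.2, 0.4, 0.6, 775.5, 666.9, 338.0, 447.5, 11.6,
       556.0, 228.1, 995.8, 887.6, 120.2, 0.3, 0.3, 556.8, 339.1,
       887.2, 999.0, 779.0, 11.1, 118.3, 229.2, 669.1, 448.9, 0.5)
y <- c(0.1, 338.8, 118.1, 888.0, 9.2, 228.1, 668.5, 998.5, 449.1,
       778.9, 559.2, 0.3, 0.1, 778.1, 668.8, 339.3, 448.9, 10.8,
       557.7, 228.3, 998.0, 888.8, 119.6, 0.3, 0.6, 557.6, 339.3,
       888.0, 998.5, 778.9, 10.2, 117.6, 228.9, 668.4, 449.2, 0.2)

regmod <- lm(y ~ x)

# 95% confidence intervals for the intercept and slope
ci <- confint(regmod, level = 0.95)
print(ci)

# Predicted customer reading, with a 95% prediction interval,
# for a hypothetical NIST reading of 500 (value chosen for illustration)
new_x <- data.frame(x = 500)
pred <- predict(regmod, newdata = new_x, interval = "prediction", level = 0.95)
print(pred)

# Five-number summary of the residuals, as a quick check for gross misfit
res <- resid(regmod)
print(summary(res))
```

The intercept interval should contain 0, in line with its p-value of 0.268. Plotting res against fitted(regmod) would be a natural next step to check for curvature or changing spread.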