1) The owner of a restaurant with a large outdoor patio has noticed that sales of cold beverages on the patio increase on hotter days. The owner would like to develop a predictive model for sales as a function of temperature to use for short term scheduling. To develop the model, the owner collects data for 21 days. The data appear in the Beverage worksheet of the HW7 data workbook on Moodle.
a) Draw a scatter plot of the data using Sales as the Y variable and Temperature as the X variable. Does there appear to be a linear association between Sales and Temperature?
b) Fit the simple linear regression model using Sales as the dependent or Y variable and Temperature as the independent or X variable. Is the model significant at α = 0.05?
c) What percentage of the variation in Beverage Sales has been explained by the linear relationship with Temperature?
d) Draw a scatter plot of the residuals versus Temperature. Do the residuals satisfy the assumption that they are independent of Temperature?
f) Fit a multiple regression model using Beverage Sales as the Y or dependent variable with independent variables of Temperature and the quadratic term (Temperature)^2. Is the model significant at α = 0.05? Are both independent variables significant at α = 0.05?
g) Using the model in part f), estimate the change in Beverage Sales when the Temperature increases from 75 to 80 and from 85 to 90.
Temperature | Sales |
85 | $ 1,810 |
90 | $ 4,825 |
79 | $ 438 |
82 | $ 775 |
84 | $ 1,213 |
96 | $ 8,692 |
88 | $ 2,356 |
76 | $ 266 |
93 | $ 4,930 |
97 | $ 9,138 |
89 | $ 2,714 |
83 | $ 1,082 |
85 | $ 1,290 |
90 | $ 3,970 |
82 | $ 894 |
91 | $ 2,906 |
90 | $ 4,615 |
84 | $ 1,168 |
79 | $ 462 |
81 | $ 1,018 |
95 | $ 5,950 |
a) There appears to be a strong positive linear association between Sales and Temperature: the correlation coefficient is r = 0.9224.
b) Y = -32511.2467 + 408.6026 X1
Source | DF | Sum of Squares | Mean Square | F Statistic | P-value |
---|---|---|---|---|---|
Regression (between ŷi and ȳ) | 1 | 117362193.5706 | 117362193.5706 | 108.2876 | 2.761e-9 |
Residual (between yi and ŷi) | 19 | 20592209.6675 | 1083800.5088 | | |
Total (between yi and ȳ) | 20 | 137954403.2381 | 6897720.1619 | | |
Term | Coeff | SE | t-stat | lower t0.025(19) | upper t0.975(19) | Stand Coeff | p-value | VIF |
---|---|---|---|---|---|---|---|---|
b | -32511.2467 | 3408.7235 | -9.5377 | -39645.7869 | -25376.7065 | 0.000 | 1.122e-8 | |
X1 | 408.6026 | 39.2656 | 10.4061 | 326.4189 | 490.7864 | 0.9224 | 2.761e-9 | 1.0000 |
R square (R2) equals 0.8507. It means that the predictor (X1) explains 85.1% of the variance of Y.
Adjusted R square equals 0.8429.
The coefficient of correlation (R) equals 0.9224. It means that there is a very strong direct relationship between the predicted data (ŷ) and the observed data (y).
Overall regression: right-tailed, F(1,19) = 108.2876, p-value = 2.761e-9. Since p-value < α (0.05), we reject H0, so the model is significant at α = 0.05.
The linear regression model, Y = b0 + b1X1, provides a better fit than the intercept-only model.
The independent variable (X1) is significant: two-tailed, T = 10.4061, p-value = 2.761e-9.
The Y-intercept (b): two-tailed, T = -9.5377, p-value = 1.122e-8. Hence b is significantly different from zero.
c) R^2 = 0.8507, so 85.07% of the variation in Beverage Sales is explained by the linear relationship with Temperature.
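The figures in parts b) and c) can be checked with a short script. This is only a sketch, assuming NumPy is available; the variable names are chosen here for illustration and the data are copied from the table in the question. The printed values should come out close to the coefficients, R^2 and F statistic reported above.

```python
import numpy as np

# Temperature and Sales for the 21 days, copied from the data table.
temperature = np.array([85, 90, 79, 82, 84, 96, 88, 76, 93, 97, 89,
                        83, 85, 90, 82, 91, 90, 84, 79, 81, 95], dtype=float)
sales = np.array([1810, 4825, 438, 775, 1213, 8692, 2356, 266, 4930, 9138, 2714,
                  1082, 1290, 3970, 894, 2906, 4615, 1168, 462, 1018, 5950],
                 dtype=float)

n = len(sales)

# Least-squares fit of Sales = b0 + b1 * Temperature.
# np.polyfit returns the highest-degree coefficient first.
b1, b0 = np.polyfit(temperature, sales, deg=1)

# Sums of squares for the ANOVA table.
fitted = b0 + b1 * temperature
ss_total = np.sum((sales - sales.mean()) ** 2)
ss_resid = np.sum((sales - fitted) ** 2)
ss_reg = ss_total - ss_resid

# Share of variation explained, and the overall F statistic on (1, n - 2) df.
r_squared = ss_reg / ss_total
f_stat = (ss_reg / 1) / (ss_resid / (n - 2))

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"R^2 = {r_squared:.4f}, F(1, {n - 2}) = {f_stat:.4f}")
```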
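d) Residuals (yi − ŷi) from the model in part b) plotted against Temperature: the residuals are mostly positive at the lowest and highest temperatures and mostly negative in between. This curved pattern suggests that the residuals are not independent of Temperature, so the assumption is not satisfied.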
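As a rough numerical stand-in for the residual plot, the residuals of the linear fit can be averaged within temperature bands. This is a sketch assuming NumPy; splitting the days into three equal groups is an arbitrary choice made here for illustration, not part of the assignment. If the residuals were unrelated to Temperature, the three group means would all be near zero.

```python
import numpy as np

# Temperature and Sales for the 21 days, copied from the data table.
temperature = np.array([85, 90, 79, 82, 84, 96, 88, 76, 93, 97, 89,
                        83, 85, 90, 82, 91, 90, 84, 79, 81, 95], dtype=float)
sales = np.array([1810, 4825, 438, 775, 1213, 8692, 2356, 266, 4930, 9138, 2714,
                  1082, 1290, 3970, 894, 2906, 4615, 1168, 462, 1018, 5950],
                 dtype=float)

# Residuals of the simple linear regression Sales = b0 + b1 * Temperature.
b1, b0 = np.polyfit(temperature, sales, deg=1)
residuals = sales - (b0 + b1 * temperature)

# Sort the days by temperature (a stable sort keeps tied days in table order),
# then split into three groups of 7 days: coolest, middle, hottest.
order = np.argsort(temperature, kind="stable")
low, mid, high = np.array_split(residuals[order], 3)

low_mean, mid_mean, high_mean = low.mean(), mid.mean(), high.mean()
print(f"mean residual, coolest days: {low_mean:.1f}")
print(f"mean residual, middle days:  {mid_mean:.1f}")
print(f"mean residual, hottest days: {high_mean:.1f}")
```

A positive, negative, positive sequence of group means is the numerical signature of a curved residual plot.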
f) y = 23.30035581 x^2 - 3643.171723 x + 142850.3406
Regression Polynomial: | y = 23.3004x^2 − 3643.1717x + 142850.3406 |
R-squared: | r2 = 0.9474 |
Adjusted R-squared: | r2adj = 0.9415 |
Residual Standard Error: | 635.1365 on 18 degrees of freedom |
Coefficient | Estimate | Standard Error | t-statistic | p-value |
---|---|---|---|---|
intercept | 142850.3406 | 30575.7015 | 4.672 | 0.0002 |
β1 (Temperature) | -3643.1717 | 705.2304 | -5.1659 | 0.0001 |
β2 (Temperature^2) | 23.3004 | 4.0532 | 5.7486 | < 0.0001 |
Analysis of Variance Table
Source | df | SS | MS | F-statistic | p-value |
---|---|---|---|---|---|
Regression | 2 | 130693232.2314 | 65346616.1157 | 161.9903 | < 0.0001 |
Residual Error | 18 | 7261171.0068 | 403398.3893 | | |
Total | 20 | 137954403.2381 | 6897720.1619 | | |
R square (R2) equals 0.9474. It means that the predictors (Xi) explain 94.7% of the variance of Y.
Adjusted R square equals 0.9415 (computed as 1 − MS Residual / MS Total = 1 − 403398.3893 / 6897720.1619).
The coefficient of correlation (R) equals 0.9733. It means that there is a very strong direct relationship between the predicted data (ŷ) and the observed data (y).
Overall regression: right-tailed, F(2,18) = 161.9903, p-value < 0.0001. Since p-value < α (0.05), we reject H0, so the model is significant at α = 0.05.
Both independent variables are significant at α = 0.05: Temperature (T = -5.1659, p-value = 0.0001) and Temperature^2 (T = 5.7486, p-value < 0.0001).
The Y-intercept (b): two-tailed, T = 4.672, p-value = 0.0002. Hence the intercept is significantly different from zero.
Every term is significant, and the quadratic regression model fits better than the simple linear regression model (R2 of 0.9474 versus 0.8507).
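The quadratic fit in part f) can be reproduced along the same lines. This is a sketch assuming NumPy, with names chosen here for illustration; adjusted R^2 is computed with the standard formula using n − k − 1 residual degrees of freedom, where k = 2 is the number of predictors.

```python
import numpy as np

# Temperature and Sales for the 21 days, copied from the data table.
temperature = np.array([85, 90, 79, 82, 84, 96, 88, 76, 93, 97, 89,
                        83, 85, 90, 82, 91, 90, 84, 79, 81, 95], dtype=float)
sales = np.array([1810, 4825, 438, 775, 1213, 8692, 2356, 266, 4930, 9138, 2714,
                  1082, 1290, 3970, 894, 2906, 4615, 1168, 462, 1018, 5950],
                 dtype=float)

n = len(sales)
k = 2  # predictors: Temperature and Temperature^2

# Least-squares fit of Sales = b0 + b1 * T + b2 * T^2.
# np.polyfit returns the highest-degree coefficient first.
b2, b1, b0 = np.polyfit(temperature, sales, deg=2)

fitted = b0 + b1 * temperature + b2 * temperature ** 2
ss_total = np.sum((sales - sales.mean()) ** 2)
ss_resid = np.sum((sales - fitted) ** 2)

r_squared = 1 - ss_resid / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
f_stat = ((ss_total - ss_resid) / k) / (ss_resid / (n - k - 1))

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}")
print(f"F({k}, {n - k - 1}) = {f_stat:.4f}")
```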
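g) y = 23.30035581 x^2 - 3643.171723 x + 142850.3406
At x = 75: y = 23.30035581 (75)^2 - 3643.171723 * 75 + 142850.3406
y = 676.962
At x = 80: y = 23.30035581 (80)^2 - 3643.171723 * 80 + 142850.3406
y = 518.8799
change from 75 to 80 = 518.8799 - 676.962 = -158.0821, a predicted decrease of about $158. Note that 75 is below the lowest observed temperature (76), so this estimate is an extrapolation.
At x = 85: y = 23.30035581 (85)^2 - 3643.171723 * 85 + 142850.3406
y = 1525.81
At x = 90: y = 23.30035581 (90)^2 - 3643.171723 * 90 + 142850.3406
y = 3697.77
change from 85 to 90 = 3697.77 - 1525.81 = 2171.96, a predicted increase of about $2,172.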
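The arithmetic in part g) can be double-checked with a small helper. This is a sketch in plain Python; the function name predict_sales is hypothetical, and the coefficients are the ones reported in part f). The change is always computed as the prediction at the higher temperature minus the prediction at the lower one, so a negative result means a predicted decrease.

```python
# Coefficients of the fitted quadratic model reported in part f).
B0 = 142850.3406
B1 = -3643.171723
B2 = 23.30035581


def predict_sales(temp):
    """Predicted Beverage Sales at a given temperature (hypothetical helper)."""
    return B2 * temp ** 2 + B1 * temp + B0


# Change in predicted sales = prediction at the end minus prediction at the start.
change_75_to_80 = predict_sales(80) - predict_sales(75)
change_85_to_90 = predict_sales(90) - predict_sales(85)

print(f"75 -> 80: {change_75_to_80:.2f}")
print(f"85 -> 90: {change_85_to_90:.2f}")
```

The same 5-degree increase gives very different changes because the slope of a quadratic depends on where along the curve it is evaluated.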