Consider the following data sample from the Consumer Reports Restaurant Satisfaction Survey, where the variable Type indicates whether the restaurant is Italian or a Seafood/Steakhouse restaurant. Price is the average amount paid per person for dinner and drinks. Score reflects diners' overall satisfaction, with higher values indicating greater satisfaction (100 is completely satisfied). A regression analysis is conducted in several steps to gauge the impact of the explanatory variables on Score (diners' satisfaction).
| Restaurant | Price ($) | Score | Type |
| --- | --- | --- | --- |
| Bertucci's | 16 | 77 | Italian |
| Black Angus | 24 | 79 | Seafood/Steak |
| Bonefish Grill | 26 | 85 | Seafood/Steak |
| Bravo! Cucina Italiana | 18 | 84 | Italian |
| Buca di Beppo | 17 | 81 | Italian |
| Bugaboo Steak House | 18 | 77 | Seafood/Steak |
| Carrabba's Italian Grill | 23 | 86 | Italian |
| Brown's Steakhouse | 17 | 75 | Seafood/Steak |
| Il Fornaio | 28 | 83 | Italian |
| Joe's Crab Shack | 15 | 71 | Seafood/Steak |
| Johnny Carino's Italian | 17 | 81 | Italian |
| Lone Star Steakhouse | 17 | 76 | Seafood/Steak |
| LongHorn Steakhouse | 19 | 81 | Seafood/Steak |
| Maggiano's Little Italy | 22 | 83 | Italian |
| McGrath's Fish House | 16 | 81 | Seafood/Steak |
| Olive Garden | 19 | 79 | Italian |
| Outback Steakhouse | 20 | 82 | Italian |
| Red Lobster | 18 | 81 | Seafood/Steak |
| Romano's Macaroni Grill | 18 | 82 | Italian |
| The Old Spaghetti Factory | 12 | 79 | Italian |
| Uno Chicago Grill | 16 | 80 | Italian |
MODEL 2 – Include the dummy variable Dtype which takes value 1 if Italian restaurant, 0 otherwise
I used R to solve this question.
R code and output:
> d=read.table('data.csv',header=T,sep=',')
> head(d)
  Price Score Type
1    16    77    1
2    24    79    0
3    26    85    0
4    18    84    1
5    17    81    1
6    18    77    0
> attach(d)
The following objects are masked from d (pos = 3):
Price, Score, Type
> fit=lm(Score~Price+Type)
> summary(fit)
Call:
lm(formula = Score ~ Price + Type)
Residuals:
    Min      1Q  Median      3Q     Max
-5.4202 -2.1048  0.0581  2.4145  4.0592
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  68.6126     3.0505  22.492 1.26e-14 ***
Price         0.5205     0.1546   3.367  0.00344 **
Type          3.0011     1.1661   2.574  0.01913 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.644 on 18 degrees of freedom
Multiple R-squared: 0.4976, Adjusted R-squared: 0.4418
F-statistic: 8.915 on 2 and 18 DF, p-value: 0.002038
> fit2=lm(Score~Price)
> summary(fit2)
Call:
lm(formula = Score ~ Price)
Residuals:
   Min     1Q Median     3Q    Max
-7.146 -1.875  1.230  1.818  4.301
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  70.3828     3.3833  20.803 1.56e-14 ***
Price         0.5176     0.1760   2.941  0.00839 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.01 on 19 degrees of freedom
Multiple R-squared: 0.3128, Adjusted R-squared: 0.2766
F-statistic: 8.648 on 1 and 19 DF, p-value: 0.008391
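The file data.csv is not shown, so the session above cannot be rerun as written. As a minimal sketch, the data can instead be typed in from the table, with Type coded 1 for Italian and 0 for Seafood/Steak to match the head(d) output. The data frame name `restaurants` is an assumption made for illustration, and `data =` is passed to lm() in place of attach().

```r
# Data transcribed from the table; Type is coded 1 = Italian, 0 = Seafood/Steak.
# The name `restaurants` is hypothetical (the original reads data.csv into `d`).
restaurants <- data.frame(
  Price = c(16, 24, 26, 18, 17, 18, 23, 17, 28, 15, 17,
            17, 19, 22, 16, 19, 20, 18, 18, 12, 16),
  Score = c(77, 79, 85, 84, 81, 77, 86, 75, 83, 71, 81,
            76, 81, 83, 81, 79, 82, 81, 82, 79, 80),
  Type  = c(1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1,
            0, 0, 1, 0, 1, 1, 0, 1, 1, 1)
)

# Model 2 (Price and the Type dummy) and the Price-only model.
fit  <- lm(Score ~ Price + Type, data = restaurants)
fit2 <- lm(Score ~ Price, data = restaurants)

# Coefficients rounded to four decimals for comparison with the output above.
print(round(coef(fit), 4))
print(round(coef(fit2), 4))
```

Passing `data =` avoids the "objects are masked" message that attach() produces when the same data frame is attached more than once.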
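a.
F test statistic = 8.915 on 2 and 18 degrees of freedom
p-value = 0.0020
Since the p-value is less than 0.05, we reject the null hypothesis that both slope coefficients are zero and conclude that the model as a whole is statistically significant at the 5% level.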
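As a rough check, the overall F statistic can be recovered from the reported R-squared of the two-variable model. The sketch below uses only numbers from the summary output (R-squared = 0.4976, two explanatory variables, 21 restaurants); the variable names are chosen for illustration, and small differences from 8.915 come from rounding in R-squared.

```r
# Values taken from the summary(fit) output; names are illustrative.
r2 <- 0.4976   # Multiple R-squared of the model with Price and Type
k  <- 2        # number of explanatory variables
n  <- 21       # number of restaurants (18 residual df + 3 estimated coefficients)

# Overall F statistic: explained variation per regressor over
# unexplained variation per residual degree of freedom.
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))

# Upper-tail probability of the F distribution with (k, n - k - 1) df.
p_value <- pf(f_stat, k, n - k - 1, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```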
b.
A variable is statistically significant at the 5% level if the p-value of its coefficient is less than 0.05.
Here both Price (p-value = 0.00344) and Type (p-value = 0.01913) have p-values below 0.05, hence both variables are statistically significant.
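The p-values in the coefficient table come from t statistics, each being the estimate divided by its standard error and compared with a t distribution on 18 residual degrees of freedom. A minimal sketch using the rounded values from the output (so the results agree only up to rounding):

```r
# Estimates and standard errors copied from the summary(fit) coefficient table.
estimate  <- c(Price = 0.5205, Type = 3.0011)
std_error <- c(Price = 0.1546, Type = 1.1661)
df_resid  <- 18

# t statistic for H0: coefficient = 0.
t_value <- estimate / std_error

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_value), df = df_resid)

print(round(cbind(t_value, p_value), 5))
```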
c.
The coefficient of determination (R-squared) is 31.28% for the model without the Type variable and 49.76% for the model with it. Adding Type to the regression therefore explains an additional 18.48 percentage points of the variation in the dependent variable Score, and the adjusted R-squared also rises from 27.66% to 44.18%. Hence the type of restaurant is an important variable.
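One way to judge whether this gain in R-squared is more than chance is a partial F test comparing the two nested models. This test is not part of the original answer; the sketch below is an illustration built only from the reported R-squared values and the 18 residual degrees of freedom. With a single added variable, the partial F statistic should be close to the square of the t value for Type (2.574).

```r
# R-squared values and residual df copied from the two summary() outputs.
r2_full    <- 0.4976   # Score ~ Price + Type
r2_reduced <- 0.3128   # Score ~ Price
df_resid   <- 18       # residual df of the full model
q          <- 1        # number of variables added (Type)

# Partial F statistic for the added variable.
f_partial <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_resid)
p_partial <- pf(f_partial, q, df_resid, lower.tail = FALSE)

print(c(F = f_partial, p = p_partial))
```

If the fitted model objects are available, `anova(fit2, fit)` performs the same comparison directly.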
d.
The estimated regression equation is
Score = 68.6126 + 0.5205 Price + 3.0011 Type
where Type = 1 for an Italian restaurant and 0 otherwise.
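To illustrate how the equation is used, the sketch below evaluates it for a hypothetical dinner price of $20. The function name `predict_score` and the $20 price are assumptions chosen for the example. At any given price, the predicted scores for the two restaurant types differ by the Type coefficient, 3.0011 points.

```r
# Fitted equation from the output: Score = 68.6126 + 0.5205 Price + 3.0011 Type.
# predict_score is a hypothetical helper, not part of the original analysis.
predict_score <- function(price, type) {
  68.6126 + 0.5205 * price + 3.0011 * type
}

italian <- predict_score(20, 1)   # Italian restaurant, $20 per person
seafood <- predict_score(20, 0)   # Seafood/Steak restaurant, $20 per person

print(c(Italian = italian, SeafoodSteak = seafood))
```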