The Conch Café, located in Gulf Shores, Alabama, features casual lunches with a great view of the Gulf of Mexico. To accommodate the increase in business during the summer vacation season, Fuzzy Conch, the owner, hires a large number of servers as seasonal help. When he interviews a prospective server, he would like to provide data on the amount a server can earn in tips. He believes that the amount of the bill and the number of diners are both related to the amount of the tip. He gathered the following sample information.
Customer | Amount of Tip ($) | Amount of Bill ($) | Diners |
1 | 7.2 | 78.03 | 4 |
2 | 4.5 | 28.23 | 4 |
3 | 1 | 10.65 | 1 |
4 | 2.4 | 19.82 | 3 |
5 | 5 | 28.62 | 3 |
6 | 4.25 | 24.83 | 2 |
7 | 0.5 | 6.25 | 1 |
8 | 6 | 49.2 | 4 |
9 | 5 | 43.26 | 3 |
10 | 4.5 | 46.4 | 4 |
11 | 5.55 | 57.98 | 5 |
12 | 6 | 34.99 | 3 |
13 | 4 | 33.91 | 4 |
14 | 3.35 | 23.06 | 2 |
15 | 0.75 | 4.65 | 1 |
16 | 3.3 | 23.59 | 2 |
17 | 3.5 | 22.3 | 2 |
18 | 3.25 | 32 | 2 |
19 | 5.4 | 50.02 | 4 |
20 | 2.25 | 17.6 | 3 |
21 | 3.2 | 60.06 | 2 |
22 | 3 | 20.27 | 2 |
23 | 1.25 | 19.53 | 2 |
24 | 3.25 | 27.03 | 3 |
25 | 3 | 21.28 | 2 |
26 | 6.25 | 43.38 | 4 |
27 | 5.6 | 28.12 | 4 |
28 | 2.5 | 26.25 | 2 |
29 | 3.95 | 55.38 | 7 |
30 | 5.1 | 62.39 | 6 |
a-1. Develop a multiple regression equation with the amount of tips as the dependent variable and the amount of the bill and the number of diners as independent variables and complete the table. (Negative amounts should be indicated by a minus sign. Round your answers to 3 decimal places.)
a-2. Write out the regression equation. (Negative amounts should be indicated by a minus sign. Round your answers to 3 decimal places.)
a-3. How much does another diner add to the amount of the tips? (Round your answer to 2 decimal places.)
b-1. Complete the ANOVA table. (Leave no cells blank - be certain to enter "0" wherever required. Round "SS, MS" to 3 decimal places and "F" to 2 decimal places.)
b-2. What is your decision regarding the null-hypothesis?
c-1. Conduct an individual test on each of the variables. What is the decision rule at the 0.05 level of significance? (Negative amounts should be indicated by a minus sign. Round your answers to 3 decimal places.)
c-2. Which variable should be deleted? Use the equation developed in part (c) to determine the coefficient of determination. (Round your answer to 2 decimal places.)
c-3. State true or false: from the graph, the residuals appear normally distributed. Choose the right option for the following graph: the residuals look random.
I used R to solve this question.
R code and output:
> d=read.table('data.csv',header=T,sep=',')
> head(d)
Customer Amount.of.Tip Amount.of.Bill Diners
1 1 7.20 78.03 4
2 2 4.50 28.23 4
3 3 1.00 10.65 1
4 4 2.40 19.82 3
5 5 5.00 28.62 3
6 6 4.25 24.83 2
> attach(d)
> fit=lm(Amount.of.Tip~Amount.of.Bill+Diners)
> summary(fit)
Call:
lm(formula = Amount.of.Tip ~ Amount.of.Bill + Diners)
Residuals:
Min 1Q Median 3Q Max
-2.21741 -0.64814 0.01558 0.59785 2.08360
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.07751 0.49518 2.176 0.03848 *
Amount.of.Bill 0.05841 0.01696 3.444 0.00189 **
Diners 0.26498 0.21096 1.256 0.21985
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.114 on 27 degrees of freedom
Multiple R-squared: 0.6069, Adjusted R-squared: 0.5777
F-statistic: 20.84 on 2 and 27 DF, p-value: 3.36e-06
> anova(fit)
Analysis of Variance Table
Response: Amount.of.Tip
Df Sum Sq Mean Sq F value Pr(>F)
Amount.of.Bill 1 49.791 49.791 40.1012 8.851e-07 ***
Diners 1 1.959 1.959 1.5777 0.2199
Residuals 27 33.524 1.242
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
a-1
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.07751 0.49518 2.176 0.03848 *
Amount.of.Bill 0.05841 0.01696 3.444 0.00189 **
Diners 0.26498 0.21096 1.256 0.21985
a-2
Regression equation:
Amount of tip = 1.078 + 0.058 Amount of Bill + 0.265 Diners
a-3
On average, each additional diner adds about $0.26 to the tip, holding the amount of the bill constant.
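To illustrate how the fitted equation is used, the sketch below plugs in a hypothetical table: a $50 bill with 4 diners. These values are chosen only for illustration and are not from the sample. The coefficients are copied from the summary(fit) output above.

```r
# Coefficients copied from the summary(fit) output
b0       <- 1.07751  # intercept
b_bill   <- 0.05841  # slope for Amount.of.Bill
b_diners <- 0.26498  # slope for Diners

# Hypothetical table (assumed values, not taken from the sample)
bill   <- 50
diners <- 4

# Predicted tip from the regression equation
predicted_tip <- b0 + b_bill * bill + b_diners * diners
print(round(predicted_tip, 2))  # 5.06

# Effect of one more diner at the same bill: equals the Diners slope
extra <- (b0 + b_bill * bill + b_diners * (diners + 1)) - predicted_tip
print(round(extra, 2))  # 0.26
```

With the data frame loaded, predict(fit, newdata = ...) would give the same prediction without copying coefficients by hand.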
b-1
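Sequential table from anova(fit):
Df Sum Sq Mean Sq F value Pr(>F)
Amount.of.Bill 1 49.791 49.791 40.10 8.851e-07 ***
Diners 1 1.959 1.959 1.58 0.2199
Residuals 27 33.524 1.242
Adding the two predictor rows gives the regression row of the overall ANOVA table (SS = 49.791 + 1.959 = 51.750 on 2 DF):
Source | DF | SS | MS | F |
Regression | 2 | 51.750 | 25.875 | 20.84 |
Residual | 27 | 33.524 | 1.242 | |
Total | 29 | 85.274 | | |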
b-2
F-statistic: 20.84 on 2 and 27 DF, p-value: 3.36e-06
The null hypothesis is that both regression coefficients are zero. Since the p-value is less than 0.05, we reject the null hypothesis at the 5% level of significance and conclude that the fitted model is statistically significant.
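The same decision can be reached with a critical value instead of the p-value. A minimal sketch, using only the F statistic and degrees of freedom reported by summary(fit):

```r
# Overall F test, using the values reported by summary(fit)
f_stat <- 20.84  # F statistic
df1    <- 2      # numerator degrees of freedom (number of predictors)
df2    <- 27     # denominator (residual) degrees of freedom

# Critical value at the 0.05 level and the upper-tail p-value
f_crit  <- qf(0.95, df1, df2)
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

# Reject the null hypothesis when F exceeds the critical value
reject <- f_stat > f_crit
print(reject)  # TRUE
```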
c-1
Decision rule: for each variable, reject the null hypothesis that its coefficient is zero if the p-value of the t test is less than 0.05. Equivalently, reject if |t| > 2.052, the two-sided critical value with 27 degrees of freedom.
For the variable Amount of Bill,
t = 3.444 and p-value = 0.00189
Since the p-value is less than 0.05, we conclude that this variable is statistically significant.
For the variable Diners,
t = 1.256 and p-value = 0.21985
Since the p-value is greater than 0.05, we conclude that this variable is not statistically significant.
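The individual tests can also be checked directly from the t statistics. The sketch below copies the t values from the summary(fit) output and compares them with the two-sided critical value on 27 residual degrees of freedom:

```r
# t statistics copied from the summary(fit) output
t_stats  <- c(Amount.of.Bill = 3.444, Diners = 1.256)
df_resid <- 27  # residual degrees of freedom

# Two-sided critical value at the 0.05 level
t_crit <- qt(0.975, df_resid)

# Two-sided p-values recomputed from the t statistics
p_values <- 2 * pt(-abs(t_stats), df_resid)

# A variable is significant when |t| exceeds the critical value
significant <- abs(t_stats) > t_crit
print(significant)  # Amount.of.Bill TRUE, Diners FALSE
```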
c-2
The variable Diners should be deleted, since it is not statistically significant.
New regression model:
> fit=lm(Amount.of.Tip~Amount.of.Bill)
> summary(fit)
Call:
lm(formula = Amount.of.Tip ~ Amount.of.Bill)
Residuals:
Min 1Q Median 3Q Max
-2.60347 -0.72432 0.06524 0.47407 2.15622
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.36630 0.44307 3.084 0.00456 **
Amount.of.Bill 0.07388 0.01179 6.268 8.92e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.126 on 28 degrees of freedom
Multiple R-squared: 0.5839, Adjusted R-squared: 0.569
F-statistic: 39.29 on 1 and 28 DF, p-value: 8.92e-07
Coefficient of determination = 0.5839 (58.39%), i.e. 0.58 when rounded to 2 decimal places.
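As a cross-check, the coefficient of determination can be recovered from the sums of squares in the earlier anova(fit) output. Because Amount.of.Bill was entered first in that sequential table, its sum of squares equals the regression sum of squares of the bill-only model. A short sketch using the printed (rounded) values:

```r
# Sums of squares copied from the anova(fit) output of the two-variable model
ss_bill   <- 49.791  # sequential SS for Amount.of.Bill (entered first)
ss_diners <- 1.959   # additional SS for Diners
ss_resid  <- 33.524  # residual SS of the two-variable model

# The total sum of squares is the same for both models
ss_total <- ss_bill + ss_diners + ss_resid

# R-squared = regression SS / total SS
r2_reduced <- ss_bill / ss_total                # bill-only model
r2_full    <- (ss_bill + ss_diners) / ss_total  # bill + diners model

print(round(r2_reduced, 4))  # 0.5839
print(round(r2_full, 4))     # 0.6069
```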
c-3
The residual plot shows a pattern, so the residuals are not random. In the normal probability plot, many points lie away from the straight line, so the residuals do not appear to be normally distributed.
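The plots referred to above are not reproduced here. One way to draw them is sketched below, assuming the bill-only model from c-2. The data are typed in from the sample table so the block runs without data.csv; the object names are illustrative.

```r
# Tips and bills typed in from the sample table (customers 1 to 30)
tip <- c(7.2, 4.5, 1, 2.4, 5, 4.25, 0.5, 6, 5, 4.5,
         5.55, 6, 4, 3.35, 0.75, 3.3, 3.5, 3.25, 5.4, 2.25,
         3.2, 3, 1.25, 3.25, 3, 6.25, 5.6, 2.5, 3.95, 5.1)
bill <- c(78.03, 28.23, 10.65, 19.82, 28.62, 24.83, 6.25, 49.2, 43.26, 46.4,
          57.98, 34.99, 33.91, 23.06, 4.65, 23.59, 22.3, 32, 50.02, 17.6,
          60.06, 20.27, 19.53, 27.03, 21.28, 43.38, 28.12, 26.25, 55.38, 62.39)

# Reduced model with the bill as the only predictor
fit_bill <- lm(tip ~ bill)
res <- resid(fit_bill)

# Residuals against fitted values: look for curvature or a funnel shape
plot(fitted(fit_bill), res,
     xlab = "Fitted tip", ylab = "Residual",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)

# Normal probability (Q-Q) plot: points far from the line suggest non-normality
qqnorm(res)
qqline(res)
```

Reading these plots is a judgment call, especially with only 30 observations.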