Collect data on one response (dependent, or y) variable and two different explanatory (independent, or x) variables. This will require a survey with three questions. For example, to predict a student's GPA (y), you might collect data on two x variables: SAT score and age. You would then be trying to determine whether there is a linear correlation between SAT score and GPA, and between age and GPA. (Note: students may not choose GPA as their dependent variable and must pick a different topic.)
• This data must be quantitative, not qualitative.
• Collect data from at least 15 people. Each person must answer all three questions for their data to count.
• Prepare a brief report that shares the questions used and explains why they are important to study.
• Present the data in table form and as a scatter plot. You can create your tables and graphs in Excel, but they must be copied and pasted into your Word document. Do NOT submit an Excel file, as it will not be graded.
• Model the data with two linear regressions (one for each x and y pair).
• Interpret each linear model.
• Use each of your models to make a prediction.
Consider the following dataset of overall Restaurant Score, Food Quality score and Service score, as given by 20 different customers. All scores have been mapped to a scale of 0 to 100 for analysis purposes.
Restaurant Score | Food Quality | Service |
94.5 | 90.9 | 97.8 |
93.2 | 84.1 | 96.8 |
92.8 | 99.9 | 88.7 |
91.1 | 95 | 96.9 |
90.6 | 87.8 | 91.3 |
90.1 | 82.2 | 98.7 |
90.4 | 86.3 | 91.9 |
89.7 | 92.5 | 89.1 |
89.5 | 85.7 | 90.9 |
89.2 | 83.1 | 90.6 |
89.3 | 81.9 | 88.6 |
89 | 93.3 | 89.5 |
88.8 | 78.4 | 91.3 |
87.2 | 91.9 | 73.4 |
87.4 | 75 | 89.8 |
86.8 | 78.2 | 91.7 |
86.2 | 77.5 | 91.1 |
86.1 | 76.7 | 91.5 |
85.9 | 72.2 | 89.5 |
85.1 | 77.5 | 92 |
We study these three scores so that restaurants can be recommended to users more effectively and objectively, based on overall satisfaction, food quality and service standards. This helps customers make an informed choice when dining out or ordering delivery.
The scatter plot of overall Restaurant Score against Food Quality score looks as follows:
And the scatter plot of overall Restaurant Score against Service score looks as follows:
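To put a number on the linear correlation that each scatter plot shows, Pearson's correlation coefficient r can be computed directly from the table. The Python sketch below is one way to do it; the list and function names are illustrative and not part of the original assignment. For a simple regression with a positive slope, r equals the Multiple R that Excel reports, so the two values should come out at roughly 0.675 (Food Quality) and 0.415 (Service).

```python
import math

# Scores from the data table, in row order (0-100 scale).
restaurant = [94.5, 93.2, 92.8, 91.1, 90.6, 90.1, 90.4, 89.7, 89.5, 89.2,
              89.3, 89.0, 88.8, 87.2, 87.4, 86.8, 86.2, 86.1, 85.9, 85.1]
food = [90.9, 84.1, 99.9, 95.0, 87.8, 82.2, 86.3, 92.5, 85.7, 83.1,
        81.9, 93.3, 78.4, 91.9, 75.0, 78.2, 77.5, 76.7, 72.2, 77.5]
service = [97.8, 96.8, 88.7, 96.9, 91.3, 98.7, 91.9, 89.1, 90.9, 90.6,
           88.6, 89.5, 91.3, 73.4, 89.8, 91.7, 91.1, 91.5, 89.5, 92.0]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_food = pearson_r(food, restaurant)
r_service = pearson_r(service, restaurant)
print(f"r(Food Quality, Restaurant Score) = {r_food:.4f}")
print(f"r(Service, Restaurant Score)      = {r_service:.4f}")
```

Squaring each r gives the corresponding R Square from the regression output.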
Now, modeling Restaurant Score using Food Quality score in Excel (go to Data tab -> Data Analysis -> Regression and choose Restaurant Score as the Y-column and Food Quality as the X-column), we get the following results:
SUMMARY OUTPUT
Regression Statistics | |
Multiple R | 0.67523679 |
R Square | 0.45594473 |
Adjusted R Square | 0.42571943 |
Standard Error | 1.94117804 |
Observations | 20 |
ANOVA | df | SS | MS | F | Significance F |
Regression | 1 | 56.84240103 | 56.842401 | 15.0848737 | 0.001087691 |
Residual | 18 | 67.82709897 | 3.76817216 | | |
Total | 19 | 124.6695 | | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | 69.8613634 | 4.98392443 | 14.01734 | 3.9829E-11 | 59.39052672 | 80.33220009 |
Food Quality | 0.22819521 | 0.058753764 | 3.88392503 | 0.00108769 | 0.104758137 | 0.351632292 |
Hence, the fitted model is:
Restaurant Score = 69.86 + 0.228 * Food Quality   (i)
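The p-value of the Food Quality coefficient (0.0011) is far below 0.05 and also below 0.01, so Food Quality is a significant predictor of Restaurant Score at the 1% significance level. Interpretation: each additional point of Food Quality score is associated with an increase of about 0.228 points in the predicted Restaurant Score, and R Square = 0.456 means Food Quality explains roughly 46% of the variation in Restaurant Score.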
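Excel's Regression tool fits the line by ordinary least squares: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The stdlib-only Python sketch below (the helper name `simple_ols` is illustrative) applies those formulas to the Food Quality column; under that assumption it should reproduce the coefficients, R Square and Standard Error in the output above, up to rounding.

```python
import math

# Restaurant Score (y) and Food Quality (x) columns from the data table.
restaurant = [94.5, 93.2, 92.8, 91.1, 90.6, 90.1, 90.4, 89.7, 89.5, 89.2,
              89.3, 89.0, 88.8, 87.2, 87.4, 86.8, 86.2, 86.1, 85.9, 85.1]
food = [90.9, 84.1, 99.9, 95.0, 87.8, 82.2, 86.3, 92.5, 85.7, 83.1,
        81.9, 93.3, 78.4, 91.9, 75.0, 78.2, 77.5, 76.7, 72.2, 77.5]


def simple_ols(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares.

    Returns (intercept, slope, r_squared, residual standard error).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual (SSE) and total (SST) sums of squares, as in the ANOVA table.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    r_squared = 1 - sse / sst
    std_error = math.sqrt(sse / (n - 2))  # n - 2 residual degrees of freedom
    return intercept, slope, r_squared, std_error


b0, b1, r2, se = simple_ols(food, restaurant)
print(f"Restaurant Score = {b0:.2f} + {b1:.3f} * Food Quality")
print(f"R Square = {r2:.4f}, Standard Error = {se:.4f}")
```

Passing the Service column instead of `food` fits the second model the same way.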
Similarly, modeling Restaurant Score using Service score, we get the following results:
SUMMARY OUTPUT
Regression Statistics | |
Multiple R | 0.4147832 |
R Square | 0.1720451 |
Adjusted R Square | 0.12604761 |
Standard Error | 2.3946784 |
Observations | 20 |
ANOVA | df | SS | MS | F | Significance F |
Regression | 1 | 21.44877664 | 21.4487766 | 3.74031461 | 0.068995054 |
Residual | 18 | 103.2207234 | 5.73448463 | | |
Total | 19 | 124.6695 | | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | 70.4200815 | 9.696813355 | 7.262188 | 9.4491E-07 | 50.04783265 | 90.79233045 |
Service | 0.20564404 | 0.106331532 | 1.9339893 | 0.06899505 | -0.017750214 | 0.429038303 |
The fitted model is:
Restaurant Score = 70.42 + 0.206 * Service   (ii)
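The p-value of the Service coefficient is 0.069 > 0.05, and its 95% confidence interval (-0.018 to 0.429) includes 0, so Service is not a significant predictor of Restaurant Score at the 5% significance level. Interpretation: each additional point of Service score is associated with an increase of about 0.206 points in the predicted Restaurant Score, but with R Square = 0.172, Service explains only about 17% of the variation in Restaurant Score.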
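The same significance decision can be checked without a p-value: compare the slope's t statistic with the two-sided 5% critical value of Student's t with n - 2 = 18 degrees of freedom, about 2.101 (taken from a standard t table and hard-coded here as an assumption). The sketch below, using an illustrative helper `slope_t_test`, computes the slope's standard error, its t statistic and an approximate 95% confidence interval for the Service model.

```python
import math

# Restaurant Score (y) and Service (x) columns from the data table.
restaurant = [94.5, 93.2, 92.8, 91.1, 90.6, 90.1, 90.4, 89.7, 89.5, 89.2,
              89.3, 89.0, 88.8, 87.2, 87.4, 86.8, 86.2, 86.1, 85.9, 85.1]
service = [97.8, 96.8, 88.7, 96.9, 91.3, 98.7, 91.9, 89.1, 90.9, 90.6,
           88.6, 89.5, 91.3, 73.4, 89.8, 91.7, 91.1, 91.5, 89.5, 92.0]

# Assumed two-sided 5% critical value of t with 18 df (standard t table).
T_CRIT_18 = 2.101


def slope_t_test(x, y):
    """Return (slope, standard error of slope, t statistic) for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    mse = sse / (n - 2)              # MS Residual from the ANOVA table
    se_slope = math.sqrt(mse / sxx)  # standard error of the slope
    return slope, se_slope, slope / se_slope


b1, se_b1, t_stat = slope_t_test(service, restaurant)
lower, upper = b1 - T_CRIT_18 * se_b1, b1 + T_CRIT_18 * se_b1
print(f"slope = {b1:.4f}, SE = {se_b1:.4f}, t = {t_stat:.3f}")
print(f"approx. 95% CI for slope: ({lower:.4f}, {upper:.4f})")
print("significant at 5%" if abs(t_stat) > T_CRIT_18 else "not significant at 5%")
```

Because |t| is about 1.93, below 2.101, the sketch reaches the same conclusion as the p-value: the Service slope is not significant at the 5% level.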
Making some predictions:
Using model (i), for a Food Quality score of 85, we get
Restaurant Score = 69.86 + 0.228 * 85 = 89.24
Using model (ii), for a Service score of 90, we get
Restaurant Score = 70.42 + 0.206 * 90 = 88.96
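Both predictions can be wrapped in a small function. This sketch (the names `predict`, `FOOD_MODEL` and `SERVICE_MODEL` are illustrative) uses the unrounded Excel coefficients. These give about 89.26 and 88.93 instead of the 89.24 and 88.96 obtained from the rounded equations. The difference comes only from rounding the intercept and slope.

```python
# Unrounded coefficients copied from the Excel SUMMARY OUTPUT tables.
FOOD_MODEL = (69.8613634, 0.22819521)     # model (i): intercept, slope
SERVICE_MODEL = (70.4200815, 0.20564404)  # model (ii): intercept, slope


def predict(model, x):
    """Predicted Restaurant Score for an explanatory score x, model = (b0, b1)."""
    intercept, slope = model
    return intercept + slope * x


food_pred = predict(FOOD_MODEL, 85)
service_pred = predict(SERVICE_MODEL, 90)
rounded_food_pred = predict((69.86, 0.228), 85)  # the rounded equation (i)

print(f"Food Quality 85 -> {food_pred:.2f}")
print(f"Service 90      -> {service_pred:.2f}")
print(f"Rounded model (i), Food Quality 85 -> {rounded_food_pred:.2f}")
```

Both inputs lie inside the observed ranges (Food Quality 72.2 to 99.9, Service 73.4 to 98.7), so neither prediction extrapolates beyond the data. The Service prediction should still be treated with caution, because that slope is not significant at the 5% level.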