In: Statistics and Probability
(Side note: I'm not sure whether this counts as one question, but all parts are based on the same case and numbers, and I felt that splitting them up would be counter-productive.)
Case: A small convenience store chain is interested in modeling the weekly sales of a store, y, as a function of the weekly traffic flow on the street where the store is located. The table below contains data collected from 20 stores in the chain.
Store | Traffic Flow (thousands of cars) | Weekly Sales ($ thousands)
1 | 59.3 | 6.3
2 | 60.3 | 6.6
3 | 82.1 | 7.6
4 | 32.3 | 3.0
5 | 98 | 9.5
6 | 52.1 | 5.9
7 | 54.4 | 6.1
8 | 51.3 | 5.0
9 | 36.7 | 3.6
10 | 23.6 | 2.8
11 | 57.6 | 6.7
12 | 40.6 | 5.2
13 | 75.8 | 8.2
14 | 48.3 | 5
15 | 41.4 | 3.9
16 | 62.5 | 5.4
17 | 44.0 | 4.1
18 | 29.6 | 3.1
19 | 49.5 | 5.4
20 | 73.1 | 8.4
1. [3 marks] Create a scatter plot of weekly sales (y) vs. traffic flow (x) using MS Excel. Copy and paste (or save and import) your plot to Word. Provide horizontal and vertical axis labels, including appropriate units, and give your plot a title. Based on your plot, do you think there is a linear relationship between the two variables? Why or why not? Answer in a single sentence.
2. [7 marks]
a) [1 mark] State the equation of the fitted regression line between weekly sales and traffic flow. No need to show your work calculating the coefficients – you may use Excel for this; however, make sure you use Word’s equation editor to type the equation properly. Please round the coefficients to 2 decimal places.
b) [1 mark] In one sentence, interpret the value of the fitted y-intercept in question 2.
c) [1 mark] In one sentence, interpret the value of the fitted slope in question 2.
d) [1 mark] What is the coefficient of determination for the
fitted model? Again, you may use Excel for this, no need to show
your work. Please round to 3 decimal places.
e) [1 mark] In one sentence, interpret the value of the coefficient
of determination.
f) [2 marks] The chain wants to establish a new store at a location where the weekly traffic flow is 64,500 cars. Use the fitted equation to predict the weekly sales of the planned store. Show your work and explain what the result means in a single sentence.
3. [6 marks] Use a hypothesis test on the population slope to
determine if there is a significant linear relationship between
weekly sales and traffic flow. Provide all five steps as shown in
class.
1) The null and alternative hypotheses:
2) The test statistic (yes, you may copy/paste from Excel, but
please round to 3 decimal places):
3) The critical value(s) is/are:
4) Do you reject the null hypothesis? (Answer Yes or No):
_________
5) Interpret.
4. [5 marks] Use an ANOVA F-test to determine if there is a significant linear relationship between weekly sales and traffic flow. No need to show calculations for the test statistic here; provide only the final answer. Provide all five steps as shown in class.
1) The null and alternative hypotheses:
2) The test statistic (yes, you may copy/paste from Excel, but
please round to 3 decimal places):
3) The p-value (yes, you may copy/paste from Excel, but please
provide 4 significant digits):
4) Do you reject the null hypothesis? (Answer Yes or No):
_______.
5) Interpret.
5. [4 marks] Create a residual plot. Copy and paste (or save and import) your plot to Word. You may use the default axis titles and overall title provided by Excel. In one sentence, comment on whether the assumption of constant variance is satisfied. In one sentence, comment on whether the assumption of zero mean is satisfied.
SOLUTION 1:
Based on the plot, yes, there appears to be a positive linear relationship between the two variables, because the points fall close to an upward-sloping straight line.
SOLUTION 2:
a) ŷ = 0.36 + 0.10x
where x is traffic flow (thousands of cars) and ŷ is predicted weekly sales ($ thousands)
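b) Intercept: Setting x = 0 gives the predicted weekly sales for a store on a street with no traffic, which is not meaningful here because x = 0 lies well outside the observed traffic flows (23.6 to 98 thousand cars).
c) Slope: For each additional thousand cars of weekly traffic flow, predicted weekly sales increase by about 0.1 thousand dollars (about $100).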
d) The coefficient of determination, also known as R squared, is 0.913.
e) Interpretation: An R squared of 0.913 indicates that the model explains 91.3% of the variability in weekly sales around its mean.
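As a cross-check on the Excel output for parts a) and d), here is a minimal Python sketch of the least-squares calculation using the 20 data points from the table. The function name `fit_line` and the variable names are my own, chosen for illustration; this is not how Excel computes the fit internally, just the textbook formulas.

```python
# Data from the table in the case (x: thousands of cars, y: $ thousands).
traffic = [59.3, 60.3, 82.1, 32.3, 98, 52.1, 54.4, 51.3, 36.7, 23.6,
           57.6, 40.6, 75.8, 48.3, 41.4, 62.5, 44.0, 29.6, 49.5, 73.1]
sales = [6.3, 6.6, 7.6, 3.0, 9.5, 5.9, 6.1, 5.0, 3.6, 2.8,
         6.7, 5.2, 8.2, 5, 3.9, 5.4, 4.1, 3.1, 5.4, 8.4]


def fit_line(x, y):
    """Return (intercept, slope, r_squared) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross-products.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


intercept, slope, r_squared = fit_line(traffic, sales)
print(f"y-hat = {intercept:.2f} + {slope:.2f} x, R^2 = {r_squared:.3f}")
```

The printed R squared should agree with the 0.913 reported in part d).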
f) x=64.5 thousand cars
thousand dollars.
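For part f), the prediction is just the fitted line evaluated at x = 64.5. The sketch below refits the line from the table data and evaluates it twice: once with full-precision coefficients and once with coefficients rounded to 2 decimal places, since the two give slightly different answers. Which version the instructor expects is an assumption you should check against the course notes; the variable names are illustrative.

```python
# Data from the table in the case (x: thousands of cars, y: $ thousands).
traffic = [59.3, 60.3, 82.1, 32.3, 98, 52.1, 54.4, 51.3, 36.7, 23.6,
           57.6, 40.6, 75.8, 48.3, 41.4, 62.5, 44.0, 29.6, 49.5, 73.1]
sales = [6.3, 6.6, 7.6, 3.0, 9.5, 5.9, 6.1, 5.0, 3.6, 2.8,
         6.7, 5.2, 8.2, 5, 3.9, 5.4, 4.1, 3.1, 5.4, 8.4]

n = len(traffic)
x_bar = sum(traffic) / n
y_bar = sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in traffic)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(traffic, sales))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

new_traffic = 64.5  # 64,500 cars per week, expressed in thousands

# Prediction with full-precision coefficients.
predicted_sales = intercept + slope * new_traffic

# Prediction with coefficients rounded to 2 decimals, as in part a).
rounded_prediction = round(intercept, 2) + round(slope, 2) * new_traffic

print(f"full precision: {predicted_sales:.2f} thousand dollars")
print(f"rounded coefficients: {rounded_prediction:.2f} thousand dollars")
```

Note that 64.5 lies inside the observed range of traffic flows (23.6 to 98), so this is interpolation rather than extrapolation.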
3) NULL HYPOTHESIS H0: β1 = 0 (the population slope is zero)
ALTERNATIVE HYPOTHESIS Ha: β1 ≠ 0 (the population slope is not zero)
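TEST STATISTIC t = 13.734
CRITICAL VALUES: ±2.101 (t DISTRIBUTION, 18 DEGREES OF FREEDOM, TWO-TAILED, ALPHA = 0.05)
P VALUE < 0.00001
THE TEST STATISTIC EXCEEDS THE CRITICAL VALUE AND THE P VALUE IS SMALLER THAN 0.05
REJECT NULL HYPOTHESIS: YES
CONCLUSION: WE HAVE SUFFICIENT EVIDENCE TO SHOW THAT THERE IS A SIGNIFICANT LINEAR RELATIONSHIP BETWEEN WEEKLY SALES AND TRAFFIC FLOW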
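To show where the test statistic comes from, here is a minimal Python sketch that computes t = b1 / SE(b1) from the table data. The critical value 2.101 is hard-coded from a t table (two-tailed, alpha = 0.05, 18 degrees of freedom) rather than computed, and the names are illustrative.

```python
import math

# Data from the table in the case (x: thousands of cars, y: $ thousands).
traffic = [59.3, 60.3, 82.1, 32.3, 98, 52.1, 54.4, 51.3, 36.7, 23.6,
           57.6, 40.6, 75.8, 48.3, 41.4, 62.5, 44.0, 29.6, 49.5, 73.1]
sales = [6.3, 6.6, 7.6, 3.0, 9.5, 5.9, 6.1, 5.0, 3.6, 2.8,
         6.7, 5.2, 8.2, 5, 3.9, 5.4, 4.1, 3.1, 5.4, 8.4]

n = len(traffic)
x_bar = sum(traffic) / n
y_bar = sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in traffic)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(traffic, sales))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Sum of squared residuals and its mean square on n - 2 degrees of freedom.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(traffic, sales))
mse = sse / (n - 2)

# Standard error of the slope and the t statistic for H0: beta1 = 0.
se_slope = math.sqrt(mse / sxx)
t_stat = slope / se_slope

# Assumption: critical value read from a t table, not computed here.
T_CRITICAL = 2.101
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The computed statistic may differ from the Excel figure in the last decimal place because of rounding.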
4) NULL HYPOTHESIS H0: MODEL IS NOT SIGNIFICANT (β1 = 0)
ALTERNATIVE HYPOTHESIS Ha: MODEL IS SIGNIFICANT (β1 ≠ 0)
TEST STATISTIC F = 188.644
P VALUE < 0.00001
THE P VALUE IS SMALLER THAN 0.05
REJECT H0: YES
WE HAVE SUFFICIENT EVIDENCE TO CONCLUDE THAT THE MODEL IS SIGNIFICANT, THAT IS, THERE IS A SIGNIFICANT LINEAR RELATIONSHIP BETWEEN WEEKLY SALES AND TRAFFIC FLOW.
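The F statistic in the ANOVA table is the ratio of the regression mean square to the error mean square. A minimal Python sketch of that decomposition follows, assuming the table data above; the critical value 4.41 for F(1, 18) at alpha = 0.05 is hard-coded from an F table, and the variable names are my own.

```python
# Data from the table in the case (x: thousands of cars, y: $ thousands).
traffic = [59.3, 60.3, 82.1, 32.3, 98, 52.1, 54.4, 51.3, 36.7, 23.6,
           57.6, 40.6, 75.8, 48.3, 41.4, 62.5, 44.0, 29.6, 49.5, 73.1]
sales = [6.3, 6.6, 7.6, 3.0, 9.5, 5.9, 6.1, 5.0, 3.6, 2.8,
         6.7, 5.2, 8.2, 5, 3.9, 5.4, 4.1, 3.1, 5.4, 8.4]

n = len(traffic)
x_bar = sum(traffic) / n
y_bar = sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in traffic)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(traffic, sales))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * x for x in traffic]

# ANOVA decomposition: total = regression + error.
sst = sum((y - y_bar) ** 2 for y in sales)
ssr = sum((f - y_bar) ** 2 for f in fitted)
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))

msr = ssr / 1        # one predictor, so 1 regression degree of freedom
mse = sse / (n - 2)  # n - 2 error degrees of freedom
f_stat = msr / mse

# Assumption: critical value read from an F table, not computed here.
F_CRITICAL = 4.41
print(f"F = {f_stat:.3f}, reject H0: {f_stat > F_CRITICAL}")
```

With a single predictor, F equals the square of the t statistic from question 3, so the two tests always reach the same conclusion.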
5) The vertical spread of the points on the residual plot does not appear to increase or decrease across the fitted values, so the assumption of constant variance appears to be satisfied.
The points appear to be randomly scattered around zero, so the assumption that the error terms have zero mean appears to be satisfied.
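If you want the numbers behind the residual plot, the sketch below computes the residuals from the table data and lists them against the fitted values. One caveat: for a least-squares line with an intercept, the residuals average to zero by construction, so the plot is mainly useful for spotting patterns (curvature or a funnel shape), not for checking the overall average. Names are illustrative.

```python
# Data from the table in the case (x: thousands of cars, y: $ thousands).
traffic = [59.3, 60.3, 82.1, 32.3, 98, 52.1, 54.4, 51.3, 36.7, 23.6,
           57.6, 40.6, 75.8, 48.3, 41.4, 62.5, 44.0, 29.6, 49.5, 73.1]
sales = [6.3, 6.6, 7.6, 3.0, 9.5, 5.9, 6.1, 5.0, 3.6, 2.8,
         6.7, 5.2, 8.2, 5, 3.9, 5.4, 4.1, 3.1, 5.4, 8.4]

n = len(traffic)
x_bar = sum(traffic) / n
y_bar = sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in traffic)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(traffic, sales))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = observed sales minus fitted sales.
fitted = [intercept + slope * x for x in traffic]
residuals = [y - f for y, f in zip(sales, fitted)]
mean_residual = sum(residuals) / n

# (fitted, residual) pairs sorted by fitted value, ready to plot or inspect.
points = sorted(zip(fitted, residuals))
for fitted_value, residual in points:
    print(f"{fitted_value:6.2f}  {residual:+.2f}")
print(f"mean residual = {mean_residual:.2e}")
```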