In: Statistics and Probability
The owner of Haverty’s Furniture Company was studying the relationship between sales and the amount spent on advertising. The sales information for the last four months is repeated below:
Month |
Advertising Expense ($million) |
Sales Revenue ($ million) |
July |
2 |
7 |
August |
1 |
3 |
September |
3 |
8 |
October |
4 |
10 |
Draw a scatter diagram and discuss whether or not it indicates that there is a relationship between the advertising expense and sales revenue.
Compute the coefficient of correlation. Interpret it.
Compute the coefficient of determination. Interpret it.
Test to determine whether the coefficient of correlation in the population is significantly different from zero. Use alpha 0.05.
Develop the estimated regression equation/the least squares line that could be used to predict the sales revenue based on the advertising expense. Interpret the regression coefficients.
Compute the standard error of estimate.
Test to determine whether the regression slope coefficient in the population is significantly different zero. Use alpha 0.05.
Are the results from part d and g consistent? Should they be? Justify your answer.
Estimate sales when $3 million is spent on advertising.
Draw a scatter diagram and discuss whether or not it indicates that there is a relationship between the advertising expense and sales revenue.
I have used EXCEl>INSERT>SCATTER PLOT and using the quick layout I have fitted a straight line on the points. We get scatter plot as follows;
The straight on the scatter plot shows that there is a linear relationship between the advertising expense and sales revenue.
Using EXCEL> DATA> DATA Analysis > Regression
Input Y; Select column containing the sales revenue in millions
Input X; Select column containing the advertising expenses in millions
ok
The result of regression from excel gives us following
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.964763821 | |||||
R Square | 0.930769231 | |||||
Adjusted R Square | 0.896153846 | |||||
Standard Error | 0.948683298 | |||||
Observations | 4 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 1 | 24.2 | 24.2 | 26.88888889 | 0.035236179 | |
Residual | 2 | 1.8 | 0.9 | |||
Total | 3 | 26 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 1.5 | 1.161895004 | 1.290994449 | 0.325800138 | -3.49923071 | 6.49923071 |
Advertising Expense | 2.2 | 0.424264069 | 5.185449729 | 0.035236179 | 0.374539047 | 4.025460953 |
From the result summary above we can answer the following questions:
Compute the coefficient of correlation. Interpret it.
The coefficient of correlation is the square root of the R-square value in the summary table above. r = sqrt(R^2) = sqrt(0.930769) = 0.96476. The coefficient of correlation r of 0.96476 implies a strong, positive linear relationship between the advertising expense and sales revenue.
Compute the coefficient of determination. Interpret it.
The coefficient of determination, R-square is estimated in the summary table above as 0.93076 which means that 93.076% of the variation in sales revenue is explained by the advertising expenses.
Test to determine whether the coefficient of correlation in the population is significantly different from zero. Use alpha 0.05.
Null Hypothesis: The population correlation coefficient rho is equal to 0
Alternate hypothesis: The population correlation coefficient rho is positive
Let us find the critical value of t value at alpha = 0.05 and df = n - 2 = 4-2 = 2
From the t distribution table, we get right-tailed t value = 2.919986
Decision Rule: Reject H0, if t statistic value is more than critical t value.
The respective one-tailed p-value for df of 2 and t statistic of 5.185449 is 0.017618. This is calculated using EXCEL function =T.DIST.RT(5.185449,2)
Conclusion:
Reject the null hypothesis as t statistic (5.185449) is more than critical t value (2.919986). Also, the p-value (0.017618) is less than the level of significance of 0.05. There is sufficient evidence to prove that that rho > 0
Develop the estimated regression equation/the least squares line that could be used to predict the sales revenue based on the advertising expense. Interpret the regression coefficients.
The linear regression model obtained is
The coefficient of "adv" or slope of the model has p-value of 0.035 which is less than 0.05. Thus we can say the variable Advertisement expenses is statistically significant at the 5% level (p<0.05). The coefficient of Adv is 2.2. Implying for one unit increase in Advertisement expenses, sales revenue will decrease by 2.2.
The intercept of the model has a p-value of 0.325 which is more than 0.05. Thus intercept is not significant. Implying we can't estimate the model when advertisement expenses are zero. The intercept is 1.5.
Compute the standard error of the estimate.
The standard error of the estimate is a measure of the accuracy of predictions made with a regression line. The standard error of the estimate is computed in the summary table above as 0.948683. The standard error of the least square model is given by following formula
where y_i is the observed sales revenue in millions
is the predicted sales revenue in millions using the least square model.
For every sample, we calculated the value of the difference observed - predicted which is also known as residual.
Observed Sales Revenue | Predicted Sales Revenue | Residuals | Residual^2 |
7 | 5.9 | 1.1 | 1.21 |
3 | 3.7 | -0.7 | 0.49 |
8 | 8.1 | -0.1 | 0.01 |
10 | 10.3 | -0.3 | 0.09 |
SUM | -3E-15 | 1.8 |
Using the sum of the residual^2 from the table we get the standard error as
Test to determine whether the regression slope coefficient in the population is significantly different zero. Use alpha 0.05.
The null hypothesis states that the slope is equal to zero, and the alternative hypothesis states that the slope is not equal to zero.
Degree of freedom = N-2 = 4-2 = 2
From the t distribution table, we get two-tailed t value = 4.302653
Decision Rule: Reject H0, if t statistic value is more than critical t value.
The t statistic is the coefficient divided by its standard error. The standard error is an estimate of the standard deviation of the coefficient, the amount it varies across cases. The formula for calculating t statistic is as follows:
The respective two-tailed p-value for df of 2 and t statistic of 5.185449 is 0.0352. This is calculated using EXCEL function =T.DIST.2T(5.185449,2)
Conclusion: Reject the null hypothesis as t statistic (5.185449) is more than critical t value (4.302653). Also, the p-value (0.0325) is less than the level of significance of 0.05. There is sufficient evidence to prove that that slope is significantly different from zero.
Are the results from part d and g consistent? Should they be? Justify your answer.
Yes, they are consistent. As they should be as we have one predictor.
Estimate sales when $3 million is spent on advertising.
When the advertising expense is $3 million the estimated sales revenue is $8.1 million.
Please rate the solution if it helps you. Thank you.