Sales (Y) | Calls (X1) | Time (X2) | Years (X3) | Type |
48 | 168 | 12.3 | 5 | ONLINE |
36 | 131 | 16.4 | 4 | NONE |
46 | 162 | 15.7 | 3 | NONE |
47 | 183 | 13.0 | 3 | ONLINE |
44 | 177 | 15.3 | 3 | ONLINE |
49 | 181 | 12.4 | 2 | ONLINE |
35 | 123 | 19.0 | 3 | NONE |
46 | 169 | 14.8 | 3 | GROUP |
44 | 158 | 13.9 | 1 | GROUP |
39 | 146 | 15.4 | 3 | GROUP |
48 | 178 | 12.6 | 4 | ONLINE |
42 | 142 | 17.0 | 0 | ONLINE |
45 | 137 | 13.0 | 2 | ONLINE |
54 | 195 | 15.2 | 2 | ONLINE |
43 | 146 | 16.4 | 0 | ONLINE |
44 | 165 | 17.4 | 3 | ONLINE |
34 | 121 | 13.2 | 2 | NONE |
44 | 146 | 16.5 | 1 | NONE |
40 | 132 | 18.2 | 1 | NONE |
51 | 182 | 17.9 | 2 | ONLINE |
41 | 151 | 18.0 | 1 | NONE |
45 | 146 | 15.6 | 3 | ONLINE |
52 | 190 | 13.2 | 3 | ONLINE |
39 | 150 | 19.4 | 0 | GROUP |
41 | 149 | 13.2 | 3 | GROUP |
45 | 167 | 14.5 | 4 | GROUP |
46 | 189 | 20.0 | 1 | GROUP |
47 | 162 | 16.4 | 3 | ONLINE |
42 | 147 | 13.2 | 3 | GROUP |
45 | 171 | 19.4 | 2 | ONLINE |
44 | 165 | 15.0 | 0 | ONLINE |
50 | 175 | 15.1 | 3 | ONLINE |
46 | 161 | 13.2 | 3 | GROUP |
53 | 188 | 11.0 | 2 | ONLINE |
39 | 136 | 17.3 | 0 | NONE |
39 | 135 | 17.7 | 1 | ONLINE |
48 | 168 | 15.9 | 5 | ONLINE |
46 | 167 | 10.1 | 0 | ONLINE |
43 | 150 | 17.4 | 3 | GROUP |
44 | 151 | 15.2 | 2 | GROUP |
42 | 141 | 12.2 | 3 | NONE |
39 | 131 | 19.4 | 2 | NONE |
49 | 174 | 18.3 | 0 | ONLINE |
41 | 154 | 14.5 | 4 | NONE |
42 | 131 | 20.2 | 3 | GROUP |
39 | 128 | 15.3 | 1 | GROUP |
37 | 126 | 13.4 | 4 | NONE |
46 | 180 | 15.1 | 4 | NONE |
45 | 166 | 19.5 | 5 | NONE |
44 | 152 | 16.0 | 2 | ONLINE |
50 | 179 | 12.8 | 3 | ONLINE |
39 | 140 | 18.2 | 1 | NONE |
43 | 154 | 15.3 | 1 | ONLINE |
45 | 164 | 17.2 | 3 | ONLINE |
42 | 139 | 18.6 | 2 | NONE |
44 | 165 | 19.2 | 2 | NONE |
45 | 172 | 12.6 | 3 | GROUP |
41 | 147 | 18.5 | 3 | GROUP |
43 | 152 | 17.2 | 1 | GROUP |
48 | 160 | 15.8 | 2 | ONLINE |
42 | 159 | 13.6 | 4 | GROUP |
46 | 186 | 14.1 | 3 | GROUP |
46 | 150 | 20.7 | 2 | GROUP |
43 | 155 | 11.2 | 3 | ONLINE |
45 | 157 | 16.3 | 4 | ONLINE |
48 | 170 | 12.1 | 1 | ONLINE |
45 | 175 | 18.3 | 2 | GROUP |
49 | 186 | 17.5 | 1 | GROUP |
51 | 181 | 11.4 | 4 | GROUP |
47 | 171 | 17.3 | 2 | ONLINE |
50 | 185 | 16.4 | 0 | ONLINE |
39 | 146 | 15.8 | 1 | GROUP |
42 | 156 | 18.6 | 2 | GROUP |
46 | 157 | 19.3 | 2 | ONLINE |
43 | 163 | 11.7 | 1 | GROUP |
54 | 175 | 14.2 | 1 | ONLINE |
51 | 175 | 12.0 | 2 | ONLINE |
50 | 173 | 13.3 | 1 | ONLINE |
41 | 140 | 14.9 | 3 | NONE |
43 | 156 | 20.5 | 2 | ONLINE |
40 | 146 | 18.2 | 2 | NONE |
42 | 148 | 10.5 | 2 | GROUP |
50 | 183 | 11.7 | 1 | GROUP |
49 | 191 | 13.1 | 2 | GROUP |
40 | 149 | 14.2 | 4 | ONLINE |
40 | 143 | 18.3 | 2 | NONE |
47 | 185 | 15.2 | 2 | ONLINE |
41 | 136 | 17.4 | 3 | GROUP |
51 | 198 | 13.0 | 1 | ONLINE |
43 | 153 | 13.2 | 3 | GROUP |
38 | 129 | 15.2 | 3 | NONE |
44 | 158 | 11.8 | 3 | ONLINE |
43 | 149 | 12.7 | 1 | GROUP |
47 | 175 | 13.9 | 2 | GROUP |
40 | 154 | 16.4 | 3 | GROUP |
43 | 151 | 14.3 | 1 | GROUP |
46 | 153 | 22.0 | 0 | ONLINE |
46 | 167 | 14.8 | 1 | ONLINE |
46 | 167 | 15.8 | 0 | ONLINE |
39 | 143 | 17.7 | 3 | NONE |
Part C: Regression and Correlation Analysis
Use the dependent variable (labeled Y) and the independent variables (labeled X1, X2, and X3) in the data file. Use Excel to perform the regression and correlation analysis to answer the following.
Generate a scatterplot for the specified dependent variable (Y) and the X1 independent variable, including the graph of the "best fit" line. Interpret.
Determine the equation of the "best fit" line, which describes the relationship between the dependent variable and the selected independent variable.
Determine the coefficient of correlation. Interpret.
Determine the coefficient of determination. Interpret.
Test the utility of this regression model. Interpret results, including the p-value.
Based on the findings in Steps 1-5, analyze the ability of the independent variable to predict the designated dependent variable.
Compute the confidence interval for β1 (the population slope) using a 95% confidence level. Interpret this interval.
Using an interval, estimate the average for the dependent variable for a selected value of the independent variable. Interpret this interval.
Using an interval, predict the particular value of the dependent variable for a selected value of the independent variable. Interpret this interval.
What can be said about the value of the dependent variable for values of the independent variable that are outside the range of the sample values? Explain.
In an attempt to improve the model, use a multiple regression model to predict the dependent variable, Y, based on all of the independent variables: X1, X2, and X3.
Using Excel, run the multiple regression analysis using the designated dependent and three independent variables. State the equation for this multiple regression model.
Perform the Global Test for Utility (F-Test). Explain the conclusion.
Perform the t-test on each independent variable. Explain the conclusions and clearly state how the analysis should proceed. In particular, which independent variables should be kept and which should be discarded. If any independent variables are to be discarded, re-run the multiple regression, including only the significant independent variables, and summarize results with discussion of analysis.
Is this multiple regression model better than the linear model generated in parts 1-10? Explain. Please use the actual data given above in the analysis.
a)
Equation of the best fit line: Y = 0.2018 X1 + 12.243
Coefficient of correlation = 0.871
Coefficient of determination = (coefficient of correlation)² = 0.871² = 0.759
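As a rough illustration, the best fit line reported above can be wrapped in a small Python helper. The function name `predict_sales` and the inputs of 160 and 250 calls are hypothetical; the coefficients are the reported ones, and 121 and 198 are the smallest and largest Calls (X1) values in the data table. Predictions outside that range are extrapolations, because the sample gives no evidence that the same line holds there.

```python
# Coefficients taken from the reported best-fit line: Y = 0.2018*X1 + 12.243
SLOPE = 0.2018
INTERCEPT = 12.243

# Smallest and largest Calls (X1) values observed in the data table
X1_MIN, X1_MAX = 121, 198


def predict_sales(calls):
    """Return (predicted sales, True if calls lies inside the sampled range)."""
    predicted = INTERCEPT + SLOPE * calls
    in_range = X1_MIN <= calls <= X1_MAX
    return predicted, in_range


# Hypothetical inputs chosen only for illustration
for calls in (160, 250):
    predicted, in_range = predict_sales(calls)
    note = "within sample range" if in_range else "extrapolation, treat with caution"
    print(f"X1 = {calls}: predicted Y = {predicted:.2f} ({note})")
```

The second input is deliberately outside the sampled range to show how such a prediction might be flagged rather than trusted.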
ANOVA (simple regression of Y on X1)
Source | df | SS | MS | F | Significance F |
Regression | 1 | 1307.747 | 1307.747 | 309.046 | 4.58E-32 |
Residual | 98 | 414.6929 | 4.231561 | | |
Total | 99 | 1722.44 | | | |
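The entries of this ANOVA table are linked by simple arithmetic, and a short Python sketch can be used to cross-check them. It assumes only the sums of squares and degrees of freedom shown in the table; the variable names are illustrative.

```python
import math

# Values copied from the simple-regression ANOVA table
ss_regression = 1307.747
ss_residual = 414.6929
ss_total = 1722.44
df_regression = 1
df_residual = 98

# Mean squares are sums of squares divided by their degrees of freedom
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F statistic for the overall utility test of the model
f_stat = ms_regression / ms_residual

# Coefficient of determination and coefficient of correlation
r_squared = ss_regression / ss_total
r = math.sqrt(r_squared)  # positive root because the fitted slope is positive

print(f"F = {f_stat:.3f}, R^2 = {r_squared:.3f}, r = {r:.3f}")
```

The computed values should agree with the reported F statistic, coefficient of determination, and coefficient of correlation up to rounding.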
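Since the p-value (Significance F = 4.58E-32) is effectively 0 and far below 0.05, we reject the null hypothesis that the slope is zero and conclude that there is a significant linear relationship between Y and X1.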
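95% confidence interval for β1 = (0.179, 0.224). The interval lies entirely above 0, which agrees with the F-test: each additional call is associated with an estimated increase of between 0.179 and 0.224 in average sales.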
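One way to sanity-check this interval without rerunning Excel is to rebuild it from the reported slope and F statistic. The sketch below relies on the fact that in simple regression the slope's t statistic equals the square root of F. The critical value 1.9845 is an assumption: it is the approximate two-sided 95% t value for 98 degrees of freedom, which Excel's `T.INV.2T(0.05, 98)` would return.

```python
import math

slope = 0.2018       # reported slope of the best-fit line
f_stat = 309.046     # reported F statistic from the ANOVA table
t_critical = 1.9845  # assumed two-sided 95% t value for 98 df

# In simple regression the slope's t statistic is the square root of F,
# so the standard error can be recovered as slope / t
t_stat = math.sqrt(f_stat)
std_error = slope / t_stat

# Interval: slope plus or minus (critical value times standard error)
margin = t_critical * std_error
lower, upper = slope - margin, slope + margin

print(f"SE(slope) = {std_error:.5f}")
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

Because the inputs are rounded, the result may differ from the reported interval in the last digit.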
Multiple regression results:
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.874471 |
R Square | 0.7647 |
Adjusted R Square | 0.757347 |
Standard Error | 2.054696 |
Observations | 100 |
ANOVA
Source | df | SS | MS | F | Significance F |
Regression | 3 | 1317.15 | 439.0499 | 103.9965 | 4.76E-30 |
Residual | 96 | 405.2904 | 4.221775 | | |
Total | 99 | 1722.44 | | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | 14.75674 | 2.664714 | 5.537831 | 2.67E-07 | 9.467321 | 20.04615 |
Calls (X1) | 0.197783 | 0.011964 | 16.53132 | 7.75E-30 | 0.174034 | 0.221531 |
Time (X2) | -0.0938 | 0.082642 | -1.13501 | 0.259195 | -0.25784 | 0.070243 |
Years (X3) | -0.19451 | 0.167938 | -1.15821 | 0.24965 | -0.52786 | 0.138846 |
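The t-test on each independent variable can be read straight off this output by comparing each p-value with the significance level. The following Python sketch does that comparison; the 0.05 level is an assumption (it matches the 95% confidence level used elsewhere), and the dictionary simply restates the coefficient rows above.

```python
ALPHA = 0.05  # significance level assumed for the individual t-tests

# (coefficient, standard error, p-value) copied from the regression output
terms = {
    "Calls (X1)": (0.197783, 0.011964, 7.75e-30),
    "Time (X2)": (-0.0938, 0.082642, 0.259195),
    "Years (X3)": (-0.19451, 0.167938, 0.24965),
}

keep, drop = [], []
for name, (coef, std_err, p_value) in terms.items():
    t_stat = coef / std_err  # should reproduce the t Stat column
    (keep if p_value < ALPHA else drop).append(name)
    print(f"{name}: t = {t_stat:.2f}, p = {p_value:.3g}")

print("Keep:", keep)
print("Drop:", drop)
```

Under that assumption only Calls (X1) is retained. Re-running the regression with the significant variables only therefore reduces to the simple regression of Y on X1 already fitted above.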
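The multiple linear regression model fits slightly better: it explains 76.5% of the variability in Sales (R Square = 0.7647), whereas the simple linear regression explains 75.9%. The gain is small, however, and the p-values for Time (X2) and Years (X3) (0.259 and 0.250) are above 0.05, so Calls (X1) is the only significant predictor.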
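R Square never decreases when predictors are added, so a fairer comparison might use adjusted R Square and a partial F-test on the two extra variables. Neither check appears in the original answer; the sketch below is an added illustration built only from the two ANOVA tables. The critical value 3.09 is an assumption: it is the approximate 5% point of the F distribution with 2 and 96 degrees of freedom.

```python
n = 100  # number of observations


def adjusted_r_squared(r_squared, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# R^2 = SS regression / SS total, from the two ANOVA tables
r2_simple = 1307.747 / 1722.44
r2_multiple = 1317.15 / 1722.44

adj_simple = adjusted_r_squared(r2_simple, 1)
adj_multiple = adjusted_r_squared(r2_multiple, 3)

# Partial F-test: do X2 and X3 together reduce the residual SS significantly?
sse_simple, sse_multiple = 414.6929, 405.2904
extra_terms = 2          # X2 and X3
mse_multiple = 4.221775  # residual mean square of the larger model
partial_f = ((sse_simple - sse_multiple) / extra_terms) / mse_multiple
F_CRITICAL = 3.09        # assumed approximate 5% critical value of F(2, 96)

print(f"Adjusted R^2: simple = {adj_simple:.4f}, multiple = {adj_multiple:.4f}")
print(f"Partial F = {partial_f:.3f}, significant = {partial_f > F_CRITICAL}")
```

If the partial F statistic falls below the critical value, the extra variables add no significant explanatory power, which would favour the simpler one-variable model despite its marginally lower R Square.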