Can someone please explain this?
The international candy association wants to predict chocolate bar prices. They have collected the following data for 16 samples.
Sample | Size | Price | Energy | Protein | Fat | Carbo | Sodium |
1 | 50 | 0.88 | 1970 | 3.1 | High | 53.2 | Low |
2 | 50 | 0.88 | 2003 | 4.6 | High | 59 | Low |
3 | 40 | 1.15 | 2057 | 9.9 | Low | 60.9 | Low |
4 | 80 | 1.54 | 1920 | 5.1 | Low | 67.5 | High |
5 | 45 | 1.15 | 2250 | 7.2 | High | 59.4 | Low |
6 | 78 | 1.4 | 2186 | 7 | High | 59.7 | Low |
7 | 55 | 1.28 | 1930 | 3.5 | High | 56.4 | Low |
8 | 60 | 0.97 | 1980 | 10.2 | Low | 59.9 | High |
9 | 60 | 0.97 | 1890 | 4.7 | Low | 67.9 | High |
10 | 50 | 1.28 | 2030 | 5.6 | Low | 67.4 | High |
11 | 40 | 1.1 | 2180 | 5.5 | High | 67.3 | High |
12 | 55 | 1.28 | 1623 | 2.2 | Low | 73.3 | Low |
13 | 44.5 | 0.97 | 1640 | 3.7 | Low | 77.9 | High |
14 | 75 | 1.58 | 2210 | 8.2 | High | 57 | Low |
15 | 60 | 1.55 | 1980 | 8.5 | Low | 63.3 | Low |
16 | 42.5 | 1.18 | 1970 | 5 | Low | 69 | Low |
a. Identify the dependent variable and independent variables.
b. Build a multiple linear regression model to predict chocolate bar price.
c. Can this model be used to make predictions? Explain.
First, we change the categorical variables Fat and Sodium to dummy variables.
High=1
Low=0
Sample | Price | Size | Energy | Protein | Fat | Carbo | Sodium |
1 | 0.88 | 50 | 1970 | 3.1 | 1 | 53.2 | 0 |
2 | 0.88 | 50 | 2003 | 4.6 | 1 | 59 | 0 |
3 | 1.15 | 40 | 2057 | 9.9 | 0 | 60.9 | 0 |
4 | 1.54 | 80 | 1920 | 5.1 | 0 | 67.5 | 1 |
5 | 1.15 | 45 | 2250 | 7.2 | 1 | 59.4 | 0 |
6 | 1.4 | 78 | 2186 | 7 | 1 | 59.7 | 0 |
7 | 1.28 | 55 | 1930 | 3.5 | 1 | 56.4 | 0 |
8 | 0.97 | 60 | 1980 | 10.2 | 0 | 59.9 | 1 |
9 | 0.97 | 60 | 1890 | 4.7 | 0 | 67.9 | 1 |
10 | 1.28 | 50 | 2030 | 5.6 | 0 | 67.4 | 1 |
11 | 1.1 | 40 | 2180 | 5.5 | 1 | 67.3 | 1 |
12 | 1.28 | 55 | 1623 | 2.2 | 0 | 73.3 | 0 |
13 | 0.97 | 44.5 | 1640 | 3.7 | 0 | 77.9 | 1 |
14 | 1.58 | 75 | 2210 | 8.2 | 1 | 57 | 0 |
15 | 1.55 | 60 | 1980 | 8.5 | 0 | 63.3 | 0 |
16 | 1.18 | 42.5 | 1970 | 5 | 0 | 69 | 0 |
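The recoding step can be sketched in a few lines of Python. This is only an illustration of the High=1 / Low=0 mapping on the first four samples; the names `DUMMY`, `fat_coded` and `sodium_coded` are made up for the example.

```python
# Dummy coding as described above: High -> 1, Low -> 0.
DUMMY = {"High": 1, "Low": 0}

# Fat and Sodium labels for samples 1-4 in the original data table.
fat = ["High", "High", "Low", "Low"]
sodium = ["Low", "Low", "Low", "High"]

fat_coded = [DUMMY[label] for label in fat]
sodium_coded = [DUMMY[label] for label in sodium]

print(fat_coded)     # [1, 1, 0, 0]
print(sodium_coded)  # [0, 0, 0, 1]
```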
a) Price is the dependent variable. Size, Energy, Protein, Fat, Carbo and Sodium are the independent variables. Sample is only a row identifier, not a variable.
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.820176391 |
R Square | 0.672689312 |
Adjusted R Square | 0.454482187 |
Standard Error | 0.172742075 |
Observations | 16 |
ANOVA
Source | df | SS | MS | F | Significance F |
Regression | 6 | 0.551941581 | 0.091990263 | 3.082801772 | 0.063213291 |
Residual | 9 | 0.268558419 | 0.029839824 | | |
Total | 15 | 0.8205 | | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | -2.089379783 | 1.246714852 | -1.675908312 | 0.128074318 | -4.909644711 | 0.730885144 |
Size | 0.012720923 | 0.00357347 | 3.559823882 | 0.006120968 | 0.004637173 | 0.020804673 |
Energy | 0.000800291 | 0.000526591 | 1.519757621 | 0.162892016 | -0.000390941 | 0.001991523 |
Protein | -0.006628177 | 0.034325074 | -0.193100151 | 0.851167933 | -0.084276888 | 0.071020534 |
Fat | -0.152258497 | 0.184657024 | -0.824547553 | 0.430938381 | -0.569981706 | 0.265464711 |
Carbo | 0.018615509 | 0.011759022 | 1.583083111 | 0.147862317 | -0.007985246 | 0.045216264 |
Sodium | -0.235767157 | 0.108628126 | -2.17040619 | 0.058075052 | -0.481501049 | 0.009966736 |
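For part b), the coefficient column gives the fitted model: Price = -2.0894 + 0.0127 Size + 0.0008 Energy - 0.0066 Protein - 0.1523 Fat + 0.0186 Carbo - 0.2358 Sodium. A rough sketch of how that equation would be applied to one row is below. The helper name `predict_price` is just for illustration, and sample 1 is used only as a sanity check.

```python
# Coefficients copied from the regression output above.
COEF = {
    "Intercept": -2.089379783,
    "Size": 0.012720923,
    "Energy": 0.000800291,
    "Protein": -0.006628177,
    "Fat": -0.152258497,      # dummy: High = 1, Low = 0
    "Carbo": 0.018615509,
    "Sodium": -0.235767157,   # dummy: High = 1, Low = 0
}


def predict_price(row):
    """Fitted price for one bar; `row` maps predictor name to its value."""
    return COEF["Intercept"] + sum(COEF[name] * value for name, value in row.items())


# Sample 1 from the coded table (observed price 0.88).
sample_1 = {"Size": 50, "Energy": 1970, "Protein": 3.1,
            "Fat": 1, "Carbo": 53.2, "Sodium": 0}
fitted = predict_price(sample_1)
residual = 0.88 - fitted
print(round(fitted, 2), round(residual, 2))  # 0.94 -0.06
```

The residual of about -0.06 is well inside the reported standard error of 0.17, so the coefficients are at least consistent with the data for this row.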
c)
The p-value of the F-statistic (Significance F) is 0.063, which is greater than 0.05.
This means the model as a whole is not statistically significant at the 5% level.
So we should not use this model to predict price, as it's not a good model.
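If it helps to see where the F-statistic comes from, here is a small sketch that recomputes it (plus R Square, Adjusted R Square and Standard Error) from the sums of squares in the ANOVA table. The critical value 3.37 for F with (6, 9) degrees of freedom at the 5% level is taken from a standard F-table, not from the original output, so treat it as an assumption.

```python
import math

# Values copied from the ANOVA table above.
ss_regression, df_regression = 0.551941581, 6
ss_residual, df_residual = 0.268558419, 9
ss_total = ss_regression + ss_residual  # 0.8205

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

r_squared = ss_regression / ss_total
n = df_regression + df_residual + 1  # 16 observations
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
std_error = math.sqrt(ms_residual)

# Assumed 5% critical value of F(6, 9) from a standard F-table.
F_CRIT = 3.37
model_significant = f_stat > F_CRIT

print(round(f_stat, 3), model_significant)  # 3.083 False
```

Note also how far Adjusted R Square (about 0.45) falls below R Square (about 0.67): with 6 predictors and only 16 observations, much of the apparent fit is paid for in degrees of freedom.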
Moreover, the p-values of most of the coefficient t-statistics are greater than 0.05, which tells us those predictors are not individually significant. Only Size (p = 0.006) is significant at the 5% level.
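The same check can be done by hand for each coefficient: t Stat is just the coefficient divided by its standard error, compared against a critical value. The sketch below assumes the two-sided 5% critical value of 2.262 for 9 residual degrees of freedom, which comes from a standard t-table rather than from the output itself.

```python
# (coefficient, standard error) pairs copied from the regression output.
TERMS = {
    "Size": (0.012720923, 0.00357347),
    "Energy": (0.000800291, 0.000526591),
    "Protein": (-0.006628177, 0.034325074),
    "Fat": (-0.152258497, 0.184657024),
    "Carbo": (0.018615509, 0.011759022),
    "Sodium": (-0.235767157, 0.108628126),
}

# Assumed two-sided 5% critical value of Student's t with 9 degrees of
# freedom, from a standard t-table.
T_CRIT = 2.262

t_stats = {name: coef / se for name, (coef, se) in TERMS.items()}
significant = [name for name, t in t_stats.items() if abs(t) > T_CRIT]

print(significant)  # ['Size']
```

Sodium comes close (|t| is about 2.17) but still falls short of the cutoff, which matches its p-value of 0.058.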