City | Cost of Living Index | Rent (in City Centre) | Monthly Public Trans Pass | Bottle of Wine (mid-range) | Loaf of Bread | Milk |
London | 88.33 | $4,069.99 | $173.81 | $10.53 | $1.23 | $4.63 |
Dublin | 87.93 | $3,025.83 | $144.78 | $14.12 | $1.37 | $4.31 |
Paris | 89.94 | $2,701.61 | $85.92 | $8.24 | $1.56 | $4.68 |
Rome | 78.19 | $2,354.10 | $41.20 | $7.06 | $1.38 | $6.82 |
Amsterdam | 85.9 | $2,823.28 | $105.93 | $7.06 | $1.33 | $4.34 |
Berlin | 71.65 | $1,695.77 | $95.34 | $5.89 | $1.24 | $3.52 |
Athens | 63.06 | $569.12 | $35.31 | $8.24 | $0.80 | $5.35 |
Brussels | 82.2 | $1,734.75 | $57.68 | $8.24 | $1.66 | $4.17 |
Madrid | 66.75 | $1,795.10 | $64.27 | $5.89 | $1.04 | $3.63 |
Prague | 50.95 | $1,240.48 | $25.01 | $5.46 | $0.92 | $3.14 |
Warsaw | 45.45 | $1,060.06 | $30.09 | $6.84 | $0.69 | $2.68 |
Tokyo | 92.94 | $2,197.03 | $88.77 | $17.75 | $1.77 | $6.46 |
Sydney | 90.78 | $3,777.72 | $124.55 | $14.01 | $1.94 | $4.43 |
New York | 100 | $5,877.45 | $121.00 | $15.00 | $2.93 | $3.98 |
Mumbai | 31.74 | $1,642.68 | $7.66 | $10.73 | $0.41 | $2.93 |
Vancouver | 74.06 | $2,937.27 | $74.28 | $14.38 | $2.28 | $7.12 |
Seoul | 83.45 | $2,370.81 | $50.53 | $17.57 | $2.44 | $7.90 |
You’ll find the 2017 data for 17 cities in the data set Cost of Living. Included are the 2017 cost of living index, cost of a 3-bedroom apartment (per month), price of monthly transportation pass, price of a mid-range bottle of wine, price of a loaf of bread (1 lb.) and the price of a gallon of milk. All prices are in U.S. dollars. Examine the relationship between the overall cost of living and the cost of each of these individual items by using linear regression. Verify the necessary conditions and describe the relationship in as much detail as possible (remember to look at direction, form and strength). Identify any unusual observations (outliers), research and discuss. Based on the linear regressions and your analysis, which item would be the best predictor of the overall cost in these cities? Which would be the worst? Are there any surprising relationships? Describe in detail your analysis, results and conclusions.
Examine the relationship between the overall cost of living and the cost of each of these individual items by using linear regression.
The linear regression between the overall cost of living and Rent (in City Centre) is as follows:
r² | 0.495 | n | 17 | |||
r | 0.703 | k | 1 | |||
Std. Error | 13.684 | Dep. Var. | Cost of Living Index | |||
ANOVA table | ||||||
Source | SS | df | MS | F | p-value | |
Regression | 2,749.1946 | 1 | 2,749.1946 | 14.68 | .0016 | |
Residual | 2,808.6562 | 15 | 187.2437 | |||
Total | 5,557.8509 | 16 | ||||
Regression output | confidence interval | |||||
variables | coefficients | std. error | t (df=15) | p-value | 95% lower | 95% upper |
Intercept | 50.1985 | |||||
Rent (in City Centre) | 0.0103 | 0.0027 | 3.832 | .0016 | 0.0046 | 0.0160 |
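The fitted regression equation is:
Cost of Living Index = 50.1985 + 0.0103*Rent (in City Centre)
The slope means that each additional $1 of monthly rent is associated with a predicted increase of about 0.0103 points in the Cost of Living Index, or roughly 10.3 points per $1,000 of rent. The relationship is positive and moderately strong (r = 0.703): rent explains about 49.5% of the variation in the index, and the slope is statistically significant (p = .0016).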
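As a check on the output above, the fit can be reproduced directly from the data table. The following is a minimal sketch in plain Python, assuming the 17 rows are entered exactly as listed; the helper name `fit_line` is illustrative and not part of the original analysis.

```python
from math import sqrt

# Cost of Living Index and monthly rent (USD) for the 17 cities, in table order.
index = [88.33, 87.93, 89.94, 78.19, 85.9, 71.65, 63.06, 82.2, 66.75,
         50.95, 45.45, 92.94, 90.78, 100, 31.74, 74.06, 83.45]
rent = [4069.99, 3025.83, 2701.61, 2354.10, 2823.28, 1695.77, 569.12,
        1734.75, 1795.10, 1240.48, 1060.06, 2197.03, 3777.72, 5877.45,
        1642.68, 2937.27, 2370.81]


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r2, std_error)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                    # slope
    b0 = mean_y - b1 * mean_x         # intercept
    sse = syy - b1 * sxy              # residual sum of squares
    r2 = 1 - sse / syy                # share of variation explained
    std_error = sqrt(sse / (n - 2))   # standard error of the estimate
    return b0, b1, r2, std_error


b0, b1, r2, std_error = fit_line(rent, index)
print(f"Index = {b0:.4f} + {b1:.4f} * Rent, r2 = {r2:.3f}, s = {std_error:.3f}")
```

The same function can be applied to any of the other four price columns by swapping in that column for `rent`.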
The linear regression between the overall cost of living and Monthly Public Trans Pass is as follows:
r² | 0.578 | n | 17 | |||
r | 0.760 | k | 1 | |||
Std. Error | 12.511 | Dep. Var. | Cost of Living Index | |||
ANOVA table | ||||||
Source | SS | df | MS | F | p-value | |
Regression | 3,209.9682 | 1 | 3,209.9682 | 20.51 | .0004 | |
Residual | 2,347.8827 | 15 | 156.5255 | |||
Total | 5,557.8509 | 16 | ||||
Regression output | confidence interval | |||||
variables | coefficients | std. error | t (df=15) | p-value | 95% lower | 95% upper |
Intercept | 51.3466 | |||||
Monthly Public Trans Pass | 0.3095 | 0.0683 | 4.529 | .0004 | 0.1638 | 0.4552 |
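The fitted regression equation is:
Cost of Living Index = 51.3466 + 0.3095*Monthly Public Trans Pass
The slope means that each additional $1 in the price of a monthly public transport pass is associated with a predicted increase of about 0.31 points in the Cost of Living Index. The relationship is positive and fairly strong (r = 0.760): the pass price explains about 57.8% of the variation in the index, the highest share of the five items, and the slope is statistically significant (p = .0004).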
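The assignment also asks for unusual observations. One rough way to screen for them is to express each city's residual in units of the standard error of the estimate and look at the largest ones. The sketch below does this for the transport pass regression; the 2s cutoff is an assumed rule of thumb, and this simple scaling is not a leverage-adjusted studentized residual.

```python
from math import sqrt

cities = ["London", "Dublin", "Paris", "Rome", "Amsterdam", "Berlin", "Athens",
          "Brussels", "Madrid", "Prague", "Warsaw", "Tokyo", "Sydney",
          "New York", "Mumbai", "Vancouver", "Seoul"]
index = [88.33, 87.93, 89.94, 78.19, 85.9, 71.65, 63.06, 82.2, 66.75,
         50.95, 45.45, 92.94, 90.78, 100, 31.74, 74.06, 83.45]
transit = [173.81, 144.78, 85.92, 41.20, 105.93, 95.34, 35.31, 57.68, 64.27,
           25.01, 30.09, 88.77, 124.55, 121.00, 7.66, 74.28, 50.53]

n = len(cities)
mean_x = sum(transit) / n
mean_y = sum(index) / n
sxx = sum((x - mean_x) ** 2 for x in transit)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(transit, index))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Residual = observed index minus the index predicted by the fitted line.
residuals = [y - (b0 + b1 * x) for x, y in zip(transit, index)]
s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Scale each residual by s and sort so the largest departures come first.
scaled = sorted(((e / s, city) for e, city in zip(residuals, cities)),
                key=lambda pair: abs(pair[0]), reverse=True)
for z, city in scaled[:3]:
    flag = "  <-- beyond 2s" if abs(z) > 2 else ""
    print(f"{city:10s} {z:+.2f}{flag}")
```

Any city flagged this way is only a candidate for closer inspection; a scatterplot and a residual plot are still needed to judge form and to verify the regression conditions.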
The linear regression between the overall cost of living and Bottle of Wine (mid-range) is as follows:
r² | 0.231 | n | 17 | |||
r | 0.481 | k | 1 | |||
Std. Error | 16.879 | Dep. Var. | Cost of Living Index | |||
ANOVA table | ||||||
Source | SS | df | MS | F | p-value | |
Regression | 1,284.1123 | 1 | 1,284.1123 | 4.51 | .0508 | |
Residual | 4,273.7386 | 15 | 284.9159 | |||
Total | 5,557.8509 | 16 | ||||
Regression output | confidence interval | |||||
variables | coefficients | std. error | t (df=15) | p-value | 95% lower | 95% upper |
Intercept | 53.3288 | |||||
Bottle of Wine (mid-range) | 2.1283 | 1.0025 | 2.123 | .0508 | -0.0085 | 4.2651 |
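The fitted regression equation is:
Cost of Living Index = 53.3288 + 2.1283*Bottle of Wine (mid-range)
The slope means that each additional $1 in the price of a mid-range bottle of wine is associated with a predicted increase of about 2.13 points in the Cost of Living Index. The relationship is positive but weak to moderate (r = 0.481): wine price explains only about 23.1% of the variation in the index. The slope narrowly misses significance at the 5% level (p = .0508), and its 95% confidence interval (-0.0085 to 4.2651) includes zero.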
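The standard error, t statistic and confidence interval for the wine slope can be reproduced from the data. This is a minimal sketch in plain Python; the critical value 2.131 (two-sided 95%, 15 degrees of freedom) is hard-coded from a standard t table rather than computed, and the variable names are illustrative.

```python
from math import sqrt

index = [88.33, 87.93, 89.94, 78.19, 85.9, 71.65, 63.06, 82.2, 66.75,
         50.95, 45.45, 92.94, 90.78, 100, 31.74, 74.06, 83.45]
wine = [10.53, 14.12, 8.24, 7.06, 7.06, 5.89, 8.24, 8.24, 5.89, 5.46, 6.84,
        17.75, 14.01, 15.00, 10.73, 14.38, 17.57]

n = len(wine)
mean_x = sum(wine) / n
mean_y = sum(index) / n
sxx = sum((x - mean_x) ** 2 for x in wine)
syy = sum((y - mean_y) ** 2 for y in index)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(wine, index))

slope = sxy / sxx
sse = syy - slope * sxy            # residual sum of squares
s = sqrt(sse / (n - 2))            # standard error of the estimate
se_slope = s / sqrt(sxx)           # standard error of the slope
t_stat = slope / se_slope          # tests H0: slope = 0

T_CRIT_DF15 = 2.131                # assumed value from a t table (95%, df = 15)
lower = slope - T_CRIT_DF15 * se_slope
upper = slope + T_CRIT_DF15 * se_slope

print(f"slope = {slope:.4f}, se = {se_slope:.4f}, t = {t_stat:.3f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

Because the interval straddles zero, the data do not rule out a slope of zero at the 5% level, which agrees with the p-value of .0508 in the output.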
The linear regression between the overall cost of living and the Loaf of Bread is as follows:
r² | 0.568 | n | 17 | |||
r | 0.754 | k | 1 | |||
Std. Error | 12.652 | Dep. Var. | Cost of Living Index | |||
ANOVA table | ||||||
Source | SS | df | MS | F | p-value | |
Regression | 3,156.8757 | 1 | 3,156.8757 | 19.72 | .0005 | |
Residual | 2,400.9752 | 15 | 160.0650 | |||
Total | 5,557.8509 | 16 | ||||
Regression output | confidence interval | |||||
variables | coefficients | std. error | t (df=15) | p-value | 95% lower | 95% upper |
Intercept | 44.0470 | |||||
Loaf of Bread | 21.3894 | 4.8163 | 4.441 | .0005 | 11.1236 | 31.6552 |
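The fitted regression equation is:
Cost of Living Index = 44.0470 + 21.3894*Loaf of Bread
The slope means that each additional $1 in the price of a loaf of bread is associated with a predicted increase of about 21.39 points in the Cost of Living Index. Because bread prices in the table range only from $0.41 to $2.93, a $0.10 increase is a more natural unit and corresponds to about 2.14 index points. The relationship is positive and fairly strong (r = 0.754): bread price explains about 56.8% of the variation in the index, and the slope is statistically significant (p = .0005).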
The linear regression between the overall cost of living and Milk is as follows:
r² | 0.203 | n | 17 | |||
r | 0.450 | k | 1 | |||
Std. Error | 17.185 | Dep. Var. | Cost of Living Index | |||
ANOVA table | ||||||
Source | SS | df | MS | F | p-value | |
Regression | 1,127.9392 | 1 | 1,127.9392 | 3.82 | .0696 | |
Residual | 4,429.9117 | 15 | 295.3274 | |||
Total | 5,557.8509 | 16 | ||||
Regression output | confidence interval | |||||
variables | coefficients | std. error | t (df=15) | p-value | 95% lower | 95% upper |
Intercept | 49.6351 | |||||
Milk | 5.4879 | 2.8081 | 1.954 | .0696 | -0.4975 | 11.4732 |
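The fitted regression equation is:
Cost of Living Index = 49.6351 + 5.4879*Milk
The slope means that each additional $1 in the price of a gallon of milk is associated with a predicted increase of about 5.49 points in the Cost of Living Index. The relationship is positive but weak (r = 0.450): milk price explains only about 20.3% of the variation in the index. The slope is not statistically significant at the 5% level (p = .0696), and its 95% confidence interval (-0.4975 to 11.4732) includes zero.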
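To compare the five items as predictors, the r² values can be computed side by side and sorted. The following is a minimal sketch in plain Python using the data table as entered; the function name `r_squared` and the dictionary layout are illustrative choices.

```python
index = [88.33, 87.93, 89.94, 78.19, 85.9, 71.65, 63.06, 82.2, 66.75,
         50.95, 45.45, 92.94, 90.78, 100, 31.74, 74.06, 83.45]

# One list of prices (USD) per item, in the same city order as the table.
items = {
    "Rent (in City Centre)": [
        4069.99, 3025.83, 2701.61, 2354.10, 2823.28, 1695.77, 569.12, 1734.75,
        1795.10, 1240.48, 1060.06, 2197.03, 3777.72, 5877.45, 1642.68,
        2937.27, 2370.81],
    "Monthly Public Trans Pass": [
        173.81, 144.78, 85.92, 41.20, 105.93, 95.34, 35.31, 57.68, 64.27,
        25.01, 30.09, 88.77, 124.55, 121.00, 7.66, 74.28, 50.53],
    "Bottle of Wine (mid-range)": [
        10.53, 14.12, 8.24, 7.06, 7.06, 5.89, 8.24, 8.24, 5.89, 5.46, 6.84,
        17.75, 14.01, 15.00, 10.73, 14.38, 17.57],
    "Loaf of Bread": [
        1.23, 1.37, 1.56, 1.38, 1.33, 1.24, 0.80, 1.66, 1.04, 0.92, 0.69,
        1.77, 1.94, 2.93, 0.41, 2.28, 2.44],
    "Milk": [
        4.63, 4.31, 4.68, 6.82, 4.34, 3.52, 5.35, 4.17, 3.63, 3.14, 2.68,
        6.46, 4.43, 3.98, 2.93, 7.12, 7.90],
}


def r_squared(x, y):
    """Squared Pearson correlation, equal to r2 in simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Highest r2 first: the item whose price explains the most variation in the index.
ranked = sorted(((r_squared(x, index), name) for name, x in items.items()),
                reverse=True)
for r2, name in ranked:
    print(f"{name:28s} r2 = {r2:.3f}")
```

Ranking by r² is only one criterion; with 17 cities, small gaps between items should not be over-interpreted.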
Based on the linear regressions and your analysis, which item would be the best predictor of the overall cost in these cities? Which would be the worst? Are there any surprising relationships?
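The Monthly Public Trans Pass is the best single predictor of the overall cost of living in these cities: it has the highest r² (0.578), the smallest standard error (12.511) and the smallest p-value (.0004). The Loaf of Bread is a very close second (r² = 0.568, p = .0005), so the two are nearly equivalent as predictors.
Milk is the worst predictor: it has the lowest r² (0.203) and the largest standard error (17.185), and its slope is not significant at the 5% level (p = .0696). Bottle of Wine is only slightly better (r² = 0.231, p = .0508). Rent (in City Centre) falls in the middle (r² = 0.495, p = .0016).
The surprising result is that the price of a monthly public transport pass, a relatively small budget item, predicts the overall Cost of Living Index slightly better than rent, which is by far the largest expense in the table. It is also notable that milk, an everyday grocery item, shows the weakest relationship with the index.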