You and your friends have decided to take a trip after the semester ends to celebrate getting through your statistics class. Pick a departure point from a location that is close to your hometown (Denver, CO). You will find a list of ten different destinations for your trip below. So, you should have data for 12 different destination points. You need to collect some data online to find the distance between cities as well as the cost of airfare. You do not need to factor in driving distance or cost to get to your departure point – only include the flight distance and cost of airfare between your departure point and each destination.
Create a fitted line plot showing the relationship between distance and cost. Based on your plot, how do you think the cost of the trip is associated with the distance of the trip? Why do you think that a perfect linear relationship does not exist? (i.e. why are the points scattered?)
Determine the correlation coefficient, r. Interpret this value in the context of the problem. Based on this value, do you think the linear model is a good fit? Why or why not?
Interpret the slope and y-intercept of the regression line in the context of the problem.
Find residuals
Using your regression equation for the line of best fit, predict the cost of flying to each destination based on the distances from your original data. Then calculate the residual for each data point. Enter this information in a table like the one shown below.
Destination | Distance (Miles) | Observed Cost | Estimated Cost | Residuals |
- | - | - | - | - |
Miami, FL | 2,066 | $755 | $207 | |
San Diego, CA | 1,078 | $480 | $144 | |
Las Vegas, NV | 749 | $157 | $93 | |
New York City, NY | 1,779 | $417 | $232 | |
Honolulu, HI | 3,344 | $1,600 | $714 | |
Seattle, WA | 1,316 | $426 | $162 | |
London, England | 4,683 | $2,225 | $842 | |
Cancun, Mexico | 1,670 | $544 | $368 | |
Toronto, Canada | 1,344 | $1,759 | $646 | |
Sydney, Australia | 8,325 | $1,618 | $1,235 | |
Rome, Italy | 5,567 | $1,203 | $1,313 | |
I used R software to solve this problem.
> dist=scan('clipboard')
> dist
[1] 2066 1078 749 1779 3344 1316 4683 1670 1344 8325 5567
> cost=scan('clipboard')
Read 11 items
> cost
[1] 755 480 157 417 1600 426 2225 544 1759 1618 1203
> plot(dist,cost)
> fit=lm(cost~dist)
> abline(fit)
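The session above reads the data from the clipboard, so it cannot be rerun without the original spreadsheet. A minimal reproducible sketch, assuming the same eleven distance and cost values are typed in directly (the axis labels are my own choice, not part of the original session):

```r
# Data typed in directly (same values as the scan('clipboard') session above)
dist <- c(2066, 1078, 749, 1779, 3344, 1316, 4683, 1670, 1344, 8325, 5567)  # miles
cost <- c(755, 480, 157, 417, 1600, 426, 2225, 544, 1759, 1618, 1203)       # dollars

# Least-squares line: cost = intercept + slope * dist
fit <- lm(cost ~ dist)

# Scatter plot of the data with the fitted line drawn over it
plot(dist, cost, xlab = "Distance (miles)", ylab = "Cost ($)")
abline(fit)

# Intercept and slope of the fitted line
print(coef(fit))
```

The printed coefficients should agree with the summary(fit) output shown in part c).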
a) fitted line plot:
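Cost is positively associated with distance: longer flights tend to cost more. The relationship is not perfectly linear because the points do not all lie on the fitted line. The scatter is most likely due to factors other than distance that affect airfare, such as demand on the route, competition between airlines, and whether the flight is domestic or international.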
b)
> cor(dist,cost)
[1] 0.6254736
Correlation coefficient: r = 0.6254736
This is a moderate positive correlation: as distance increases, cost tends to increase as well.
The linear model is only a moderate fit for this data. Since r^2 = 0.6254736^2 ≈ 0.39, distance explains only about 39% of the variation in cost, and several points lie well away from the fitted line.
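As a quick check of part b), the sketch below recomputes r and r^2 from the same data. It assumes the values are typed in directly rather than read from the clipboard; the variable names r and r2 are my own.

```r
# Same data as in the session above
dist <- c(2066, 1078, 749, 1779, 3344, 1316, 4683, 1670, 1344, 8325, 5567)  # miles
cost <- c(755, 480, 157, 417, 1600, 426, 2225, 544, 1759, 1618, 1203)       # dollars

# Pearson correlation coefficient between distance and cost
r <- cor(dist, cost)

# In simple linear regression, r^2 equals the "Multiple R-squared"
# reported by summary(lm(cost ~ dist))
r2 <- r^2

print(r)
print(r2)
```

Because r^2 is the share of the variation in cost accounted for by distance, it is a more direct measure of fit quality than r alone.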
c)
Interpretation of slope and intercept.
> summary(fit)
Call:
lm(formula = cost ~ dist)
Residuals:
Min 1Q Median 3Q Max
-467.4 -344.3 -248.2 196.7 1026.2
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 487.94103 278.76516 1.750 0.1140
dist 0.18222 0.07577 2.405 0.0396 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 568.3 on 9 degrees of freedom
Multiple R-squared: 0.3912, Adjusted R-squared: 0.3236
F-statistic: 5.784 on 1 and 9 DF, p-value: 0.03958
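Interpretation of slope: For each additional mile of flight distance, the predicted cost increases by about $0.18 (0.18222).
Interpretation of intercept: The predicted cost at a distance of 0 miles is about $487.94. No real flight covers 0 miles, so this value is best read as a baseline cost that does not depend on distance. Note also that the intercept is not statistically significant at the 5% level (p = 0.1140).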
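To see how the slope and intercept are used together, here is a small sketch that predicts the cost of a hypothetical 1,000-mile flight. The 1,000-mile distance is an illustrative value of my own, not one of the destinations in the data.

```r
# Same data and model as in the session above
dist <- c(2066, 1078, 749, 1779, 3344, 1316, 4683, 1670, 1344, 8325, 5567)  # miles
cost <- c(755, 480, 157, 417, 1600, 426, 2225, 544, 1759, 1618, 1203)       # dollars
fit <- lm(cost ~ dist)

# Prediction by hand: intercept + slope * distance
b <- coef(fit)
by_hand <- unname(b[1] + b[2] * 1000)

# Same prediction through predict(); newdata must use the column name 'dist'
via_predict <- unname(predict(fit, newdata = data.frame(dist = 1000)))

print(by_hand)
print(via_predict)
```

Both values should come out at roughly $670, that is, the intercept of about $488 plus 1,000 times the slope of about $0.18.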
d)
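Residual = Observed cost - Estimated cost
The residuals below are computed from the Estimated Cost column of the table above, in the same order as the destinations. They are not the residuals of the fitted regression line; those are given by residuals(fit) and range from -467.4 to 1026.2, as shown in the summary(fit) output.
Destination | Residuals |
Miami, FL | 548 |
San Diego, CA | 336 |
Las Vegas, NV | 64 |
New York City, NY | 185 |
Honolulu, HI | 886 |
Seattle, WA | 264 |
London, England | 1383 |
Cancun, Mexico | 176 |
Toronto, Canada | 1113 |
Sydney, Australia | 383 |
Rome, Italy | -110 |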
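The question asks for residuals based on the regression equation. A minimal sketch of how the fitted costs and residuals from the regression line could be tabulated in R, assuming the data are typed in directly (the data frame name trip and its column names are my own):

```r
# Same data and model as in the session above
destination <- c("Miami, FL", "San Diego, CA", "Las Vegas, NV",
                 "New York City, NY", "Honolulu, HI", "Seattle, WA",
                 "London, England", "Cancun, Mexico", "Toronto, Canada",
                 "Sydney, Australia", "Rome, Italy")
dist <- c(2066, 1078, 749, 1779, 3344, 1316, 4683, 1670, 1344, 8325, 5567)  # miles
cost <- c(755, 480, 157, 417, 1600, 426, 2225, 544, 1759, 1618, 1203)       # dollars
fit <- lm(cost ~ dist)

# Estimated cost from the regression line, and residual = observed - estimated
trip <- data.frame(
  destination = destination,
  distance    = dist,
  observed    = cost,
  estimated   = round(unname(fitted(fit)), 1),
  residual    = round(unname(residuals(fit)), 1)
)

print(trip)
```

With an intercept in the model, least-squares residuals sum to zero, which is a handy check on the table. The largest positive residual should belong to Toronto and the most negative one to Las Vegas, matching the Max and Min values in the summary(fit) output.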