1. Car Weight and Fuel Consumption data follow.
Weight (lb)     3465  2895  3225  4095  5210  4180  2730  3270
Highway (mpg)     30    34    32    25    29    23    36    30
a. Make a scatter plot from the above data with axis labels and a chart title.
b. Add a linear trend line, and determine the formula of the line and the R-squared value (you may
format the trend line to show the equation and R-squared).
c. Provide an interpretation of the chart. Which points are likely outliers, if any?
d. Plot a second chart with a trend line without the outlier(s). Does this improve the R-squared
and your interpretation?
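#a) Scatter plot with axis labels and a chart title
x <- c(3465, 2895, 3225, 4095, 5210, 4180, 2730, 3270)  # weight (lb)
y <- c(30, 34, 32, 25, 29, 23, 36, 30)                  # highway (mpg)
plot(x, y, xlab = "Weight (lb)", ylab = "Highway (mpg)",
     main = "Car Weight vs. Highway Fuel Economy")
#b) Fit the linear trend line, add it to the plot, and report its equation and R-squared
model <- lm(y ~ x)
summary(model)
abline(model)  # draws the fitted line without hard-coding the coefficients
#c) No extra code: the interpretation is given with the output below
#d) Refit without the 5210 lb car (the 5th observation) and plot again
xnew <- x[-5]
ynew <- y[-5]
model1 <- lm(ynew ~ xnew)
summary(model1)
plot(xnew, ynew, xlab = "Weight (lb)", ylab = "Highway (mpg)",
     main = "Car Weight vs. Highway Fuel Economy (5210 lb car removed)")
abline(model1)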
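Part b mentions formatting the trend line to show its equation and R-squared, which is a spreadsheet-style option with no single base-R switch. A minimal sketch, assuming base graphics only: build a text label from the fitted model and draw it with `legend()`. The label format and the `"bottomleft"` position are arbitrary choices, not part of the original answer.

```r
# Sketch: label the trend line with its equation and R-squared (base R only)
x <- c(3465, 2895, 3225, 4095, 5210, 4180, 2730, 3270)  # weight (lb)
y <- c(30, 34, 32, 25, 29, 23, 36, 30)                  # highway (mpg)
model <- lm(y ~ x)

b0 <- unname(coef(model)[1])      # intercept
b1 <- unname(coef(model)[2])      # slope (mpg per lb)
r2 <- summary(model)$r.squared    # multiple R-squared

# Hypothetical label format: 4 significant digits for coefficients, 4 decimals for R-squared
label <- sprintf("y = %.4g + (%.4g)x,  R-squared = %.4f", b0, b1, r2)

plot(x, y, xlab = "Weight (lb)", ylab = "Highway (mpg)",
     main = "Car Weight vs. Highway Fuel Economy")
abline(model)
legend("bottomleft", legend = label, bty = "n")  # bty = "n": no box around the label
print(label)
```

The bottom-left corner is used because, in this data set, the light cars have high mileage, so that corner of the plot is empty.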
The R code above answers all four parts; the output for each part follows.
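a) [Figure: scatter plot of Highway (mpg) against Weight (lb)]
b) [Figure: the same scatter plot with the fitted trend line] From the output below, the trend line is mpg = 42.98 - 0.003607 × weight (lb), with R-squared = 0.4658.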
> summary(model)
Call:
lm(formula = y ~ x)
Residuals:
    Min      1Q  Median      3Q     Max
-4.9048 -1.6931  0.0836  1.8118  4.8100
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 42.980798   5.855886   7.340 0.000327 ***
x           -0.003607   0.001577  -2.287 0.062186 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.414 on 6 degrees of freedom
Multiple R-squared: 0.4658, Adjusted R-squared: 0.3767
F-statistic: 5.231 on 1 and 6 DF, p-value: 0.06219
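As a cross-check on the `lm()` output, the slope, intercept, and R-squared can be computed directly from the least-squares formulas b1 = Sxy / Sxx, b0 = mean(y) - b1 * mean(x), and R-squared = 1 - SSE / SST. This is an illustrative sketch; the variable names are not from the original answer.

```r
# Cross-check the lm() results with the closed-form least-squares formulas
x <- c(3465, 2895, 3225, 4095, 5210, 4180, 2730, 3270)  # weight (lb)
y <- c(30, 34, 32, 25, 29, 23, 36, 30)                  # highway (mpg)

sxy <- sum((x - mean(x)) * (y - mean(y)))  # sum of cross-deviations
sxx <- sum((x - mean(x))^2)                # sum of squared x-deviations

b1 <- sxy / sxx                  # slope
b0 <- mean(y) - b1 * mean(x)     # intercept

fitted_y <- b0 + b1 * x
sse <- sum((y - fitted_y)^2)     # residual sum of squares
sst <- sum((y - mean(y))^2)      # total sum of squares
r2  <- 1 - sse / sst

fit <- lm(y ~ x)
# all.equal() compares numbers up to a small numerical tolerance
print(all.equal(c(b0, b1), unname(coef(fit))))
print(all.equal(r2, summary(fit)$r.squared))
cat(sprintf("slope = %.6f, intercept = %.4f, R-squared = %.4f\n", b1, b0, r2))
```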
c)
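The chart shows a negative relationship: heavier cars tend to get lower highway mileage, with the line predicting a drop of about 3.6 mpg for every additional 1000 lb. The fit is only moderate, though (R-squared = 0.4658), and the slope is not significant at the 5% level (p = 0.062). The likely outlier is the 5210 lb car. It is much heavier than any other car yet gets 29 mpg, about 4.8 mpg more than the line predicts, so as a high-leverage point it pulls the regression line toward itself, flattening the slope and leaving the other points farther from the line.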
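The visual judgment in part c can be backed up with standard influence measures for the full model: leverage (`hatvalues()`), standardized residuals (`rstandard()`), and Cook's distance (`cooks.distance()`), all from base R's stats package. The cut-offs in the comments (leverage above 2p/n, Cook's distance above 1) are common rules of thumb, not fixed thresholds.

```r
# Influence diagnostics for the full model (all 8 cars)
x <- c(3465, 2895, 3225, 4095, 5210, 4180, 2730, 3270)  # weight (lb)
y <- c(30, 34, 32, 25, 29, 23, 36, 30)                  # highway (mpg)
model <- lm(y ~ x)

diag_table <- data.frame(
  weight   = x,
  mpg      = y,
  leverage = round(hatvalues(model), 3),      # rule of thumb: > 2p/n = 2*2/8 = 0.5 is high
  std_res  = round(rstandard(model), 3),      # standardized residuals
  cooks_d  = round(cooks.distance(model), 3)  # rule of thumb: > 1 is influential
)
print(diag_table)

# Row with the largest Cook's distance
most_influential <- which.max(cooks.distance(model))
cat("Most influential car weighs", x[most_influential], "lb\n")
```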
d)
> summary(model1)
Call:
lm(formula = ynew ~ xnew)
Residuals:
      1       2       3       4       5       6       7
 0.4683 -0.2622  0.4765  0.6967 -0.5978  0.3685 -1.1500
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 58.2880761  1.9037787   30.62 6.97e-07 ***
xnew        -0.0082991  0.0005523  -15.03 2.36e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7505 on 5 degrees of freedom
Multiple R-squared: 0.9783, Adjusted R-squared: 0.974
F-statistic: 225.8 on 1 and 5 DF, p-value: 2.364e-05
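Removing the 5210 lb car improves the fit substantially: R-squared rises from 0.4658 to 0.9783, the residual standard error drops from 3.414 to 0.7505 mpg, and the slope becomes highly significant (p = 2.36e-05). The refitted line, mpg = 58.29 - 0.008299 × weight (lb), predicts a drop of about 8.3 mpg for every additional 1000 lb, more than twice the 3.6 mpg drop estimated from the full data. So yes, dropping the outlier improves both the R-squared and the interpretation: for the remaining seven cars, weight is a strong predictor of highway fuel economy.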
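To put part d's comparison in one place, the sketch below refits both models and tabulates the slope (per 1000 lb), R-squared, and residual standard error, then compares what each line predicts for one car. The 3500 lb weight is a hypothetical example, not one of the data points, and the helper name `summarise_fit` is illustrative.

```r
# Side-by-side comparison of the full and reduced fits
x <- c(3465, 2895, 3225, 4095, 5210, 4180, 2730, 3270)  # weight (lb)
y <- c(30, 34, 32, 25, 29, 23, 36, 30)                  # highway (mpg)

full    <- lm(mpg ~ weight, data = data.frame(weight = x, mpg = y))
reduced <- lm(mpg ~ weight, data = data.frame(weight = x[-5], mpg = y[-5]))  # drop the 5210 lb car

# Hypothetical helper: collect the statistics discussed in part d
summarise_fit <- function(m) {
  s <- summary(m)
  c(slope_per_1000lb = unname(coef(m)["weight"]) * 1000,  # mpg change per 1000 lb
    r_squared        = s$r.squared,
    resid_se         = s$sigma)                            # residual standard error (mpg)
}
comparison <- rbind(full = summarise_fit(full), reduced = summarise_fit(reduced))
print(round(comparison, 4))

# Predicted highway mpg for a hypothetical 3500 lb car under each line
new_car <- data.frame(weight = 3500)
print(c(full = unname(predict(full, new_car)), reduced = unname(predict(reduced, new_car))))
```

Fitting from a data frame keeps the predictor named `weight` in both models, so `predict()` can be given the same `newdata` for each.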