Using the mtcars dataset, answer the following questions:
Fill in the following table:
Variable | Correlation with mpg |
cyl | -0.85216 |
disp | -0.84755 |
hp | -0.77617 |
drat | 0.681172 |
wt | -0.86766 |
qsec | 0.418684 |
vs | 0.664039 |
am | 0.599832 |
gear | 0.480285 |
carb | -0.55093 |
Car | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | |
Mazda RX4 | 21 | 6 | 160 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 | |
Mazda RX4 Wag | 21 | 6 | 160 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 | |
Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 | |
Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 | |
Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 | |
Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 | |
Duster 360 | 14.3 | 8 | 360 | 245 | 3.21 | 3.57 | 15.84 | 0 | 0 | 3 | 4 | |
Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.19 | 20 | 1 | 0 | 4 | 2 | |
Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.15 | 22.9 | 1 | 0 | 4 | 2 | |
Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.44 | 18.3 | 1 | 0 | 4 | 4 | |
Merc 280C | 17.8 | 6 | 167.6 | 123 | 3.92 | 3.44 | 18.9 | 1 | 0 | 4 | 4 | |
Merc 450SE | 16.4 | 8 | 275.8 | 180 | 3.07 | 4.07 | 17.4 | 0 | 0 | 3 | 3 | |
Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.73 | 17.6 | 0 | 0 | 3 | 3 | |
Merc 450SLC | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.78 | 18 | 0 | 0 | 3 | 3 | |
Cadillac Fleetwood | 10.4 | 8 | 472 | 205 | 2.93 | 5.25 | 17.98 | 0 | 0 | 3 | 4 | |
Lincoln Continental | 10.4 | 8 | 460 | 215 | 3 | 5.424 | 17.82 | 0 | 0 | 3 | 4 | |
Chrysler Imperial | 14.7 | 8 | 440 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 | |
Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.2 | 19.47 | 1 | 1 | 4 | 1 | |
Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 | |
Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.9 | 1 | 1 | 4 | 1 | |
Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.7 | 2.465 | 20.01 | 1 | 0 | 3 | 1 | |
Dodge Challenger | 15.5 | 8 | 318 | 150 | 2.76 | 3.52 | 16.87 | 0 | 0 | 3 | 2 | |
AMC Javelin | 15.2 | 8 | 304 | 150 | 3.15 | 3.435 | 17.3 | 0 | 0 | 3 | 2 | |
Camaro Z28 | 13.3 | 8 | 350 | 245 | 3.73 | 3.84 | 15.41 | 0 | 0 | 3 | 4 | |
Pontiac Firebird | 19.2 | 8 | 400 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 | |
Fiat X1-9 | 27.3 | 4 | 79 | 66 | 4.08 | 1.935 | 18.9 | 1 | 1 | 4 | 1 | |
Porsche 914-2 | 26 | 4 | 120.3 | 91 | 4.43 | 2.14 | 16.7 | 0 | 1 | 5 | 2 | |
Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.9 | 1 | 1 | 5 | 2 | |
Ford Pantera L | 15.8 | 8 | 351 | 264 | 4.22 | 3.17 | 14.5 | 0 | 1 | 5 | 4 | |
Ferrari Dino | 19.7 | 6 | 145 | 175 | 3.62 | 2.77 | 15.5 | 0 | 1 | 5 | 6 | |
Maserati Bora | 15 | 8 | 301 | 335 | 3.54 | 3.57 | 14.6 | 0 | 1 | 5 | 8 | |
Volvo 142E | 21.4 | 4 | 121 | 109 | 4.11 | 2.78 | 18.6 | 1 | 1 | 4 | 2 | |
Correlation with mpg | | -0.85216 | -0.84755 | -0.77617 | 0.681172 | -0.86766 | 0.418684 | 0.664039 | 0.599832 | 0.480285 | -0.55093 | |
Which of the variables is the best predictor of mpg? Justify your answer using the correlation coefficient and a scatterplot.
Fit a regression model based on your answer to question 2. Write your model below.
Is the slope significant? Justify your answer.
Interpret the slope of your regression model.
Interpret the R2 value.
Do you believe your model is a good model for predicting mpg? Justify your answer.
Best Predictor
mpg has its strongest correlation with weight (wt): r = -0.86766, the largest in absolute value of all the variables. The correlation is negative, which indicates that mileage decreases as weight increases, as one would expect.
So weight (wt) can be considered the best predictor.
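The comparison of absolute correlations can be reproduced outside Excel. The following is a minimal Python sketch, assuming the mpg, wt, cyl and disp columns are copied by hand from the data table; the helper name pearson_r is illustrative and not part of the original answer.

```python
import math

# Columns copied from the mtcars table above (32 cars, same row order).
mpg = [21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,
       16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5,
       15.2, 13.3, 19.2, 27.3, 26, 30.4, 15.8, 19.7, 15, 21.4]
wt = [2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44,
      4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.2, 1.615, 1.835, 2.465, 3.52,
      3.435, 3.84, 3.845, 1.935, 2.14, 1.513, 3.17, 2.77, 3.57, 2.78]
cyl = [6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,
       8, 8, 8, 4, 4, 4, 8, 6, 8, 4]
disp = [160, 160, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6, 167.6,
        275.8, 275.8, 275.8, 472, 460, 440, 78.7, 75.7, 71.1, 120.1, 318,
        304, 350, 400, 79, 120.3, 95.1, 351, 145, 301, 121]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Correlation of each candidate predictor with mpg.
correlations = {
    "wt": pearson_r(wt, mpg),
    "cyl": pearson_r(cyl, mpg),
    "disp": pearson_r(disp, mpg),
}

# The strongest single linear predictor has the largest |r|.
best = max(correlations, key=lambda name: abs(correlations[name]))
print(correlations, best)
```

Only the three strongest candidates are compared here to keep the sketch short; the remaining columns could be added to the dictionary in the same way.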
Scatter plot
From the above plot it is clear that mpg and weight are strongly negatively correlated, so we can fit a linear regression model to them.
Model Fitting
We need to fit a model of mpg on wt. I am using Excel to fit the linear regression model.
Excel Output:
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.867659377 |
R Square | 0.752832794 |
Adjusted R Square | 0.744593887 |
Standard Error | 3.045882125 |
Observations | 32 |
ANOVA
| df | SS | MS | F | Significance F |
Regression | 1 | 847.72525 | 847.72525 | 91.375325 | 0.00000000013 |
Residual | 30 | 278.3219375 | 9.277397918 | | |
Total | 31 | 1126.047188 | | | |
| Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | 37.28512617 | 1.877627337 | 19.85757526 | 0.0000000000000000008 | 33.45049959 | 41.11975275 |
wt | -5.344471573 | 0.559101045 | -9.559044147 | 0.0000000001293958701 | -6.486308234 | -4.202634912 |
Regression Model:
mpg = 37.285 - 5.344 * wt
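The same coefficients can be obtained from the closed-form least-squares formulas, slope = Sxy / Sxx and intercept = mean(mpg) - slope * mean(wt). This is a minimal Python sketch of that calculation, assuming the mpg and wt columns are copied by hand from the data table; it is an illustration, not the Excel procedure used above.

```python
# Columns copied from the mtcars table above (32 cars, same row order).
mpg = [21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,
       16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5,
       15.2, 13.3, 19.2, 27.3, 26, 30.4, 15.8, 19.7, 15, 21.4]
wt = [2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44,
      4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.2, 1.615, 1.835, 2.465, 3.52,
      3.435, 3.84, 3.845, 1.935, 2.14, 1.513, 3.17, 2.77, 3.57, 2.78]

n = len(wt)
mean_wt = sum(wt) / n
mean_mpg = sum(mpg) / n

# Sum of cross-products and sum of squares about the means.
sxy = sum((x - mean_wt) * (y - mean_mpg) for x, y in zip(wt, mpg))
sxx = sum((x - mean_wt) ** 2 for x in wt)

# Ordinary least-squares estimates for mpg = intercept + slope * wt.
slope = sxy / sxx
intercept = mean_mpg - slope * mean_wt

print(f"mpg = {intercept:.3f} + ({slope:.3f}) * wt")
```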
Slope:
Yes, the slope is significant at the 5% significance level, since its p-value (about 1.29e-10 in the Excel output) is far below 0.05.
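The significance test can also be checked by hand: the t statistic is the slope divided by its standard error, compared against a critical value. The sketch below uses the numbers from the Excel output; the critical value 2.042 (two-sided, 5%, 30 degrees of freedom) is taken from a standard t table and hard-coded as an assumption rather than computed.

```python
# Values copied from the Excel output above.
slope = -5.344471573
std_error = 0.559101045
n_obs = 32

# Simple regression estimates two parameters, so df = n - 2.
df = n_obs - 2

# t statistic for H0: slope = 0.
t_stat = slope / std_error

# Two-sided 5% critical value for Student's t with 30 df (from a t table).
t_crit = 2.042

significant = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f}, df = {df}, significant at 5%: {significant}")
```

Comparing |t| with the critical value leads to the same decision as comparing the p-value with 0.05.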
Interpreting slope:
The negative slope indicates that for each one-unit increase in weight (wt is recorded in units of 1,000 lb in mtcars), predicted mileage decreases by about 5.344 mpg.
A decrease in mileage as vehicle weight increases is to be expected.
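To make the slope interpretation concrete, the fitted line can be evaluated at two weights one unit apart. This is a small sketch using the coefficients from the Excel output; the function name predict_mpg and the example weights 3.0 and 4.0 are illustrative choices, not part of the original answer.

```python
# Coefficients copied from the Excel output above.
INTERCEPT = 37.28512617
SLOPE = -5.344471573


def predict_mpg(weight):
    """Predicted mpg for a car of the given weight (in units of 1,000 lb)."""
    return INTERCEPT + SLOPE * weight


mpg_at_3 = predict_mpg(3.0)  # a 3,000 lb car
mpg_at_4 = predict_mpg(4.0)  # a 4,000 lb car

# The difference for a one-unit increase in wt equals the slope.
change = mpg_at_4 - mpg_at_3
print(f"{mpg_at_3:.2f} mpg -> {mpg_at_4:.2f} mpg, change = {change:.3f}")
```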
Interpreting R2
R-squared is 0.753, which indicates that about 75% of the variance in mpg is explained by the weight of the vehicle alone.
This also shows that weight is a good variable for determining mileage.
If we add more variables that are also strongly correlated with mpg, such as disp and cyl, R2 will increase, although the gain may be smaller than their individual correlations suggest if they carry much of the same information as wt.
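R2 and adjusted R2 follow directly from the ANOVA sums of squares. The sketch below recomputes both from the Excel output; adjusted R2 is included because, unlike R2, it penalises extra predictors, which makes it the more useful figure when comparing against a model with disp or cyl added.

```python
# Sums of squares copied from the ANOVA table above.
ss_regression = 847.72525
ss_residual = 278.3219375
ss_total = ss_regression + ss_residual

n_obs = 32
n_predictors = 1  # wt only

# Share of the variance in mpg explained by the model.
r_squared = ss_regression / ss_total

# Adjusted R2 corrects for the number of predictors used.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

print(f"R2 = {r_squared:.4f}, adjusted R2 = {adj_r_squared:.4f}")
```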
Model
With an R2 of 0.753, our model can be considered decent. As we can see, there are a few outliers, so treating them should improve the fit slightly.
A few other variables, such as disp and cyl, also have very high correlations with mpg. Adding them when building the model should improve its accuracy.
Overall, the model is good.
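One way to look for the outliers mentioned above is to inspect the residuals of the fitted line. The following is a minimal Python sketch, assuming the car names, mpg and wt values are copied by hand from the data table. The rule of flagging residuals larger than twice the residual standard error is a common rule of thumb chosen here for illustration; it is not a criterion stated in the original answer.

```python
import math

# (car, mpg, wt) copied from the mtcars table above; wt is in 1,000 lb.
cars = [
    ("Mazda RX4", 21.0, 2.62), ("Mazda RX4 Wag", 21.0, 2.875),
    ("Datsun 710", 22.8, 2.32), ("Hornet 4 Drive", 21.4, 3.215),
    ("Hornet Sportabout", 18.7, 3.44), ("Valiant", 18.1, 3.46),
    ("Duster 360", 14.3, 3.57), ("Merc 240D", 24.4, 3.19),
    ("Merc 230", 22.8, 3.15), ("Merc 280", 19.2, 3.44),
    ("Merc 280C", 17.8, 3.44), ("Merc 450SE", 16.4, 4.07),
    ("Merc 450SL", 17.3, 3.73), ("Merc 450SLC", 15.2, 3.78),
    ("Cadillac Fleetwood", 10.4, 5.25), ("Lincoln Continental", 10.4, 5.424),
    ("Chrysler Imperial", 14.7, 5.345), ("Fiat 128", 32.4, 2.2),
    ("Honda Civic", 30.4, 1.615), ("Toyota Corolla", 33.9, 1.835),
    ("Toyota Corona", 21.5, 2.465), ("Dodge Challenger", 15.5, 3.52),
    ("AMC Javelin", 15.2, 3.435), ("Camaro Z28", 13.3, 3.84),
    ("Pontiac Firebird", 19.2, 3.845), ("Fiat X1-9", 27.3, 1.935),
    ("Porsche 914-2", 26.0, 2.14), ("Lotus Europa", 30.4, 1.513),
    ("Ford Pantera L", 15.8, 3.17), ("Ferrari Dino", 19.7, 2.77),
    ("Maserati Bora", 15.0, 3.57), ("Volvo 142E", 21.4, 2.78),
]

n = len(cars)
mean_wt = sum(w for _, _, w in cars) / n
mean_mpg = sum(m for _, m, _ in cars) / n

# Least-squares fit of mpg on wt.
sxy = sum((w - mean_wt) * (m - mean_mpg) for _, m, w in cars)
sxx = sum((w - mean_wt) ** 2 for _, _, w in cars)
slope = sxy / sxx
intercept = mean_mpg - slope * mean_wt

# Residual = observed mpg minus fitted mpg.
residuals = {name: m - (intercept + slope * w) for name, m, w in cars}

# Residual standard error with n - 2 degrees of freedom.
resid_se = math.sqrt(sum(r ** 2 for r in residuals.values()) / (n - 2))

# Rule of thumb (an assumption): flag |residual| > 2 * standard error.
flagged = [name for name, r in residuals.items() if abs(r) > 2 * resid_se]
largest = max(residuals, key=lambda name: abs(residuals[name]))

print(f"residual SE = {resid_se:.3f}")
print("largest residual:", largest, round(residuals[largest], 2))
print("flagged:", flagged)
```

Whether flagged cars should be removed, down-weighted or simply noted is a modelling decision that this sketch does not make.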