In order to predict the emission output of a vehicle, several factors are being investigated. These factors include the weight of the vehicle ('000s lbs), the number of passengers the vehicle can hold, and the vehicle horsepower. The data is provided below. The column for the vehicle is just a label and is not part of the analysis. You should be able to copy and paste this into an Excel file. Dependent Variable: Emissions. Independent Variables: Weight ('000s lbs), Passengers, Horsepower.
Vehicle | Emissions | Weight ('000s lbs) | Passengers | Horsepower |
A | 11.1 | 4.5 | 5 | 192 |
B | 8.5 | 4.365 | 8 | 230 |
C | 10.6 | 2.98 | 5 | 220 |
D | 11.4 | 5.335 | 7 | 210 |
E | 7.7 | 3.65 | 5 | 215 |
F | 9.1 | 4.125 | 5 | 150 |
G | 11.8 | 4.66 | 5 | 291 |
H | 6.4 | 4.2 | 4 | 268 |
I | 6.6 | 3.475 | 5 | 160 |
J | 6.7 | 2.98 | 5 | 138 |
K | 10.6 | 4.345 | 7 | 236 |
L | 11.4 | 5.095 | 6 | 245 |
M | 6.3 | 2.9 | 4 | 115 |
N | 5.2 | 2.595 | 5 | 126 |
O | 12.5 | 5.435 | 8 | 275 |
P | 6.6 | 3.43 | 5 | 180 |
Q | 9.3 | 4.18 | 6 | 224 |
R | 11.9 | 5.21 | 5 | 240 |
S | 9.7 | 4.555 | 7 | 265 |
T | 5.9 | 2.485 | 5 | 108 |
U | 8.7 | 4.205 | 5 | 290 |
V | 11.5 | 5.28 | 8 | 273 |
W | 7.7 | 3.525 | 5 | 225 |
X | 11 | 5.505 | 9 | 295 |
Y | 12 | 5.9 | 8 | 300 |
Z | 7.6 | 3.555 | 5 | 160 |
AA | 11.7 | 3.88 | 4 | 147 |
Use this data to answer the questions on Regression Analysis.
a) Is the multiple regression model using all three independent variables a better fitting model than your model with two independent variables? Be sure to use quantitative values to explain why there is or is not a better fitting three-variable model.
b) Does the data show any indication of multicollinearity among the three independent variables? Explain how you can determine this using quantitative values. If present, state which variables indicate multicollinearity.
Answers:
Here, the dependent variable is Emissions and the independent variables are Weight ('000s lbs), Passengers, and Horsepower.
**************************************************************************
Ans a)
Yes, the multiple regression model using all three independent variables is a better fitting model than the model with two independent variables.
Excel output with two independent variables:
Regression Statistics | |
Multiple R | 0.647661 |
R Square | 0.419465 |
Adjusted R Square | 0.371087 |
Standard Error | 1.811786 |
Observations | 27 |
Excel output with three independent variables:
SUMMARY OUTPUT | |
Regression Statistics | |
Multiple R | 0.821701 |
R Square | 0.675192 |
Adjusted R Square | 0.632826 |
Standard Error | 1.384355 |
Observations | 27 |
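In the summary outputs above, R Square (the proportion of variation in Emissions explained by the model) is 0.419 for the two-variable model and 0.675 for the three-variable model. Because R Square never decreases when an independent variable is added, Adjusted R Square is the fairer comparison between models of different sizes: it rises from 0.371 to 0.633. The Standard Error also falls from 1.812 to 1.384.
All three measures point the same way: the three-variable model explains more of the variation in Emissions, even after adjusting for the extra variable, and its predictions are closer to the observed values.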
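The comparison above can be checked by hand. The following is a minimal sketch, assuming only the R Square values and the observation count reported in the two Excel outputs; the function names `adjusted_r2` and `partial_f` are illustrative, not part of Excel. It recomputes Adjusted R Square from R Square and, as an additional check that is not in the original answer, computes the partial F statistic for adding the third variable. The partial F test applies here on the assumption that the two-variable model uses two of the same three independent variables.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R Square for a model with k independent variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def partial_f(r2_full, r2_reduced, n, k_full, q):
    """Partial F statistic for adding q variables to a nested (reduced) model."""
    numerator = (r2_full - r2_reduced) / q
    denominator = (1 - r2_full) / (n - k_full - 1)
    return numerator / denominator


n = 27                 # Observations, from both Excel outputs
r2_two = 0.419465      # R Square, two-variable model
r2_three = 0.675192    # R Square, three-variable model

adj_two = adjusted_r2(r2_two, n, 2)
adj_three = adjusted_r2(r2_three, n, 3)
f_stat = partial_f(r2_three, r2_two, n, k_full=3, q=1)

print(f"Adjusted R Square (two variables):   {adj_two:.6f}")
print(f"Adjusted R Square (three variables): {adj_three:.6f}")
print(f"Partial F for the added variable:    {f_stat:.3f}")
```

The recomputed Adjusted R Square values should agree with the Excel outputs to the displayed precision. The partial F statistic would be compared against an F critical value with 1 and 23 degrees of freedom at the chosen significance level.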
************************************************************************************
Ans b)
Process for detecting multicollinearity:
The Variance Inflation Factor (VIF) for an independent variable is 1/Tolerance, where Tolerance = 1 - R Square from regressing that variable on the other independent variables. VIF is always greater than or equal to 1. There is no formal VIF cutoff for determining the presence of multicollinearity. Values of VIF that exceed 10 are often regarded as indicating multicollinearity, but in weaker models values above 2.5 may be a cause for concern.
Conclusion: based on the VIF computed for each of the three independent variables (Weight, Passengers, Horsepower), multicollinearity is not present among them.
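The VIF values themselves can be reproduced from the data table. The following is a minimal sketch in plain Python, assuming the 27 rows are transcribed exactly as given in the question. It relies on the fact that each VIF equals the corresponding diagonal entry of the inverse of the correlation matrix of the independent variables; with exactly three variables, that entry has a closed form in the pairwise correlations. The helper `pearson` and the variable names are illustrative.

```python
from math import sqrt

# Independent variables for vehicles A through AA, in table order
weight = [4.5, 4.365, 2.98, 5.335, 3.65, 4.125, 4.66, 4.2, 3.475, 2.98,
          4.345, 5.095, 2.9, 2.595, 5.435, 3.43, 4.18, 5.21, 4.555, 2.485,
          4.205, 5.28, 3.525, 5.505, 5.9, 3.555, 3.88]
passengers = [5, 8, 5, 7, 5, 5, 5, 4, 5, 5, 7, 6, 4, 5, 8, 5, 6, 5, 7, 5,
              5, 8, 5, 9, 8, 5, 4]
horsepower = [192, 230, 220, 210, 215, 150, 291, 268, 160, 138, 236, 245,
              115, 126, 275, 180, 224, 240, 265, 108, 290, 273, 225, 295,
              300, 160, 147]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Pairwise correlations among the independent variables
r_wp = pearson(weight, passengers)
r_wh = pearson(weight, horsepower)
r_ph = pearson(passengers, horsepower)

# Determinant of the 3x3 correlation matrix
det = 1 + 2 * r_wp * r_wh * r_ph - r_wp ** 2 - r_wh ** 2 - r_ph ** 2

# VIF_j = (cofactor of the j-th diagonal entry) / determinant
vif = {
    "Weight": (1 - r_ph ** 2) / det,
    "Passengers": (1 - r_wh ** 2) / det,
    "Horsepower": (1 - r_wp ** 2) / det,
}

for name, value in vif.items():
    print(f"VIF({name}) = {value:.3f}")
```

Each printed value can then be compared against the thresholds of 2.5 and 10 mentioned above. The pairwise correlations `r_wp`, `r_wh`, and `r_ph` are also worth inspecting, since a high correlation between two independent variables is the usual source of a large VIF.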
*******************************************************************************************************************