Discuss the application of multiple regression model using a real-life example. [Hint: You are supposed to examine a possible relationship between a dependent and at least three important independent variables in the example which you have chosen. Identify dependent variable and independent variable and write the mathematical regression equation for the example and explain each component of the equation.]
Let us consider the popular mtcars (Motor Trend Car Road Tests) dataset that ships with R. The full data frame has 32 observations on 11 variables. Here we use a subset of 32 observations on 4 numeric variables:
[, 1] mpg Miles/(US) gallon
[, 2] disp Displacement (cu.in.)
[, 3] hp Gross horsepower
[, 4] wt Weight (1000 lbs)
Dataset:
Car | mpg | disp | hp | wt
Mazda RX4 | 21 | 160 | 110 | 2.62
Mazda RX4 Wag | 21 | 160 | 110 | 2.875
Datsun 710 | 22.8 | 108 | 93 | 2.32
Hornet 4 Drive | 21.4 | 258 | 110 | 3.215
Hornet Sportabout | 18.7 | 360 | 175 | 3.44
Valiant | 18.1 | 225 | 105 | 3.46
Duster 360 | 14.3 | 360 | 245 | 3.57
Merc 240D | 24.4 | 146.7 | 62 | 3.19
Merc 230 | 22.8 | 140.8 | 95 | 3.15
Merc 280 | 19.2 | 167.6 | 123 | 3.44
Merc 280C | 17.8 | 167.6 | 123 | 3.44
Merc 450SE | 16.4 | 275.8 | 180 | 4.07
Merc 450SL | 17.3 | 275.8 | 180 | 3.73
Merc 450SLC | 15.2 | 275.8 | 180 | 3.78
Cadillac Fleetwood | 10.4 | 472 | 205 | 5.25
Lincoln Continental | 10.4 | 460 | 215 | 5.424
Chrysler Imperial | 14.7 | 440 | 230 | 5.345
Fiat 128 | 32.4 | 78.7 | 66 | 2.2
Honda Civic | 30.4 | 75.7 | 52 | 1.615
Toyota Corolla | 33.9 | 71.1 | 65 | 1.835
Toyota Corona | 21.5 | 120.1 | 97 | 2.465
Dodge Challenger | 15.5 | 318 | 150 | 3.52
AMC Javelin | 15.2 | 304 | 150 | 3.435
Camaro Z28 | 13.3 | 350 | 245 | 3.84
Pontiac Firebird | 19.2 | 400 | 175 | 3.845
Fiat X1-9 | 27.3 | 79 | 66 | 1.935
Porsche 914-2 | 26 | 120.3 | 91 | 2.14
Lotus Europa | 30.4 | 95.1 | 113 | 1.513
Ford Pantera L | 15.8 | 351 | 264 | 3.17
Ferrari Dino | 19.7 | 145 | 175 | 2.77
Maserati Bora | 15 | 301 | 335 | 3.57
Volvo 142E | 21.4 | 121 | 109 | 2.78
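Rather than typing the table in by hand, the same 32 × 4 subset can be pulled from R's built-in mtcars data frame. This is a small sketch; the name dataset is chosen only to match the lm() call used for the regression.

```r
# Build the 4-variable subset from R's built-in mtcars data frame
dataset <- mtcars[, c("mpg", "disp", "hp", "wt")]

str(dataset)   # 32 obs. of 4 numeric variables
head(dataset)  # first six cars, with car names as row names
```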
Next, we fit a multiple linear regression:
mpg is the dependent (response) variable,
and the other three variables, disp, hp and wt, are the independent (explanatory) variables.
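Written out in general form, the population model being estimated is shown below. The β symbols and ε follow the usual textbook notation and are not part of R's output; the normal-error assumption is the standard one behind the t and F tests that R reports.

```latex
\mathrm{mpg}_i = \beta_0 + \beta_1\,\mathrm{disp}_i + \beta_2\,\mathrm{hp}_i + \beta_3\,\mathrm{wt}_i + \varepsilon_i,
\qquad \varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2), \quad i = 1, \dots, 32
```

Here β0 is the intercept; β1, β2 and β3 are partial slopes (the change in mean mpg for a one-unit increase in that predictor with the other two held fixed); and σ is the error standard deviation, which R reports as the residual standard error. The least-squares estimates of β0 to β3 are the values in the Estimate column of the summary output.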
Now, fit the model in R:
> dataset <- mtcars[, c("mpg", "disp", "hp", "wt")]
> model <- lm(mpg ~ disp + hp + wt, data = dataset)
> summary(model)
Call:
lm(formula = mpg ~ disp + hp + wt, data = dataset)
Residuals:
Min 1Q Median 3Q Max
-3.891 -1.640 -0.172 1.061 5.861
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.105505 2.110815 17.579 < 2e-16 ***
disp -0.000937 0.010350 -0.091 0.92851
hp -0.031157 0.011436 -2.724 0.01097 *
wt -3.800891 1.066191 -3.565 0.00133 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.639 on 28 degrees of freedom
Multiple R-squared: 0.8268, Adjusted R-squared: 0.8083
F-statistic: 44.57 on 3 and 28 DF, p-value: 8.65e-11
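So, the fitted regression equation is:
predicted mpg = 37.106 - 0.000937 * disp - 0.0312 * hp - 3.801 * wt
Each component of the equation:
mpg (left-hand side): the dependent variable, fuel efficiency in miles per US gallon; the fitted equation gives its predicted (average) value.
37.106 (intercept): the predicted mpg for a car with zero displacement, zero horsepower and zero weight. No real car has these values, so the intercept only positions the regression plane and has no practical interpretation here.
-0.000937 (coefficient of disp): holding hp and wt fixed, each additional cubic inch of displacement changes predicted mpg by -0.000937, which is practically zero; its p-value (0.929) shows that disp adds nothing significant once hp and wt are in the model.
-0.0312 (coefficient of hp): holding disp and wt fixed, each additional unit of gross horsepower lowers predicted mpg by about 0.031 (roughly 3.1 mpg per 100 hp); this is significant at the 5% level (p = 0.011).
-3.801 (coefficient of wt): holding disp and hp fixed, each additional 1000 lbs of weight lowers predicted mpg by about 3.8; this is significant at the 1% level (p = 0.0013).
Random error: the actual mpg of a car differs from its predicted value by an error term that the fitted equation does not show; its standard deviation is estimated by the residual standard error, 2.639 mpg on 28 degrees of freedom.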
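To see the equation in use, the sketch below plugs the Mazda RX4's values (disp = 160, hp = 110, wt = 2.62, taken from the table) into the fitted coefficients by hand and checks the result against R's predict(). The object names b, by_hand, new_car and by_predict are illustrative choices.

```r
# Refit the model on the built-in mtcars data (same 4-variable subset)
dataset <- mtcars[, c("mpg", "disp", "hp", "wt")]
model <- lm(mpg ~ disp + hp + wt, data = dataset)

# Plug the Mazda RX4 values into the fitted equation by hand
b <- coef(model)
by_hand <- b[["(Intercept)"]] + b[["disp"]] * 160 + b[["hp"]] * 110 + b[["wt"]] * 2.62

# predict() performs the same calculation from a data frame of new values
new_car <- data.frame(disp = 160, hp = 110, wt = 2.62)
by_predict <- unname(predict(model, newdata = new_car))

round(c(by_hand = by_hand, by_predict = by_predict), 2)   # both about 23.57
```

The Mazda RX4's observed mpg is 21, so its residual is about 21 - 23.57 = -2.57 mpg.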
Also, the p-value of the overall F-test is 8.65e-11 < 0.05, so the model is significant: at least one of disp, hp and wt is linearly related to mpg.
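The overall p-value is the upper-tail probability of the F distribution with 3 and 28 degrees of freedom, evaluated at the observed F statistic. A minimal sketch that recomputes it from the fitted model, using the fstatistic component of summary():

```r
dataset <- mtcars[, c("mpg", "disp", "hp", "wt")]
model <- lm(mpg ~ disp + hp + wt, data = dataset)

# summary()$fstatistic is a named vector: value, numdf, dendf
fs <- summary(model)$fstatistic

# P(F > observed value) under H0: all three slopes are zero
p_value <- pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]], lower.tail = FALSE)

fs        # value about 44.57, numdf = 3, dendf = 28
p_value   # about 8.65e-11
```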
Finally, R-squared = 0.8268, so these three variables together explain about 82.68% of the variability in mpg (adjusted R-squared = 0.8083).
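Both R-squared values can be reproduced from their definitions: R-squared = 1 - SSE/SST, and adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1) with n = 32 cars and k = 3 predictors. A short sketch; sse, sst, r2 and adj_r2 are illustrative names.

```r
dataset <- mtcars[, c("mpg", "disp", "hp", "wt")]
model <- lm(mpg ~ disp + hp + wt, data = dataset)

sse <- sum(residuals(model)^2)                    # variation left unexplained
sst <- sum((dataset$mpg - mean(dataset$mpg))^2)   # total variation in mpg
r2  <- 1 - sse / sst

n <- nrow(dataset)   # 32 cars
k <- 3               # number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

round(c(r2 = r2, adj_r2 = adj_r2), 4)   # about 0.8268 and 0.8083
```

The adjustment penalizes extra predictors, which is why adjusted R-squared is slightly lower than R-squared.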