An American consumer organization is currently examining the relationship among several variables: gasoline mileage, measured in miles per gallon (MPG); the horsepower of the car’s engine; and the weight of the car (in pounds). A sample of 50 recent car models was selected and the results were recorded (provided in auto.xlsx).
(a) Using the sample data, calculate the sample mean and standard deviation of: 1. Miles per gallon (MPG); 2. Horsepower; 3. Weight (in pounds).
(b) Is there any evidence of skewness in the data sets? Which data set displays the greatest skewness?
(c) Using the sample data on MPG, calculate the sample proportion of vehicles whose fuel economy exceeds 37 mpg and its corresponding standard deviation.
(d) Indicate all possible relationships between the variables and comment on them. (You may use a scatter diagram or Karl Pearson’s correlation coefficient.)
(e) Using the complete data set and the simple ordinary least squares regression formulae, develop two models that explain gasoline mileage (MPG) as a function of (i) Horsepower and (ii) Weight.
(f) Which model best describes the behavior of gasoline mileage? Explain your reasons.
MPG Horsepower Weight
43.1 48 1985
19.9 110 3365
19.2 105 3535
17.7 165 3445
18.1 139 3205
20.3 103 2830
21.5 115 3245
16.9 155 4360
15.5 142 4054
18.5 150 3940
27.2 71 3190
41.5 76 2144
46.6 65 2110
23.7 100 2420
27.2 84 2490
39.1 58 1755
28 88 2605
24 92 2865
20.2 139 3570
20.5 95 3155
28 90 2678
34.7 63 2215
36.1 66 1800
35.7 80 1915
20.2 85 2965
23.9 90 3420
29.9 65 2380
30.4 67 3250
36 74 1980
22.6 110 2800
36.4 67 2950
27.5 95 2560
33.7 75 2210
44.6 67 1850
32.9 100 2615
38 67 1965
24.2 120 2930
38.1 60 1968
39.4 70 2070
25.4 116 2900
31.3 75 2542
34.1 68 1985
34 88 2395
31 82 2720
27.4 80 2670
22.3 88 2890
28 79 2625
17.6 85 3465
34.4 65 3465
20.6 105 3380
I used R software to solve this question.
R code and output:
> d=read.table('mpg.txt',header=TRUE)
> head(d)
MPG Horsepower Weight
1 43.1 48 1985
2 19.9 110 3365
3 19.2 105 3535
4 17.7 165 3445
5 18.1 139 3205
6 20.3 103 2830
> attach(d)
Que.a
> mean(MPG)
[1] 28.542
> sd(MPG)
[1] 8.171431
> mean(Horsepower)
[1] 90.84
> sd(Horsepower)
[1] 27.25867
> mean(Weight)
[1] 2756.52
> sd(Weight)
[1] 635.051
For variable MPG: mean = 28.542, sd = 8.1714
For variable Horsepower: mean = 90.84, sd = 27.2587
For variable Weight (lb): mean = 2756.52, sd = 635.051
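If the formulas behind mean() and sd() need to be shown explicitly, a minimal sketch follows. It assumes the 50 rows are typed in from the table above (rather than read from mpg.txt) so that it runs on its own; the helper names sample_mean and sample_sd are illustrative, not part of base R.

```r
# Data typed in from the table above, same row order (stands in for read.table('mpg.txt'))
MPG <- c(43.1, 19.9, 19.2, 17.7, 18.1, 20.3, 21.5, 16.9, 15.5, 18.5,
         27.2, 41.5, 46.6, 23.7, 27.2, 39.1, 28.0, 24.0, 20.2, 20.5,
         28.0, 34.7, 36.1, 35.7, 20.2, 23.9, 29.9, 30.4, 36.0, 22.6,
         36.4, 27.5, 33.7, 44.6, 32.9, 38.0, 24.2, 38.1, 39.4, 25.4,
         31.3, 34.1, 34.0, 31.0, 27.4, 22.3, 28.0, 17.6, 34.4, 20.6)
Horsepower <- c(48, 110, 105, 165, 139, 103, 115, 155, 142, 150,
                71,  76,  65, 100,  84,  58,  88,  92, 139,  95,
                90,  63,  66,  80,  85,  90,  65,  67,  74, 110,
                67,  95,  75,  67, 100,  67, 120,  60,  70, 116,
                75,  68,  88,  82,  80,  88,  79,  85,  65, 105)
Weight <- c(1985, 3365, 3535, 3445, 3205, 2830, 3245, 4360, 4054, 3940,
            3190, 2144, 2110, 2420, 2490, 1755, 2605, 2865, 3570, 3155,
            2678, 2215, 1800, 1915, 2965, 3420, 2380, 3250, 1980, 2800,
            2950, 2560, 2210, 1850, 2615, 1965, 2930, 1968, 2070, 2900,
            2542, 1985, 2395, 2720, 2670, 2890, 2625, 3465, 3465, 3380)

# Sample mean: sum of the observations divided by n
sample_mean <- function(x) sum(x) / length(x)

# Sample standard deviation with divisor n - 1 (the same definition sd() uses)
sample_sd <- function(x) sqrt(sum((x - sample_mean(x))^2) / (length(x) - 1))

vars <- list(MPG = MPG, Horsepower = Horsepower, Weight = Weight)
summary_table <- data.frame(
  mean = sapply(vars, sample_mean),
  sd   = sapply(vars, sample_sd),
  row.names = names(vars)
)
print(summary_table)
```

The printed table should agree with the mean() and sd() output above.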
Que.b
> par(mfrow=c(2,2))
> hist(MPG)
> hist(Horsepower)
> hist(Weight)
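We check the skewness of each variable using its histogram. For all three variables the mean exceeds the median (MPG: 28.542 vs 27.75; Horsepower: 90.84 vs 85; Weight: 2756.52 vs 2699), so each data set shows some positive (right) skew. The histogram for Horsepower is the most asymmetric, with a long right tail of high-horsepower cars (up to 165), so Horsepower displays the greatest skewness.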
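The histogram judgment can be backed by a number. The sketch below swaps in Karl Pearson's second (median-based) coefficient of skewness, Sk = 3(mean - median)/s, which is one common choice rather than the only one; the data are assumed to be typed in from the table above, and pearson_skew is an illustrative helper name.

```r
# Data typed in from the table above, same row order
MPG <- c(43.1, 19.9, 19.2, 17.7, 18.1, 20.3, 21.5, 16.9, 15.5, 18.5,
         27.2, 41.5, 46.6, 23.7, 27.2, 39.1, 28.0, 24.0, 20.2, 20.5,
         28.0, 34.7, 36.1, 35.7, 20.2, 23.9, 29.9, 30.4, 36.0, 22.6,
         36.4, 27.5, 33.7, 44.6, 32.9, 38.0, 24.2, 38.1, 39.4, 25.4,
         31.3, 34.1, 34.0, 31.0, 27.4, 22.3, 28.0, 17.6, 34.4, 20.6)
Horsepower <- c(48, 110, 105, 165, 139, 103, 115, 155, 142, 150,
                71,  76,  65, 100,  84,  58,  88,  92, 139,  95,
                90,  63,  66,  80,  85,  90,  65,  67,  74, 110,
                67,  95,  75,  67, 100,  67, 120,  60,  70, 116,
                75,  68,  88,  82,  80,  88,  79,  85,  65, 105)
Weight <- c(1985, 3365, 3535, 3445, 3205, 2830, 3245, 4360, 4054, 3940,
            3190, 2144, 2110, 2420, 2490, 1755, 2605, 2865, 3570, 3155,
            2678, 2215, 1800, 1915, 2965, 3420, 2380, 3250, 1980, 2800,
            2950, 2560, 2210, 1850, 2615, 1965, 2930, 1968, 2070, 2900,
            2542, 1985, 2395, 2720, 2670, 2890, 2625, 3465, 3465, 3380)

# Pearson's second coefficient of skewness: 3 * (mean - median) / sd.
# Positive values indicate a longer right tail; 0 indicates symmetry about the median.
pearson_skew <- function(x) 3 * (mean(x) - median(x)) / sd(x)

vars <- list(MPG = MPG, Horsepower = Horsepower, Weight = Weight)
skew <- sapply(vars, pearson_skew)
print(round(skew, 3))

# Variable with the largest absolute skewness coefficient
most_skewed <- names(which.max(abs(skew)))
print(most_skewed)
```

On these data the coefficients come out at roughly 0.29 (MPG), 0.64 (Horsepower) and 0.27 (Weight): all three are mildly right-skewed, and Horsepower is clearly the most skewed, in line with the histograms.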
Que.c
> mpg=subset(MPG, (MPG>37));mpg
[1] 43.1 41.5 46.6 39.1 44.6 38.0 38.1 39.4
> length(mpg)
[1] 8
> prop=8/50
> prop
[1] 0.16
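> se=sqrt(prop*(1-prop)/50)
> se
[1] 0.05184593
Sample proportion of vehicles whose fuel economy exceeds 37 mpg: p = 8/50 = 0.16
Its standard deviation (the standard error of the proportion) is sqrt(p(1 - p)/n) = sqrt(0.16 × 0.84 / 50) ≈ 0.0518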
Que.d
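> pairs(d)
1. MPG is negatively correlated with both Horsepower and Weight: cars with more powerful engines or heavier bodies tend to get fewer miles per gallon. Taking the square root of the R-squared values reported in part (e), with the sign of the slope, gives r ≈ -0.79 for MPG and Horsepower and r ≈ -0.82 for MPG and Weight, so both relationships are strong.
2. Horsepower and Weight are positively correlated: heavier cars tend to have more powerful engines.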
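Since the question mentions Karl Pearson's correlation coefficient, the scatter-plot reading can be complemented with the correlation matrix. A minimal sketch, assuming the data are typed in from the table above; cor() computes Pearson's product-moment coefficient by default, and cor.test() tests whether a single correlation differs from zero.

```r
# Data typed in from the table above, same row order
d <- data.frame(
  MPG = c(43.1, 19.9, 19.2, 17.7, 18.1, 20.3, 21.5, 16.9, 15.5, 18.5,
          27.2, 41.5, 46.6, 23.7, 27.2, 39.1, 28.0, 24.0, 20.2, 20.5,
          28.0, 34.7, 36.1, 35.7, 20.2, 23.9, 29.9, 30.4, 36.0, 22.6,
          36.4, 27.5, 33.7, 44.6, 32.9, 38.0, 24.2, 38.1, 39.4, 25.4,
          31.3, 34.1, 34.0, 31.0, 27.4, 22.3, 28.0, 17.6, 34.4, 20.6),
  Horsepower = c(48, 110, 105, 165, 139, 103, 115, 155, 142, 150,
                 71,  76,  65, 100,  84,  58,  88,  92, 139,  95,
                 90,  63,  66,  80,  85,  90,  65,  67,  74, 110,
                 67,  95,  75,  67, 100,  67, 120,  60,  70, 116,
                 75,  68,  88,  82,  80,  88,  79,  85,  65, 105),
  Weight = c(1985, 3365, 3535, 3445, 3205, 2830, 3245, 4360, 4054, 3940,
             3190, 2144, 2110, 2420, 2490, 1755, 2605, 2865, 3570, 3155,
             2678, 2215, 1800, 1915, 2965, 3420, 2380, 3250, 1980, 2800,
             2950, 2560, 2210, 1850, 2615, 1965, 2930, 1968, 2070, 2900,
             2542, 1985, 2395, 2720, 2670, 2890, 2625, 3465, 3465, 3380)
)

# Pearson correlation matrix for every pair of variables
r <- cor(d, method = "pearson")
print(round(r, 3))

# Significance test for the MPG-Weight correlation (H0: rho = 0)
print(cor.test(d$MPG, d$Weight, method = "pearson"))
```

Squaring the MPG entries of the matrix reproduces the R-squared values of the two simple regressions in part (e), since r^2 = R^2 when there is a single predictor.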
Que.e
> model=lm(MPG~Horsepower)
> summary(model)
Call:
lm(formula = MPG ~ Horsepower)
Residuals:
Min 1Q Median 3Q Max
-12.3218 -3.7569 -0.1532 3.3686 11.9527
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 50.00521 2.52351 19.816 < 2e-16 ***
Horsepower -0.23627 0.02663 -8.873 1.09e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.081 on 48 degrees of freedom
Multiple R-squared: 0.6212, Adjusted R-squared: 0.6133
F-statistic: 78.72 on 1 and 48 DF, p-value: 1.093e-11
> model2=lm(MPG~Weight)
> summary(model2)
Call:
lm(formula = MPG ~ Weight)
Residuals:
Min 1Q Median 3Q Max
-8.414 -2.636 -1.202 2.317 13.377
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 57.79725 2.96897 19.47 < 2e-16 ***
Weight -0.01061 0.00105 -10.11 1.79e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.668 on 48 degrees of freedom
Multiple R-squared: 0.6803, Adjusted R-squared: 0.6736
F-statistic: 102.1 on 1 and 48 DF, p-value: 1.787e-13
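Part (e) asks for the simple OLS formulae rather than lm(). The sketch below applies b1 = Sxy/Sxx and b0 = ybar - b1 * xbar directly and computes R^2 = 1 - SSE/SST. It assumes the data are typed in from the table above; ols_fit and r_squared are illustrative helper names, and the results should match the lm() coefficients above.

```r
# Data typed in from the table above, same row order
MPG <- c(43.1, 19.9, 19.2, 17.7, 18.1, 20.3, 21.5, 16.9, 15.5, 18.5,
         27.2, 41.5, 46.6, 23.7, 27.2, 39.1, 28.0, 24.0, 20.2, 20.5,
         28.0, 34.7, 36.1, 35.7, 20.2, 23.9, 29.9, 30.4, 36.0, 22.6,
         36.4, 27.5, 33.7, 44.6, 32.9, 38.0, 24.2, 38.1, 39.4, 25.4,
         31.3, 34.1, 34.0, 31.0, 27.4, 22.3, 28.0, 17.6, 34.4, 20.6)
Horsepower <- c(48, 110, 105, 165, 139, 103, 115, 155, 142, 150,
                71,  76,  65, 100,  84,  58,  88,  92, 139,  95,
                90,  63,  66,  80,  85,  90,  65,  67,  74, 110,
                67,  95,  75,  67, 100,  67, 120,  60,  70, 116,
                75,  68,  88,  82,  80,  88,  79,  85,  65, 105)
Weight <- c(1985, 3365, 3535, 3445, 3205, 2830, 3245, 4360, 4054, 3940,
            3190, 2144, 2110, 2420, 2490, 1755, 2605, 2865, 3570, 3155,
            2678, 2215, 1800, 1915, 2965, 3420, 2380, 3250, 1980, 2800,
            2950, 2560, 2210, 1850, 2615, 1965, 2930, 1968, 2070, 2900,
            2542, 1985, 2395, 2720, 2670, 2890, 2625, 3465, 3465, 3380)

# Least squares estimates for y = b0 + b1 * x:
#   b1 = Sxy / Sxx,  b0 = mean(y) - b1 * mean(x)
ols_fit <- function(x, y) {
  Sxy <- sum((x - mean(x)) * (y - mean(y)))
  Sxx <- sum((x - mean(x))^2)
  b1 <- Sxy / Sxx
  b0 <- mean(y) - b1 * mean(x)
  c(intercept = b0, slope = b1)
}

# Coefficient of determination: R^2 = 1 - SSE / SST
r_squared <- function(x, y, fit) {
  fitted <- fit[["intercept"]] + fit[["slope"]] * x
  1 - sum((y - fitted)^2) / sum((y - mean(y))^2)
}

fit_hp <- ols_fit(Horsepower, MPG)  # model (i): MPG on Horsepower
fit_wt <- ols_fit(Weight, MPG)      # model (ii): MPG on Weight
print(rbind(horsepower = fit_hp, weight = fit_wt))
cat("R^2 (Horsepower):", round(r_squared(Horsepower, MPG, fit_hp), 4), "\n")
cat("R^2 (Weight):    ", round(r_squared(Weight, MPG, fit_wt), 4), "\n")
```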
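The two fitted models are:
(i) MPG = 50.005 - 0.2363 × Horsepower
(ii) MPG = 57.797 - 0.01061 × Weight
Both slopes are negative: each additional unit of horsepower lowers predicted mileage by about 0.24 mpg, and each additional pound of weight lowers it by about 0.0106 mpg (roughly 1.06 mpg per 100 lb).
Que.f
The F-statistic p-values for both models are far below 0.05, so both models are useful for predicting MPG. However, the model with Weight as the predictor has the higher adjusted R-squared (0.6736 vs 0.6133) and the smaller residual standard error (4.668 vs 5.081 mpg).
Hence the second model, with Weight as the predictor, best describes the behavior of gasoline mileage.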