The table below gives a sample of 20 cars, with measurements of fuel consumption (city mi/gal and highway mi/gal), weight (pounds), number of cylinders, engine displacement (liters), transmission type (manual or automatic), amount of greenhouse gases emitted (tons/year), and amount of tailpipe emissions of NOx (lb/yr).
| CAR | CITY | HWY | WEIGHT | CYLINDERS | DISPLACEMENT | MAN/AUTO | GHG | NOX |
|---|---|---|---|---|---|---|---|---|
| Chev. Camaro | 19 | 30 | 3545 | 6 | 3.8 | M | 12 | 34.4 |
| Chev. Cavalier | 23 | 31 | 2795 | 4 | 2.2 | A | 10 | 25.1 |
| Dodge Neon | 23 | 32 | 2600 | 4 | 2 | A | 10 | 25.1 |
| Ford Taurus | 19 | 27 | 3515 | 6 | 3 | A | 12 | 25.1 |
| Honda Accord | 23 | 30 | 3245 | 4 | 2.3 | A | 11 | 25.1 |
| Lincoln Cont. | 17 | 24 | 3930 | 8 | 4.6 | A | 14 | 25.1 |
| Mercury Mystique | 20 | 29 | 3115 | 6 | 2.5 | A | 12 | 34.4 |
| Mitsubishi Eclipse | 22 | 33 | 3235 | 4 | 2 | M | 10 | 25.1 |
| Olds. Aurora | 17 | 26 | 3995 | 8 | 4 | A | 13 | 34.4 |
| Pontiac Grand Am | 22 | 30 | 3115 | 4 | 2.4 | A | 11 | 25.1 |
| Toyota Camry | 23 | 32 | 3240 | 4 | 2.2 | M | 10 | 25.1 |
| Cadillac DeVille | 17 | 26 | 4020 | 8 | 4.6 | A | 13 | 34.4 |
| Chev. Corvette | 18 | 28 | 3220 | 8 | 5.7 | M | 12 | 34.4 |
| Chrysler Sebring | 19 | 27 | 3175 | 6 | 2.5 | A | 12 | 25.1 |
| Ford Mustang | 20 | 29 | 3450 | 6 | 3.8 | M | 12 | 34.4 |
| BMW 3-Series | 19 | 27 | 3225 | 6 | 2.8 | A | 12 | 34.4 |
| Ford Crown Victoria | 17 | 24 | 3985 | 8 | 4.6 | A | 14 | 25.1 |
| Honda Civic | 32 | 37 | 2440 | 4 | 1.6 | M | 8 | 25.1 |
| Mazda Protege | 29 | 34 | 2500 | 4 | 1.6 | A | 9 | 25.1 |
| Hyundai Accent | 28 | 37 | 2290 | 4 | 1.5 | A | 9 | 34.4 |
To determine whether there is a linear relationship between the number of cylinders a car has (CYLINDERS) and its greenhouse gas emissions (GHG), we first make a scatterplot of the data and then calculate the linear correlation coefficient. If there is a strong linear correlation, we fit a regression line. Answer the following questions:
1. Make a scatterplot for CYLINDERS and GHG. Use CYLINDERS as the independent variable and GHG as the dependent variable.
i. Describe the type of linear correlation: positive, negative, or no correlation. Is the relationship nonlinear?
2. Find the linear correlation coefficient between CYLINDERS and GHG.
i. Describe the linear correlation coefficient. Is it positive or negative? Is it strong, moderate, or weak?
ii. Use Table A6 and α = 0.05 to determine whether there is a correlation between CYLINDERS and GHG in the population.
3. Find the regression line between CYLINDERS and GHG.
i. What is the meaning of the slope of your regression equation?
ii. What is the meaning of the y-intercept of your regression equation?
iii. Estimate the amount of greenhouse gas emissions if the number of cylinders in a car were 5.
1) SCATTER PLOT:
I used Excel to construct the scatter plot.
The scatter plot for CYLINDERS and GHG, with the number of cylinders as the independent variable and greenhouse gas emissions (GHG) as the dependent variable, is given below:
[Scatter plot: GHG (tons/year) versus CYLINDERS]
(i) The scatter plot shows a positive linear pattern: GHG tends to increase as the number of cylinders increases. Thus there is a positive linear correlation between CYLINDERS and GHG.
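The upward pattern can also be checked without a plot. The following is a minimal Python sketch (not the Excel workflow used above) that groups the GHG values by cylinder count and prints the mean of each group; the variable names are illustrative, and the pairs are copied from the CYLINDERS and GHG columns of the data table.

```python
from collections import defaultdict
from statistics import mean

# (CYLINDERS, GHG) pairs copied from the data table, in row order
pairs = [
    (6, 12), (4, 10), (4, 10), (6, 12), (4, 11),
    (8, 14), (6, 12), (4, 10), (8, 13), (4, 11),
    (4, 10), (8, 13), (8, 12), (6, 12), (6, 12),
    (6, 12), (8, 14), (4, 8), (4, 9), (4, 9),
]

# Collect the GHG values observed at each cylinder count
ghg_by_cyl = defaultdict(list)
for cyl, ghg in pairs:
    ghg_by_cyl[cyl].append(ghg)

# Mean GHG per cylinder count, in increasing order of cylinders
group_means = {cyl: mean(vals) for cyl, vals in sorted(ghg_by_cyl.items())}

for cyl, m in group_means.items():
    print(f"{cyl} cylinders: n = {len(ghg_by_cyl[cyl])}, mean GHG = {m:.2f}")
```

The group means rise with the cylinder count, which agrees with the positive pattern seen in the scatter plot.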
2) LINEAR CORRELATION COEFFICIENT:
I used R to find the linear correlation coefficient between CYLINDERS and GHG.
(i) The linear correlation coefficient between CYLINDERS and GHG is $r \approx 0.8838$. Since the coefficient is positive and close to 1, there is a strong positive linear correlation between CYLINDERS and GHG.
(ii) TEST FOR SIGNIFICANCE OF CORRELATION:
HYPOTHESES:
$H_0: \rho = 0$ (That is, there is no statistically significant linear correlation between CYLINDERS and GHG in the population.)
$H_1: \rho \neq 0$ (That is, there is a statistically significant linear correlation between CYLINDERS and GHG in the population.)
R OUTPUT:
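Since the p-value is less than the significance level $\alpha = 0.05$, we reject the null hypothesis and conclude that there is a statistically significant linear correlation between CYLINDERS and GHG in the population. Table A6 leads to the same decision: for $n = 20$ and $\alpha = 0.05$ the critical value is 0.444, and $|r| \approx 0.884$ exceeds it.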
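The correlation and the test decision can be reproduced from the data table. The sketch below is written in Python rather than the R code mentioned above (which is not shown), so it is an illustration and not the original computation. It computes Pearson's r from the sums of squares and the test statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$. The two critical values are assumptions taken from standard tables: 0.444 for r with n = 20, and 2.101 for a two-tailed t test with 18 degrees of freedom, both at α = 0.05.

```python
from math import sqrt

# (CYLINDERS, GHG) pairs copied from the data table, in row order
pairs = [
    (6, 12), (4, 10), (4, 10), (6, 12), (4, 11),
    (8, 14), (6, 12), (4, 10), (8, 13), (4, 11),
    (4, 10), (8, 13), (8, 12), (6, 12), (6, 12),
    (6, 12), (8, 14), (4, 8), (4, 9), (4, 9),
]

n = len(pairs)
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of squares and cross-products about the means
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)

# Pearson correlation coefficient and its t statistic (df = n - 2)
r = s_xy / sqrt(s_xx * s_yy)
t_stat = r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Assumed tabulated critical values for alpha = 0.05, two-tailed
R_CRIT = 0.444   # critical r for n = 20
T_CRIT = 2.101   # critical t for df = 18

reject_h0 = abs(r) > R_CRIT and abs(t_stat) > T_CRIT

print(f"r = {r:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Both comparisons point the same way as the p-value argument: the sample correlation is far beyond what would be expected if the population correlation were zero.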
3) SIMPLE LINEAR REGRESSION OUTPUT:
I used R to fit a simple linear regression model to the data, with the number of cylinders as the independent variable and greenhouse gas emissions (GHG) as the dependent variable.
ESTIMATED LINEAR REGRESSION EQUATION:
The estimated simple linear regression equation is
$\hat{y} = b_0 + b_1 x = 6.3788 + 0.8788x$
where
$\hat{y}$ is the predicted value of the dependent variable, greenhouse gas emissions (GHG),
$b_0 = 6.3788$ is the intercept,
$b_1 = 0.8788$ is the slope coefficient of the independent variable, the number of cylinders,
$x$ is the independent variable, the number of cylinders.
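As a check on the fitted equation, the least-squares coefficients can be worked out by hand from the CYLINDERS ($x$) and GHG ($y$) columns of the data table. The notation here ($S_{xx}$, $S_{xy}$, slope $b_1$, intercept $b_0$) is introduced for this illustration only; the sums are taken over the $n = 20$ cars.

$$
\begin{aligned}
\sum x &= 112, \quad \sum y = 226, \quad \sum x^2 = 680, \quad \sum xy = 1312 \\
S_{xx} &= \sum x^2 - \frac{(\sum x)^2}{n} = 680 - \frac{112^2}{20} = 52.8 \\
S_{xy} &= \sum xy - \frac{(\sum x)(\sum y)}{n} = 1312 - \frac{(112)(226)}{20} = 46.4 \\
b_1 &= \frac{S_{xy}}{S_{xx}} = \frac{46.4}{52.8} \approx 0.8788 \\
b_0 &= \bar{y} - b_1 \bar{x} = 11.3 - \frac{46.4}{52.8}(5.6) \approx 6.3788
\end{aligned}
$$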
(i) The slope coefficient of the regression equation is 0.8788. That is, the mean amount of greenhouse gases emitted increases by 0.8788 tons/year for each additional cylinder.
(ii) The y-intercept of the regression equation is 6.3788. That is, the predicted mean amount of greenhouse gases emitted is 6.3788 tons/year when the number of cylinders is 0. No car in the sample has fewer than 4 cylinders, so this value is an extrapolation and has no practical meaning on its own.
(iii) PREDICTED VALUE OF GHG:
If the number of cylinders is 5, the predicted amount of greenhouse gases emitted is
$\hat{y} = 6.3788 + 0.8788(5) = 10.7728$
That is, a car with 5 cylinders is predicted to emit about 10.77 tons/year of greenhouse gases.
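The fit and the prediction can be reproduced with a short least-squares computation. This is a Python sketch under the assumption that ordinary least squares was used, as the text describes; it is not the R code mentioned above, and `predict_ghg` is a hypothetical helper name.

```python
# (CYLINDERS, GHG) pairs copied from the data table, in row order
pairs = [
    (6, 12), (4, 10), (4, 10), (6, 12), (4, 11),
    (8, 14), (6, 12), (4, 10), (8, 13), (4, 11),
    (4, 10), (8, 13), (8, 12), (6, 12), (6, 12),
    (6, 12), (8, 14), (4, 8), (4, 9), (4, 9),
]

n = len(pairs)
x_bar = sum(x for x, _ in pairs) / n
y_bar = sum(y for _, y in pairs) / n

# Least-squares slope and intercept
s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar


def predict_ghg(cylinders):
    """Predicted GHG (tons/year) for a given number of cylinders."""
    return intercept + slope * cylinders


prediction = predict_ghg(5)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"predicted GHG at 5 cylinders = {prediction:.4f}")
```

Carrying full precision gives a prediction of about 10.7727; the value 10.7728 comes from using the coefficients rounded to four decimal places. Because 5 lies between the observed cylinder counts of 4 and 8, the estimate is an interpolation within the range of the data.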