Brand   Tar   Nicotine   CO
American_Filter   16   1.2   15
Benson_&_Hedges   16   1.2   15
Camel   16   1   17
Capri   9   0.8   6
Carlton   1   0.1   1
Cartier_Vendome   8   0.8   8
Chelsea   10   0.8   10
GPC_Approved   16   1   17
Hi-Lite   14   1   13
Kent   13   1   13
Lucky_Strike   13   1.1   13
Malibu   15   1.2   15
Marlboro   16   1.2   15
Merit   9   0.7   11
Newport_Stripe   11   0.9   15
Now   2   0.2   3
Old_Gold   18   1.4   18
Pall_Mall   15   1.2   15
Players   13   1.1   12
Raleigh   15   1   16
Richland   17   1.3   16
Rite   9   0.8   10
Silva_Thins   12   1   10
Tareyton   14   1   17
Triumph   5   0.5   7
True   6   0.6   7
Vantage   8   0.7   11
Viceroy   18   1.4   15
Winston   16   1.1   18
a) Find the regression equation that expresses the response variable (y) of nicotine amount in terms of the predictor variable (x) of the tar amount.
b) Find the regression equation that expresses the response variable (y) of nicotine amount in terms of the predictor variable (x) of the carbon monoxide amount.
c) Find the regression equation that expresses the response variable (y) of nicotine amount in terms of predictor variables (x) of tar amount and carbon monoxide amount.
d) For the regression equations found in parts (a), (b), and (c), which is the best equation for predicting the nicotine amount? Justify your answer.
e) Is the best regression equation identified in part (d) a good equation for predicting the nicotine amount? Why or why not?
a) Here the response variable is nicotine amount (y) and the predictor is tar amount (x). The regression equation is given by
y = 0.154030 + 0.065052*x
b) Here the response variable is nicotine amount (y) and the predictor is carbon monoxide amount (x). The regression equation is given by
y = 0.191639 + 0.060564*x
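The two simple regressions in parts (a) and (b) can be checked by hand with the usual least-squares formulas, slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The following is a minimal Python sketch of that check, not the attached R code; the helper name simple_fit and the variable names are made up for illustration, and the lists are transcribed from the table above in row order.

```python
# Data transcribed from the table above (29 brands, same row order).
tar = [16, 16, 16, 9, 1, 8, 10, 16, 14, 13, 13, 15, 16, 9, 11, 2, 18, 15, 13,
       15, 17, 9, 12, 14, 5, 6, 8, 18, 16]
nicotine = [1.2, 1.2, 1.0, 0.8, 0.1, 0.8, 0.8, 1.0, 1.0, 1.0, 1.1, 1.2, 1.2,
            0.7, 0.9, 0.2, 1.4, 1.2, 1.1, 1.0, 1.3, 0.8, 1.0, 1.0, 0.5, 0.6,
            0.7, 1.4, 1.1]
co = [15, 15, 17, 6, 1, 8, 10, 17, 13, 13, 13, 15, 15, 11, 15, 3, 18, 15, 12,
      16, 16, 10, 10, 17, 7, 7, 11, 15, 18]


def simple_fit(x, y):
    """Least-squares line y = b0 + b1*x; returns (b0, b1). Hypothetical helper."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Centered cross-product and sum of squares.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


b0_tar, b1_tar = simple_fit(tar, nicotine)  # part (a)
b0_co, b1_co = simple_fit(co, nicotine)     # part (b)
print(f"(a) y = {b0_tar:.6f} + {b1_tar:.6f}*x")
print(f"(b) y = {b0_co:.6f} + {b1_co:.6f}*x")
```

The printed coefficients should agree with the equations stated in parts (a) and (b) up to rounding.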
c) Here the response variable is nicotine amount (y) and the predictors are tar amount (x1) and carbon monoxide amount (x2). The regression equation is given by
y = 0.181645 + 0.081837*x1 - 0.018642*x2
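With only two predictors, the multiple regression in part (c) can be reproduced without a statistics package by solving the 2x2 normal equations on centered sums. This is a rough Python sketch under that assumption, not the attached R code; centered_sum and the other names are hypothetical, and the data are transcribed from the table above.

```python
# Data transcribed from the table above (29 brands, same row order).
tar = [16, 16, 16, 9, 1, 8, 10, 16, 14, 13, 13, 15, 16, 9, 11, 2, 18, 15, 13,
       15, 17, 9, 12, 14, 5, 6, 8, 18, 16]
nicotine = [1.2, 1.2, 1.0, 0.8, 0.1, 0.8, 0.8, 1.0, 1.0, 1.0, 1.1, 1.2, 1.2,
            0.7, 0.9, 0.2, 1.4, 1.2, 1.1, 1.0, 1.3, 0.8, 1.0, 1.0, 0.5, 0.6,
            0.7, 1.4, 1.1]
co = [15, 15, 17, 6, 1, 8, 10, 17, 13, 13, 13, 15, 15, 11, 15, 3, 18, 15, 12,
      16, 16, 10, 10, 17, 7, 7, 11, 15, 18]


def mean(v):
    return sum(v) / len(v)


def centered_sum(u, v):
    """Sum of (u_i - mean(u)) * (v_i - mean(v)). Hypothetical helper."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# Centered sums of squares and cross-products (x1 = tar, x2 = CO, y = nicotine).
s11 = centered_sum(tar, tar)
s22 = centered_sum(co, co)
s12 = centered_sum(tar, co)
s1y = centered_sum(tar, nicotine)
s2y = centered_sum(co, nicotine)

# Solve the 2x2 normal equations by Cramer's rule.
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(nicotine) - b1 * mean(tar) - b2 * mean(co)
print(f"(c) y = {b0:.6f} + {b1:.6f}*x1 + ({b2:.6f})*x2")
```

The closed form only works for two predictors; with more, a matrix solver would be the natural replacement.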
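d) The best equation for predicting nicotine amount is the equation in part (c): its coefficient of multiple determination (R-squared) is 0.9333, the highest among the 3 regressions. Since R-squared can never decrease when a predictor is added, adjusted R-squared is the fairer basis for comparison, and part (c) still ranks first on it (about 0.928, against about 0.921 for part (a) and about 0.736 for part (b)).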
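The comparison in part (d) can be reproduced by plugging the three stated equations back into the data and computing R-squared = 1 - SSE/SST, then adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1), where k is the number of predictors. The Python sketch below is illustrative only; it uses the rounded coefficients from parts (a) to (c), and the names models, r_squared and adjusted are hypothetical.

```python
# Data transcribed from the table above (29 brands, same row order).
tar = [16, 16, 16, 9, 1, 8, 10, 16, 14, 13, 13, 15, 16, 9, 11, 2, 18, 15, 13,
       15, 17, 9, 12, 14, 5, 6, 8, 18, 16]
nicotine = [1.2, 1.2, 1.0, 0.8, 0.1, 0.8, 0.8, 1.0, 1.0, 1.0, 1.1, 1.2, 1.2,
            0.7, 0.9, 0.2, 1.4, 1.2, 1.1, 1.0, 1.3, 0.8, 1.0, 1.0, 0.5, 0.6,
            0.7, 1.4, 1.1]
co = [15, 15, 17, 6, 1, 8, 10, 17, 13, 13, 13, 15, 15, 11, 15, 3, 18, 15, 12,
      16, 16, 10, 10, 17, 7, 7, 11, 15, 18]
n = len(nicotine)

# (prediction function, number of predictors k), using the stated equations.
models = {
    "a": (lambda t, c: 0.154030 + 0.065052 * t, 1),
    "b": (lambda t, c: 0.191639 + 0.060564 * c, 1),
    "c": (lambda t, c: 0.181645 + 0.081837 * t - 0.018642 * c, 2),
}


def r_squared(predict):
    """1 - SSE/SST for a prediction function of (tar, co)."""
    mean_y = sum(nicotine) / n
    sst = sum((y - mean_y) ** 2 for y in nicotine)
    sse = sum((y - predict(t, c)) ** 2 for t, c, y in zip(tar, co, nicotine))
    return 1 - sse / sst


def adjusted(r2, k):
    """Adjusted R-squared, penalizing the number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = {name: r_squared(f) for name, (f, k) in models.items()}
adj = {name: adjusted(r2[name], k) for name, (f, k) in models.items()}
for name in models:
    print(f"({name}) R^2 = {r2[name]:.4f}, adjusted R^2 = {adj[name]:.4f}")
```

Adjusted R-squared is used here because plain R-squared would favor the two-predictor model even if the extra predictor added nothing.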
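e) The best regression equation identified in part (d) fits the data well, but it may not always be a good choice for predicting nicotine amount, since the predictors, i.e. tar amount and carbon monoxide amount, are highly correlated with each other, i.e. multicollinearity is present in the data set. One symptom is visible in the equations themselves: carbon monoxide has a positive coefficient on its own in part (b) but a negative coefficient in part (c), so the individual coefficients in part (c) should not be interpreted separately.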
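One way to put a number on the multicollinearity mentioned in part (e) is the Pearson correlation between the two predictors and the corresponding variance inflation factor, VIF = 1 / (1 - r^2). The sketch below is a Python illustration, not the attached R code; the function name pearson_r is hypothetical, and the cutoffs people use for VIF (often 5 or 10) are rules of thumb rather than fixed standards.

```python
import math

# Predictor columns transcribed from the table above (same row order).
tar = [16, 16, 16, 9, 1, 8, 10, 16, 14, 13, 13, 15, 16, 9, 11, 2, 18, 15, 13,
       15, 17, 9, 12, 14, 5, 6, 8, 18, 16]
co = [15, 15, 17, 6, 1, 8, 10, 17, 13, 13, 13, 15, 15, 11, 15, 3, 18, 15, 12,
      16, 16, 10, 10, 17, 7, 7, 11, 15, 18]


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


r_tar_co = pearson_r(tar, co)
# With exactly two predictors, both share the same variance inflation factor.
vif = 1 / (1 - r_tar_co ** 2)
print(f"r(tar, CO) = {r_tar_co:.4f}, VIF = {vif:.2f}")
```

For this table the correlation between tar and carbon monoxide comes out at roughly 0.93, which is consistent with the concern raised in part (e).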
The R code is attached.
