Consider one of the subset regression models obtained for each data set in Problem Set 4 and answer the following questions.
(i) Draw the scatter plot matrix, the residual vs. predictor variable plots, and the added variable plots. Comment on the regression model based on these plots.
(ii) Draw the normal probability plot and comment.
(iii) Draw the correlogram and comment.
(iv) Detect leverage points from the data.
(v) Compute Cook’s distance statistics and detect all outlier points from the data.
(vi) Compute DFFITS statistics and detect all outlier points from the data.
(vii) Compute DFBETAS statistics and comment.
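As a rough starting point for parts (i)-(vii), the diagnostics can be sketched in base R. The model `mpg ~ wt + qsec + am` below is only a placeholder assumption, not necessarily the model selected in Problem Set 4, and the cutoffs are common rules of thumb rather than formal tests.

```r
# Placeholder subset model (an assumption); substitute the model you selected.
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)
n <- nrow(mtcars)        # number of observations
p <- length(coef(fit))   # number of coefficients, including the intercept

# (i) Scatter plot matrix, residual vs. predictor plot, added variable plot for wt
pairs(mtcars[, c("mpg", "wt", "qsec", "am")])
plot(mtcars$wt, residuals(fit), xlab = "wt", ylab = "Residuals")
abline(h = 0, lty = 2)
e_y <- residuals(lm(mpg ~ qsec + am, data = mtcars))  # mpg adjusted for the other predictors
e_x <- residuals(lm(wt ~ qsec + am, data = mtcars))   # wt adjusted for the other predictors
plot(e_x, e_y, xlab = "wt | others", ylab = "mpg | others")
abline(lm(e_y ~ e_x))    # slope equals the wt coefficient in fit

# (ii) Normal probability plot of standardized residuals
qqnorm(rstandard(fit))
qqline(rstandard(fit))

# (iii) Correlogram of residuals (meaningful only if row order is informative)
acf(residuals(fit))

# (iv)-(vii) Leverage and influence measures
h   <- hatvalues(fit)        # diagonal of the hat matrix
cd  <- cooks.distance(fit)
dff <- dffits(fit)
dfb <- dfbetas(fit)          # one column per coefficient

# Rule-of-thumb cutoffs (conventions, not strict tests)
high_leverage <- names(h)[h > 2 * p / n]
high_cook     <- names(cd)[cd > 4 / n]
high_dffits   <- names(dff)[abs(dff) > 2 * sqrt(p / n)]
high_dfbetas  <- which(abs(dfb) > 2 / sqrt(n), arr.ind = TRUE)

print(high_leverage)
print(high_cook)
print(high_dffits)
print(high_dfbetas)
```

The same calls apply to the surgical data once `fit` is replaced by the model chosen for that data set. Other cutoffs are also in use (for example, flagging Cook’s distance above 1), so the flagged points should be read alongside the plots rather than taken as definitive.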
Two data sets are given with the following variables.
(1) 32 observations on 11 variables: Miles/(US) gallon, Number of cylinders, Displacement (cu. in.), Gross horsepower, Rear axle ratio, Weight (1000 lbs), 1/4 mile time, Engine (0 = V-shaped, 1 = straight), Transmission (0 = automatic, 1 = manual), Number of forward gears, Number of carburettors. This data set is available in R as “mtcars” in the package datasets.
(2) 54 observations on 10 surgical variables. This data set is available in R as “SurgicalUnit” in the package ALSM.
Answer the following questions for each data set.
(i) Find appropriate models among all possible subset regression models based on the criteria of adjusted R-square, Mallows’s Cp statistic, AIC and BIC.
(ii) Use the forward selection approach to find the appropriate subset regression model.
(iii) Use the backward elimination approach to find the appropriate subset regression model.
(iv) Use the stepwise selection approach to find the appropriate subset regression model.
(v) Comment on the performance of the subset regression models obtained in (i)-(iv).
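One possible way to approach parts (i)-(iv) for the mtcars data is sketched below in base R. The helper `score` and the object names are illustrative assumptions. Note that `step()` selects on AIC rather than on F-test p-values, which may differ from the stepwise procedure the course has in mind.

```r
response   <- "mpg"
predictors <- setdiff(names(mtcars), response)
n <- nrow(mtcars)

full <- lm(mpg ~ ., data = mtcars)   # all predictors
null <- lm(mpg ~ 1, data = mtcars)   # intercept only
sigma2_full <- summary(full)$sigma^2 # error variance estimate from the full model

# (i) Enumerate every non-empty subset of the predictors
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Hypothetical helper: fit one subset model and collect the four criteria
score <- function(vars) {
  fit <- lm(reformulate(vars, response), data = mtcars)
  p   <- length(coef(fit))            # coefficients, including the intercept
  rss <- sum(residuals(fit)^2)
  data.frame(model  = paste(vars, collapse = " + "),
             adj_r2 = summary(fit)$adj.r.squared,
             cp     = rss / sigma2_full - n + 2 * p,  # Mallows's Cp
             aic    = AIC(fit),
             bic    = BIC(fit))
}
results <- do.call(rbind, lapply(subsets, score))

print(results[which.max(results$adj_r2), ])  # best by adjusted R-square
print(results[which.min(results$cp), ])      # smallest Cp
print(results[which.min(results$aic), ])     # smallest AIC
print(results[which.min(results$bic), ])     # smallest BIC

# (ii)-(iv) AIC-based forward, backward and stepwise searches
scope <- list(lower = ~ 1, upper = reformulate(predictors, response))
fwd  <- step(null, scope = scope, direction = "forward",  trace = 0)
bwd  <- step(full,                direction = "backward", trace = 0)
both <- step(null, scope = scope, direction = "both",     trace = 0)

print(formula(fwd))
print(formula(bwd))
print(formula(both))
```

Passing `k = log(n)` to `step()` switches the penalty from AIC to BIC. For the surgical data, the same outline should apply after loading the ALSM package and replacing the data frame and response name. The exhaustive loop is feasible here because ten predictors give only 1023 candidate models.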