3.5 Drop/remove the insignificant independent variable from the regression model, and develop and show an updated estimated regression equation that can be used to predict the average annual salary for salaried employees given the average annual salary for hourly employees and the size of the company. Again, use the F test and α = 0.05 to test for overall significance. Also use the t test and α = 0.05 to determine the significance of the independent variables in this updated estimated regression equation. What percentage of the variability in y is explained by the updated estimated regression equation? (15 points)
Rank | Company | Size | Salaried ($1000s) | Hourly ($1000s) | Midsize | SmallSize |
4 | Wegmans Food Markets | Large | 56 | 29 | 0 | 0 |
6 | NetApp | Midsize | 143 | 76 | 1 | 0 |
7 | Camden Property Trust | Small | 71 | 37 | 0 | 1 |
8 | Recreational Equipment (REI) | Large | 103 | 28 | 0 | 0 |
10 | Quicken Loans | Midsize | 78 | 54 | 1 | 0 |
11 | Zappos.com | Midsize | 48 | 25 | 1 | 0 |
12 | Mercedes-Benz USA | Small | 118 | 50 | 0 | 1 |
20 | USAA | Large | 96 | 47 | 0 | 0 |
22 | The Container Store | Midsize | 71 | 45 | 1 | 0 |
25 | Ultimate Software | Small | 166 | 56 | 0 | 1 |
37 | Plante Moran | Small | 73 | 45 | 0 | 1 |
42 | Baptist Health South Florida | Large | 126 | 80 | 0 | 0 |
50 | World Wide Technology | Small | 129 | 31 | 0 | 1 |
53 | Methodist Hospital | Large | 100 | 83 | 0 | 0 |
58 | Perkins Coie | Small | 189 | 63 | 0 | 1 |
60 | American Express | Large | 114 | 35 | 0 | 0 |
64 | TDIndustries | Small | 93 | 47 | 0 | 1 |
66 | QuikTrip | Large | 69 | 44 | 0 | 0 |
72 | EOG Resources | Small | 189 | 81 | 0 | 1 |
75 | FactSet Research Systems | Small | 103 | 51 | 0 | 1 |
80 | Stryker | Large | 71 | 43 | 0 | 0 |
81 | SRC | Small | 84 | 33 | 0 | 1 |
84 | Booz Allen Hamilton | Large | 105 | 77 | 0 | 0 |
91 | CarMax | Large | 57 | 34 | 0 | 0 |
93 | GoDaddy.com | Midsize | 105 | 71 | 1 | 0 |
94 | KPMG | Large | 79 | 59 | 0 | 0 |
95 | Navy Federal Credit Union | Midsize | 77 | 39 | 1 | 0 |
97 | Schweitzer Engineering Labs | Small | 99 | 28 | 0 | 1 |
99 | Darden Restaurants | Large | 57 | 24 | 0 | 0 |
100 | Intercontinental Hotels Group | Large | 63 | 26 | 0 | 0 |
The data show that "Midsize" and "SmallSize" are two dummy variables derived from the "Size" variable. Because the regression software automatically creates dummy variables for a categorical predictor, I drop "Midsize" and "SmallSize" and use "Size" directly; keeping both would enter the same information twice and cause perfect multicollinearity.
The final linear regression model therefore has "Salaried" as the response and "Hourly" and "Size" as the explanatory variables. Below is the result of the model summary.
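The fit described above can be sketched in plain Python, as a minimal illustration rather than the software actually used for the answer. The baseline category (Large), the helper `solve`, and all variable names are assumptions introduced here; only the data come from the table.

```python
# Minimal OLS sketch (standard library only). Assumption: "Large" is the
# baseline level of Size, so dummies are created for Midsize and Small.
# Each tuple is (Salaried, Hourly, Size), salaries in $1000s, from the table.
data = [
    (56, 29, "Large"), (143, 76, "Midsize"), (71, 37, "Small"),
    (103, 28, "Large"), (78, 54, "Midsize"), (48, 25, "Midsize"),
    (118, 50, "Small"), (96, 47, "Large"), (71, 45, "Midsize"),
    (166, 56, "Small"), (73, 45, "Small"), (126, 80, "Large"),
    (129, 31, "Small"), (100, 83, "Large"), (189, 63, "Small"),
    (114, 35, "Large"), (93, 47, "Small"), (69, 44, "Large"),
    (189, 81, "Small"), (103, 51, "Small"), (71, 43, "Large"),
    (84, 33, "Small"), (105, 77, "Large"), (57, 34, "Large"),
    (105, 71, "Midsize"), (79, 59, "Large"), (77, 39, "Midsize"),
    (99, 28, "Small"), (57, 24, "Large"), (63, 26, "Large"),
]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Design matrix columns: intercept, Hourly, Midsize dummy, Small dummy.
X = [[1.0, float(h), float(s == "Midsize"), float(s == "Small")]
     for _, h, s in data]
y = [float(sal) for sal, _, _ in data]
n, p = len(y), len(X[0])

# Normal equations: (X'X) beta = X'y.
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
beta = solve(xtx, xty)

# Sums of squares, R^2, and the overall F statistic (MSR / MSE).
fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
y_bar = sum(y) / n
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - y_bar) ** 2 for yi in y)
k = p - 1  # number of slope coefficients
r2 = 1 - sse / sst
f_stat = ((sst - sse) / k) / (sse / (n - k - 1))

print(f"Salaried = {beta[0]:.2f} + {beta[1]:.2f}*Hourly "
      f"+ {beta[2]:.2f}*Midsize + {beta[3]:.2f}*Small")
print(f"R^2 = {r2:.4f}, F = {f_stat:.2f} on ({k}, {n - k - 1}) df")
```

The choice of baseline only changes how the dummy coefficients read; R² and the overall F statistic are the same whichever Size level is left out.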
From the result, the F-statistic is 11.72 with a p-value of 4.817e-05, far below α = 0.05, so the model is significant overall.
Moreover, the p-values of the explanatory variables are below 0.05, indicating that both Hourly and Size are significant predictors of Salaried.
R² = 0.5749, so 57.49% of the variability in y is explained by the updated regression model.
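As a quick consistency check, the reported F statistic can be recovered from the reported R². This assumes n = 30 companies and k = 3 slope coefficients (Hourly plus the two dummy variables the software would generate from Size); the symbols n and k are introduced here for illustration.

```latex
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
  = \frac{0.5749 / 3}{0.4251 / 26}
  \approx \frac{0.19163}{0.01635}
  \approx 11.72
```

Under those assumptions the two reported figures agree, with 3 and 26 degrees of freedom for the F test.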