Question

Price (in K) Sqft Age Features CornerCODE Corner_Label
310.0 2650 13 7 0 NO
313.0 2600 9 4 0 NO
320.0 2664 6 5 0 NO
320.0 2921 3 6 0 NO
304.9 2580 4 4 0 NO
295.0 2580 4 4 0 NO
285.0 2774 2 4 0 NO
261.0 1920 1 5 0 NO
250.0 2150 2 4 0 NO
249.9 1710 1 3 0 NO
242.5 1837 4 5 0 NO
232.0 1880 8 6 0 NO
230.0 2150 15 3 0 NO
228.5 1894 14 5 0 NO
222.0 1928 18 8 0 NO
223.0 1830 16 3 0 NO
220.5 1767 16 4 0 NO
216.0 1630 15 3 1 YES
218.9 1680 17 4 1 YES
204.5 1725 13 3 0 NO
204.5 1500 15 4 0 NO
202.5 1430 10 3 0 NO
202.5 1360 12 4 0 NO
195.0 1400 16 2 1 YES
201.0 1573 17 6 0 NO
191.0 1385 22 2 0 NO
274.5 2931 28 3 1 YES
260.3 2200 28 4 0 NO
230.0 2277 30 4 0 NO
235.0 2000 37 3 0 NO
207.0 1478 53 3 1 YES
207.0 1713 30 4 1 YES
197.2 1326 25 4 0 NO
197.5 1050 22 2 1 YES
194.9 1464 34 2 0 NO
190.0 1190 41 1 0 NO
192.6 1156 37 1 0 NO
194.0 1746 30 2 0 NO
192.0 1280 28 1 0 NO
175.0 1215 43 3 0 NO
177.0 1121 46 4 0 NO
177.0 1050 48 1 0 NO
179.9 1733 43 6 0 NO
178.1 1299 40 6 0 NO
177.5 1140 36 3 1 YES
172.0 1181 37 4 0 NO
320.0 2848 4 6 0 NO
264.9 2440 11 5 0 NO
240.0 2253 23 4 0 NO
234.9 2743 25 5 1 YES
230.0 2180 17 4 1 YES
228.9 1706 14 4 0 NO
225.0 1948 10 4 0 NO
217.5 1710 16 4 0 NO
215.0 1657 15 4 0 NO
213.0 2200 26 4 0 NO
210.0 1680 13 4 0 NO
209.9 1900 34 3 0 NO
200.5 1565 19 3 0 NO
198.4 1543 20 3 0 NO
192.5 1173 6 4 0 NO
193.9 1549 5 4 0 NO
190.5 1900 3 3 0 NO
188.5 1560 8 5 1 YES
186.0 1365 10 2 0 NO
185.5 1258 7 4 1 YES
184.9 1314 5 2 0 NO
180.0 1338 2 3 1 YES
180.9 997 4 4 0 NO
180.5 1275 8 5 0 NO
180.0 1030 4 1 0 NO
178.0 1027 5 3 0 NO
177.9 1007 19 6 0 NO
176.0 1083 22 4 0 NO
182.3 1320 18 5 0 NO
174.0 1348 15 2 0 NO
172.0 1350 12 2 0 NO
166.9 837 13 2 0 NO
234.5 3750 10 4 1 YES
202.5 1500 7 3 1 YES
198.9 1428 40 2 0 NO
187.0 1375 28 1 0 NO
183.0 1080 20 3 0 NO
182.0 900 23 3 0 NO
175.0 1505 16 2 1 YES
167.0 1480 19 4 0 NO
159.0 1142 10 0 0 NO
212.0 1464 7 2 0 NO
315.0 2116 25 3 0 NO
177.5 1280 14 3 0 NO
171.0 1159 23 0 0 NO
165.0 1198 10 4 0 NO
163.0 1051 15 2 0 NO
289.4 2250 40 6 0 NO
263.0 2563 17 2 0 NO
174.9 1400 45 1 1 YES
238.0 1850 5 5 1 YES
221.0 1720 5 4 0 NO
215.9 1740 4 3 0 NO
217.9 1700 6 4 0 NO
210.0 1620 6 4 0 NO
209.5 1630 6 4 0 NO
210.0 1920 8 4 0 NO
207.0 1606 5 4 0 NO
205.0 1535 7 5 1 YES
208.0 1540 6 2 1 YES
202.5 1739 13 3 0 NO
200.0 1715 8 3 0 NO
199.0 1305 5 3 0 NO
197.0 1415 7 4 0 NO
199.5 1580 9 3 0 NO
192.4 1236 3 4 0 NO
192.2 1229 6 3 0 NO
192.0 1273 4 4 0 NO
191.9 1165 7 4 0 NO
181.6 1200 7 4 1 YES
178.9 970 4 4 1 YES

Multiple Regression Modeling Steps

  1. Open the Excel worksheet containing your Team Project Data.
  2. As you learned in Modules 3 and 4, you will use the set of potentially meaningful numerical independent variables and the one selected “two-category” dummy variable in your study to develop a “best” multiple regression model for predicting your numerical response variable Y. Follow the step-by-step modeling process described in the PowerPoints at the end of Module 4.
    A. Start with a visual assessment of the possible relationships between your numerical dependent variable Y and each potential predictor variable by developing the scatterplot matrix (use JMP), and paste it into your report.
    B. Then fit a preliminary multiple regression model using these potential numerical predictor variables and, at most, one categorical dummy variable.
    C. Then assess collinearity with VIF until you are satisfied that you have a final set of possible predictors that are “independent,” i.e., not unduly correlated with each other.
    D. Use stepwise regression approaches to fit a multiple regression model with this set of potentially meaningful numerical independent variables (and, if appropriate, the one selected categorical dummy variable).
      (1) Based on the forward selection modeling criterion, determine which independent variables should be included in your regression model.
      (2) Based on the backward selection modeling criterion, determine which independent variables should be included in your regression model.
      (3) Based on the mixed selection modeling criterion, determine which independent variables should be included in your regression model.
      (4) Based on the adjusted r² criterion, determine which independent variables should be included in your regression model.
    E. Comment on the consistency of your findings in Step 2D (1)-(4).
    F. Paste screenshots of the outputs from (1), (2), and (3) of Step 2D above into your report.
    G. Based on Step 2D (along with the principle of parsimony if necessary), select a “best” multiple regression model.
    H. Using the predictor variables from your selected “best” multiple regression model, rerun the multiple regression model in order to assess its assumptions. You may use Excel or JMP for this step.
    I. Look at the set of residual plots, cut and paste them into the report, and briefly comment on the appropriateness of your fitted model.
      (1) If the assumptions are met and the fitted model is appropriate, continue to Step 2J.
      (2) If the normality assumption is problematic, state this but continue to Step 2J with caution, because your sample size is large enough for the central limit theorem to enable the use of classical inferential methods. Note: You do not need to check the assumption of independence in your project. That assumption is met because your project is not time-dependent.
      (3) If either the linearity or the equality of variance assumption is violated in one or two scatter plots of Y with individual predictors, then transform the particular independent variables involved following Tukey’s “ladder of powers” and rerun the multiple regression model as in Step 2H.
    J. Assess the significance of the overall fitted model.
    K. Assess the significance of each predictor variable.
  3. Write the sample multiple regression equation for the “final best” model you have developed.
    A. Interpret the meaning of the Y intercept and of each slope in your fitted model (in whatever units you used for Y to build this model).
    B. Interpret the meaning of the coefficient of multiple determination r².
    C. Interpret the meaning of the standard error of the estimate S_YX (in the units you used to build this model).
    D. Determine the 95% confidence interval estimate of the average value of Y for all occasions when the independent variables have the values you selected.
    E. Select one value for each of your independent variables in their respective relevant ranges:
    F. Predict
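
As a rough illustration of Steps 2B, 2C, 2D (4), 2J, and 2K outside JMP or Excel, the following Python sketch fits the preliminary model with NumPy, computes a VIF for each predictor, and ranks predictor subsets by adjusted r². It is only a sketch under stated assumptions: it uses just the first 20 rows of the table to stay short, so its numbers are not the project's results; the helper names fit_ols, vif, and best_by_adj_r2 are made up for this example; and it does not reproduce JMP's forward, backward, or mixed stepwise procedures.

```python
from itertools import combinations

import numpy as np

# First 20 rows of the table above (Price in K, Sqft, Age, Features, CornerCODE).
# Assumption: a subset is enough to illustrate; the real analysis uses all rows.
ROWS = """
310.0 2650 13 7 0
313.0 2600 9 4 0
320.0 2664 6 5 0
320.0 2921 3 6 0
304.9 2580 4 4 0
295.0 2580 4 4 0
285.0 2774 2 4 0
261.0 1920 1 5 0
250.0 2150 2 4 0
249.9 1710 1 3 0
242.5 1837 4 5 0
232.0 1880 8 6 0
230.0 2150 15 3 0
228.5 1894 14 5 0
222.0 1928 18 8 0
223.0 1830 16 3 0
220.5 1767 16 4 0
216.0 1630 15 3 1
218.9 1680 17 4 1
204.5 1725 13 3 0
"""
NAMES = ["Sqft", "Age", "Features", "CornerCODE"]
data = np.array([line.split() for line in ROWS.strip().splitlines()], dtype=float)
y, X = data[:, 0], data[:, 1:]


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns fit summaries in a dict."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])           # design matrix
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    mse = sse / (n - k - 1)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    f_stat = ((sst - sse) / k) / mse               # overall F statistic (Step 2J)
    se = np.sqrt(mse * np.diag(np.linalg.inv(A.T @ A)))
    return {"beta": beta, "r2": r2, "adj_r2": adj_r2, "F": f_stat, "t": beta / se}


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 regresses predictor j on the rest."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        out.append(1.0 / (1.0 - fit_ols(others, X[:, j])["r2"]))
    return np.array(out)


def best_by_adj_r2(X, y, names):
    """Exhaustive search for the predictor subset with the highest adjusted r2."""
    best_cols, best_score = None, -np.inf
    for size in range(1, X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), size):
            score = fit_ols(X[:, list(cols)], y)["adj_r2"]
            if score > best_score:
                best_cols, best_score = cols, score
    return [names[c] for c in best_cols], best_score


full = fit_ols(X, y)
print("r2 =", round(full["r2"], 3), " F =", round(full["F"], 2))
for name, v, t in zip(NAMES, vif(X), full["t"][1:]):
    print(f"{name:>10}: VIF = {v:.2f}, t = {t:.2f}")   # t statistics for Step 2K
print("Best subset by adjusted r2:", best_by_adj_r2(X, y, NAMES))
```

The F and t statistics printed here still have to be compared with critical values (or converted to p-values) at the chosen significance level, which JMP and Excel report directly.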

Solutions

Expert Solution

Answer:
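
One way the quantities asked for in step 3 could be computed is sketched below in Python with NumPy. Everything specific here is an assumption made for illustration: the model with Sqft and Age as predictors is not the outcome of the selection steps, only the first 20 rows of the table are used, the point Sqft = 2000 and Age = 10 is a hypothetical choice within the observed ranges, and the critical value 2.110 is the tabulated t value for 17 degrees of freedom at 95% confidence.

```python
import numpy as np

# First 20 rows of the table (Price in K, Sqft, Age); illustration only.
ROWS = """
310.0 2650 13
313.0 2600 9
320.0 2664 6
320.0 2921 3
304.9 2580 4
295.0 2580 4
285.0 2774 2
261.0 1920 1
250.0 2150 2
249.9 1710 1
242.5 1837 4
232.0 1880 8
230.0 2150 15
228.5 1894 14
222.0 1928 18
223.0 1830 16
220.5 1767 16
216.0 1630 15
218.9 1680 17
204.5 1725 13
"""
data = np.array([line.split() for line in ROWS.strip().splitlines()], dtype=float)
y = data[:, 0]
A = np.column_stack([np.ones(len(y)), data[:, 1:]])   # columns: 1, Sqft, Age
n, p = A.shape                                         # p counts the intercept

# Sample regression equation: least-squares coefficients b0, b1, b2.
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
sse = float(resid @ resid)

# Coefficient of multiple determination and standard error of the estimate.
r2 = 1.0 - sse / float(((y - y.mean()) ** 2).sum())
s_yx = np.sqrt(sse / (n - p))

# Hypothetical predictor values inside the ranges seen in this subset.
x0 = np.array([1.0, 2000.0, 10.0])
y_hat = float(x0 @ beta)
h0 = float(x0 @ np.linalg.inv(A.T @ A) @ x0)           # leverage of x0

# 95% confidence interval for the mean of Y at x0.
T_CRIT = 2.110                                         # t table value, df = n - p = 17
half_width = T_CRIT * s_yx * np.sqrt(h0)
ci = (y_hat - half_width, y_hat + half_width)

print(f"Price_hat = {beta[0]:.2f} {beta[1]:+.4f}*Sqft {beta[2]:+.3f}*Age")
print(f"r2 = {r2:.3f}, S_YX = {s_yx:.2f} (in K)")
print(f"Predicted mean price at x0: {y_hat:.1f} K, 95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
```

With the full data set, n and the degrees of freedom change, so the critical value has to be looked up again before the interval is reported.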

