Question

Suppose the following data were collected from a sample of 15 houses relating selling price to square footage and the architectural style of the house. Which of the following is the best equation to use relating the selling price of a house to square footage and the style of the house?

Housing Prices

Colonial, Ranch, and Victorian are indicator variables (1 if the house is of that style, 0 otherwise).

Selling Price   Square Footage   Colonial   Ranch   Victorian
391430          2303             0          1       0
381002          2053             1          0       0
403539          2013             0          0       1
405271          2552             0          0       1
406578          3131             0          0       1
471858          3659             0          1       0
392188          2332             0          1       0
475616          3588             1          0       0
401742          1843             0          0       1
404836          2656             1          0       0
333709          1337             1          0       0
393618          2389             1          0       0
365651          1799             0          1       0
404239          2321             0          0       1
375624          1946             0          1       0

Solutions

Expert Solution

Regression analysis is a basic statistical method for estimating the relationships among variables. One identifies a dependent variable whose value varies with one or more independent variables. For example, the value of a house (dependent variable) varies with its square footage (independent variable). Regression analysis is a very useful tool in predictive analytics.

E(Y | X) = f(X, β)

Y = f(X) = β0 + β1 * X

β0 is the intercept of the line

β1 is the slope of the line
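As an illustration of the intercept and slope, the sketch below estimates β0 and β1 for selling price against square footage alone, using the standard closed-form least-squares formulas. The data are hard-coded from the question's table (read as four-digit square footages and six-digit prices), and the variable names are chosen here for illustration.

```python
import numpy as np

# Square footage (X) and selling price (Y) for the 15 houses in the question.
sqft = np.array([2303, 2053, 2013, 2552, 3131, 3659, 2332, 3588,
                 1843, 2656, 1337, 2389, 1799, 2321, 1946], dtype=float)
price = np.array([391430, 381002, 403539, 405271, 406578, 471858, 392188, 475616,
                  401742, 404836, 333709, 393618, 365651, 404239, 375624], dtype=float)

x_bar, y_bar = sqft.mean(), price.mean()

# Slope: covariation of X and Y divided by the variation of X.
beta1 = np.sum((sqft - x_bar) * (price - y_bar)) / np.sum((sqft - x_bar) ** 2)
# Intercept: the fitted line passes through the point of means.
beta0 = y_bar - beta1 * x_bar

print(f"intercept = {beta0:.2f}, slope = {beta1:.4f}")
```

The slope is the estimated change in selling price per additional square foot; the intercept is the fitted price at zero square feet and has no practical meaning on its own.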

Linear regression estimates the line that best describes the relationship among the data points. There are many different ways, linear or nonlinear, to define such a relationship. In the linear model, the relationship is determined by the intercept and the slope. To find the best-fitting line, we estimate (train) the model from the data.

Before applying the linear regression model, we should determine whether or not there is a relationship between the variables of interest. A scatterplot is a good starting point for judging the strength of the relationship between two variables. The correlation coefficient is a valuable measure of linear association between variables. Its value ranges from -1 (perfect negative relationship) to 1 (perfect positive relationship); values near 0 indicate a weak linear relationship.
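A minimal sketch of this check for the housing data, assuming the values are read from the question's table as shown in the comments, computes the correlation between square footage and selling price with NumPy:

```python
import numpy as np

# Square footage and selling price for the 15 houses in the question.
sqft = np.array([2303, 2053, 2013, 2552, 3131, 3659, 2332, 3588,
                 1843, 2656, 1337, 2389, 1799, 2321, 1946], dtype=float)
price = np.array([391430, 381002, 403539, 405271, 406578, 471858, 392188, 475616,
                  401742, 404836, 333709, 393618, 365651, 404239, 375624], dtype=float)

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
r = np.corrcoef(sqft, price)[0, 1]
print(f"correlation coefficient r = {r:.3f}")
```

A value of r close to 1 would indicate a strong positive linear relationship, which supports fitting a line to these two variables.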

Once we determine that there is a relationship between the variables, the next step is to identify the best-fitting line. The most common method is least squares, which chooses the line that minimizes the Residual Sum of Squares (RSS). For each observation, the residual is the vertical distance between the observed (actual) value and the value predicted by the line. The RSS squares each residual and adds them all up.

The MSE (Mean Squared Error) is a quality measure for the estimator, obtained by dividing the RSS by the number of observed data points. It is always non-negative, and values closer to zero represent a smaller error. The RMSE (Root Mean Squared Error) is the square root of the MSE. The RMSE measures the typical deviation of the estimates from the observed values. It is easier to interpret than the MSE because it is expressed in the same units as the dependent variable, whereas the MSE can be a very large number.

RMSE = √(MSE)
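The three quantities can be computed directly from a fitted line. The sketch below fits selling price on square footage with `np.polyfit` and follows the definition in the text, dividing the RSS by the number of observations; the data values are hard-coded from the question's table and the variable names are illustrative. Statistical packages often divide by the error degrees of freedom instead, so their reported mean square for error can differ from this MSE.

```python
import numpy as np

sqft = np.array([2303, 2053, 2013, 2552, 3131, 3659, 2332, 3588,
                 1843, 2656, 1337, 2389, 1799, 2321, 1946], dtype=float)
price = np.array([391430, 381002, 403539, 405271, 406578, 471858, 392188, 475616,
                  401742, 404836, 333709, 393618, 365651, 404239, 375624], dtype=float)

# Least-squares line; np.polyfit returns [slope, intercept] for degree 1.
slope, intercept = np.polyfit(sqft, price, 1)
predicted = intercept + slope * sqft

residuals = price - predicted      # observed minus predicted
rss = np.sum(residuals ** 2)       # residual sum of squares
mse = rss / len(price)             # mean squared error (RSS / n)
rmse = np.sqrt(mse)                # same units as selling price

print(f"RSS = {rss:.4e}, MSE = {mse:.4e}, RMSE = {rmse:.1f}")
```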

Each additional independent variable adds another dimension to the model.

Y = f(X) = β0 + β1 * X1 + β2 * X2 + β3 * X3
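Applied to the housing question, a sketch of this model with X1 as square footage and X2, X3 as two of the three style indicators could read as follows. Treating Victorian as the omitted baseline category is an assumption made here for illustration; only two of the three indicators can enter alongside the intercept, because the three indicators always sum to 1.

```latex
\[
\text{Price}_i = \beta_0 + \beta_1\,\text{SqFt}_i + \beta_2\,\text{Colonial}_i + \beta_3\,\text{Ranch}_i + \varepsilon_i
\]
\[
E[\text{Price} \mid \text{SqFt}] =
\begin{cases}
\beta_0 + \beta_1\,\text{SqFt} & \text{Victorian (baseline)} \\
(\beta_0 + \beta_2) + \beta_1\,\text{SqFt} & \text{Colonial} \\
(\beta_0 + \beta_3) + \beta_1\,\text{SqFt} & \text{Ranch}
\end{cases}
\]
```

Under this coding, β2 and β3 shift the intercept for Colonial and Ranch houses relative to Victorian houses, while all three styles share the same slope β1.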



Analysis of Variance

Source             DF       Adj SS       Adj MS  F-Value  P-Value
Regression          3  1.48677E+22  4.95589E+21    19.17    0.000
  Square Footage    1  1.47522E+22  1.47522E+22    57.08    0.000
  Colonial          1  1.65815E+20  1.65815E+20     0.64    0.440
  Ranch             1  1.12558E+20  1.12558E+20     0.44    0.523
Error              11  2.84317E+21  2.58470E+20
Total              14  1.77108E+22


Model Summary

S              R-sq    R-sq(adj)  R-sq(pred)
1.60770E+10    83.95%  79.57%     65.77%
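Most of the Model Summary follows from the Analysis of Variance table by simple arithmetic. The sketch below recomputes S, R-sq, R-sq(adj), and the overall F-value from the sums of squares and degrees of freedom printed above, using the standard textbook formulas; the variable names are chosen here for illustration. R-sq(pred) is not included because it cannot be recovered from the ANOVA table alone.

```python
import math

# Values copied from the Analysis of Variance table.
ss_regression = 1.48677e22
ss_error = 2.84317e21
ss_total = 1.77108e22
df_regression, df_error, df_total = 3, 11, 14

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

s = math.sqrt(ms_error)                               # standard error of the regression
r_sq = 1 - ss_error / ss_total                        # share of variation explained
r_sq_adj = 1 - ms_error / (ss_total / df_total)       # penalized for number of predictors
f_value = ms_regression / ms_error                    # overall F statistic

print(f"S = {s:.5e}")
print(f"R-sq = {100 * r_sq:.2f}%, R-sq(adj) = {100 * r_sq_adj:.2f}%")
print(f"F = {f_value:.2f}")
```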


Coefficients

Term            Coef          SE Coef      T-Value  P-Value  VIF
Constant        2.85798E+11   17251859972  16.57    0.000
Square Footage  4994          661          7.55     0.000    1.00
Colonial        -740537218    924569907    -0.80    0.440    1.33
Ranch           -610158292    924612666    -0.66    0.523    1.33
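Each T-Value in the table is the coefficient divided by its standard error. The sketch below redoes that division from the printed values and compares the result with 2.201, the two-sided 0.05 critical value of the t distribution with 11 error degrees of freedom; the 0.05 level is an assumption here, since the question does not state one. Because the printed coefficients and standard errors are rounded, the recomputed ratios can differ from the table in the last digit.

```python
# (coefficient, standard error) pairs copied from the Coefficients table.
coefficients = {
    "Constant": (2.85798e11, 17251859972),
    "Square Footage": (4994, 661),
    "Colonial": (-740537218, 924569907),
    "Ranch": (-610158292, 924612666),
}

T_CRITICAL = 2.201  # two-sided, alpha = 0.05, 11 degrees of freedom (assumed level)

t_values = {term: coef / se for term, (coef, se) in coefficients.items()}
significant = {term: abs(t) > T_CRITICAL for term, t in t_values.items()}

for term, t in t_values.items():
    print(f"{term:15s} t = {t:6.2f}  significant at 0.05: {significant[term]}")
```

Under that assumption, square footage is significant while the Colonial and Ranch indicators are not, which matches their P-Values of 0.440 and 0.523.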


Regression Equation

Selling Price = 285797688046 + 4994 Square Footage - 740537218 Colonial - 610158292 Ranch
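As a cross-check, the same model can be fit by ordinary least squares with NumPy. This is a sketch, not the software run that produced the output above: the data are hard-coded from the question's table, read as prices between 333,709 and 475,616, four-digit square footages, and 0/1 indicators, and the variable names are illustrative. Under that reading the coefficients come out on a much smaller scale than those printed above (an intercept near 285,800 and a slope near 50 per square foot), which suggests the software run used differently scaled inputs. Scale-free quantities such as R-sq should still agree with the reported 83.95%.

```python
import numpy as np

# Columns: selling price, square footage, Colonial, Ranch.
# Victorian is the baseline style (both indicators equal to 0).
data = np.array([
    [391430, 2303, 0, 1], [381002, 2053, 1, 0], [403539, 2013, 0, 0],
    [405271, 2552, 0, 0], [406578, 3131, 0, 0], [471858, 3659, 0, 1],
    [392188, 2332, 0, 1], [475616, 3588, 1, 0], [401742, 1843, 0, 0],
    [404836, 2656, 1, 0], [333709, 1337, 1, 0], [393618, 2389, 1, 0],
    [365651, 1799, 0, 1], [404239, 2321, 0, 0], [375624, 1946, 0, 1],
], dtype=float)

y = data[:, 0]
# Design matrix: a column of ones for the intercept, then the three predictors.
X = np.column_stack([np.ones(len(y)), data[:, 1:]])

# Least-squares estimates of [intercept, square footage, Colonial, Ranch].
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta
r_squared = 1 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

for name, b in zip(["Constant", "Square Footage", "Colonial", "Ranch"], beta):
    print(f"{name:15s} {b:12.2f}")
print(f"R-sq = {100 * r_squared:.2f}%")
```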


Fits and Diagnostics for Unusual Observations

Obs Selling Price Fit Resid Std Resid
5 4.06578E+11 4.42185E+11 -3.56063E+10 -2.64 R

