In: Statistics and Probability
1.Develop a multiple linear regression model to predict the price of a house using the square feet of living area, number of bedrooms, and number of bathrooms as the predictor variables
Prepare a single Microsoft Excel file to document your regression analyses. Prepare a single Microsoft Word document that outlines your responses for each portion of the case study.
Selling Price Living Area (Sq Feet) No. Bathrooms No Bedrooms Age (Years)
$240,000 2,022 2.5 3 20
$235,000 1,578 2 3 20
$500,075 3,400 3 3 20
$240,000 1,744 2.5 3 20
$270,000 2,560 2.5 3 20
$225,000 1,398 2.5 3 20
$280,000 2,494 2.5 3 20
$225,000 2,208 2.5 4 20
$248,220 2,550 2.5 3 20
$275,000 1,812 2.5 2 20
$137,000 1,290 1 2 20
$150,000 1,172 2 2 20
$649,000 4,128 3.5 3 20
$195,000 1,816 2.5 3 97
$373,200 2,628 2.5 4 20
$169,450 1,254 2.5 3 20
$144,200 1,660 1.5 4 20
$189,900 1,850 1.5 3 20
$166,000 1,258 2 3 20
$160,000 1,219 2 3 20
$327,355 1,850 2.5 3 20
$247,000 2,103 2.5 3 20
$318,000 1,806 2.5 3 20
$341,000 1,674 1.5 2 17
$288,650 2,242 2.5 3 20
$157,000 1,408 1.5 3 20
$449,000 3,457 2.5 3 21
$142,000 1,728 1.5 3 21
$389,000 2,354 2.5 3 21
$476,000 2,246 2.5 3 21
$249,230 1,902 2.5 2 21
$139,900 1,178 1 3 21
$301,900 2,896 3.5 4 21
$425,000 2,457 3 3 41
$121,000 936 1 3 50
$150,000 934 1 2 21
$138,000 1,279 1 3 21
$199,900 1,888 2 3 26
$145,000 1,686 1.5 4 21
$465,000 2,310 3 2 21
$158,000 1,200 1.5 3 21
This is a simple problem related ot tracing the multi variable regression equation and to check its significance.
The data set for the given regression is
Selling price($) | Living Area (sq Ft) | Bathroom(No.) | Bedroom(No.) | Age (years) |
240000 | 2022 | 2.5 | 3 | 20 |
235000 | 1578 | 2 | 3 | 20 |
500075 | 3400 | 3 | 3 | 20 |
240000 | 1744 | 2.5 | 3 | 20 |
270000 | 2560 | 2.5 | 3 | 20 |
225000 | 1398 | 2.5 | 3 | 20 |
280000 | 2494 | 2.5 | 3 | 20 |
225000 | 2208 | 2.5 | 4 | 20 |
248220 | 2550 | 2.5 | 3 | 20 |
275000 | 1812 | 2.5 | 2 | 20 |
137000 | 1290 | 1 | 2 | 20 |
150000 | 1172 | 2 | 2 | 20 |
649000 | 4128 | 3.5 | 3 | 20 |
195000 | 1816 | 2.5 | 3 | 97 |
373200 | 2628 | 2.5 | 4 | 20 |
169450 | 1254 | 2.5 | 3 | 20 |
144200 | 1660 | 1.5 | 4 | 20 |
189900 | 1850 | 1.5 | 3 | 20 |
166000 | 1258 | 2 | 3 | 20 |
160000 | 1219 | 2 | 3 | 20 |
327355 | 1850 | 2.5 | 3 | 20 |
247000 | 2103 | 2.5 | 3 | 20 |
318000 | 1806 | 2.5 | 3 | 20 |
341000 | 1674 | 1.5 | 2 | 17 |
288650 | 2242 | 2.5 | 3 | 20 |
157000 | 1408 | 1.5 | 3 | 20 |
449000 | 3457 | 2.5 | 3 | 21 |
142000 | 1728 | 1.5 | 3 | 21 |
389000 | 2354 | 2.5 | 3 | 21 |
476000 | 2246 | 2.5 | 3 | 21 |
249230 | 1902 | 2.5 | 2 | 21 |
139900 | 1178 | 1 | 3 | 21 |
301900 | 2896 | 3.5 | 4 | 21 |
425000 | 2457 | 3 | 3 | 41 |
121000 | 936 | 1 | 3 | 50 |
150000 | 934 | 1 | 2 | 21 |
138000 | 1279 | 1 | 3 | 21 |
199900 | 1888 | 2 | 3 | 26 |
145000 | 1686 | 1.5 | 4 | 21 |
465000 | 2310 | 3 | 2 | 21 |
158000 | 1200 | 1.5 | 3 | 21 |
Now as per the question we need to plot the multi variable regression equation only in terms of three given independent variables
Area of the house
Number of bedrooms
Number of bathrooms
So we plot the regression equation in the MS-Excel by recalling the function regression in the data analysis function.
This gives us the following output.
Now what do we infer from the given regression analysis ?
We infer that :-
The regression equation is given by
Y(Selling price)=84829.70+130.84*(living Area)+38670.33*(Bathroom numbers)-54806*(Bedroom numbers)
1). Since the F test significance value is <0.05, we can say that the overall multi variable regression equation is significant with 95% confidence .
2). Since the p values of the intercept of the equation and the variable (Bathroom numbers) is >0.05 we can infer that these factors do NOT hold significance in the regression equation .
3). Since the p values of the independent variables Living area (in sq ft). and the variable (Bedroom numbers) is <0.05 ,we can infer that these factors do hold significance in the regression equation .
4). coefficients of the variables Bathroom numbers, living area and the intercept are positive indicating that selling price is directly dependnent on them ehile the coefficient of bedroom number is negative which means that as the number of bedrooms increase the selling prices arel ikely to reduce.
5). Now to understand the percentage of variation of output that is explained by the independent input variables we look for the value of R-Squared and Adjusted R- Squared. The R-Squared tells us as to how well the model or regression line “fits” the data.
It tell us about the proportion of variance in the dependent variable (Y) that is explained by the independent variable (X). The R-Squared indicates the percentage of variation in the dependent variable that is explained by the independent variables.
But our regression equation is a multi variable regression equation . so we will seek adjusted R -Squared ?
But Why?
Because Adjusted R-Squared is used when analyzing multiple regression output because When we have more than one independent variable in our analysis, the computation process inflates the R-squared. As the name indicates, the Adjusted R-Squared is the R-Square adjusted for this inflation when performing multiple regression.
Now look at the Adjusted R -Squared value? what is it?
Adjusted R -Squared =0.7756
This means that only 77.56% of all output variations can be explained by our regression model.
Prediction interval
We can predict the selling price by using our regression model
For living space =3000 sq ft
Bedroom =3 number
bathroom =2.5 numbers
The predicted selling price will be
Y(Selling price)=84829.70+130.84*(3000)+38670.33*(2.5)-54806*(3)
=$409607