Use the HousePrice data and, via multiple regression, select the two variables that best predict the house selling price. Make another table with these two variables and answer the questions. Numerical answers are rounded, so choose the answer that matches best:
9. Identify the negative coefficient. What is its value and what is the interpretation of this number? (Choose the most appropriate answer. Note: numbers are truncated.)
10. Which of the two variables has better P-value and what is this P-value? (Note: numbers are truncated.)
11. Using this second table, predict the selling price of a house that is 10 years old, has 2 bathrooms and 3 rooms.
12. Based on the table would you characterize the Regression fit and the prediction as Poor, Good, Very Good, or Excellent?
Age  #Bathrooms  #Rooms  #BedRooms  #FirePlaces  sellingPrice (in $100000)
42   1           7       4          0            4.9176
62   1           7       4          0            5.0208
40   1           6       3          0            4.5429
54   1           6       3          0            4.5573
42   1           6       3          0            5.0597
56   1           6       3          0            3.891
51   1           7       3          1            5.898
32   1           6       3          0            5.6039
32   1           6       3          0            5.8282
30   1           6       3          0            5.3003
30   1           5       2          0            6.2712
32   1           6       3          0            5.9592
32   1           6       3          0            5.6039
50   1.5         8       4          0            8.2464
17   1.5         6       3          0            7.7841
23   1           7       3          0            9.0384
22   1.5         6       3          0            7.5422
44   1.5         6       3          0            6.0931
3    1           7       3          0            8.14
31   1.5         8       4          0            9.1416
42   2.5         10      5          1            16.4202
14   2.5         9       5          1            14.4598
46   1           5       2          1            5.05
22   1.5         7       3          1            6.6969
40   1           6       3          1            5.9894
50   1.5         8       4          1            8.7951
48   1.5         8       4          1            8.3607
30   1.5         6       3          1            12
Regression with all the variables.
Enter the data in Excel, open the Regression tool under Data Analysis, and supply the input ranges: sellingPrice as the Y range and the five predictor columns as the X range.
The output is as follows.
Regression with the best two variables
We select the variables based on their P-values.
For each beta coefficient we test the null hypothesis H0: beta_i = 0 (the variable has no linear effect on the selling price) against the alternative H1: beta_i ≠ 0.
Next we check the P-value for each variable in the regression output. If the P-value is less than 0.05, we reject the null hypothesis and conclude that the variable is significant.
We find that only Age and #Bathrooms are significant, with P-values less than 0.05.
Hence only these two variables are selected for the next model.
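The same selection can be cross-checked outside Excel. The following is a minimal sketch, not the method used for the screenshots: it fits ordinary least squares through the normal equations in plain Python and flags each coefficient whose |t| exceeds the two-sided 5% critical value, which is equivalent to a P-value below 0.05. The names `ROWS`, `invert`, `ols` and `T_CRIT` are illustrative, and 2.074 is the t-table value for 28 - 6 = 22 degrees of freedom.

```python
# Columns: Age, #Bathrooms, #Rooms, #BedRooms, #FirePlaces, sellingPrice ($100,000)
ROWS = [
    (42, 1, 7, 4, 0, 4.9176), (62, 1, 7, 4, 0, 5.0208),
    (40, 1, 6, 3, 0, 4.5429), (54, 1, 6, 3, 0, 4.5573),
    (42, 1, 6, 3, 0, 5.0597), (56, 1, 6, 3, 0, 3.891),
    (51, 1, 7, 3, 1, 5.898), (32, 1, 6, 3, 0, 5.6039),
    (32, 1, 6, 3, 0, 5.8282), (30, 1, 6, 3, 0, 5.3003),
    (30, 1, 5, 2, 0, 6.2712), (32, 1, 6, 3, 0, 5.9592),
    (32, 1, 6, 3, 0, 5.6039), (50, 1.5, 8, 4, 0, 8.2464),
    (17, 1.5, 6, 3, 0, 7.7841), (23, 1, 7, 3, 0, 9.0384),
    (22, 1.5, 6, 3, 0, 7.5422), (44, 1.5, 6, 3, 0, 6.0931),
    (3, 1, 7, 3, 0, 8.14), (31, 1.5, 8, 4, 0, 9.1416),
    (42, 2.5, 10, 5, 1, 16.4202), (14, 2.5, 9, 5, 1, 14.4598),
    (46, 1, 5, 2, 1, 5.05), (22, 1.5, 7, 3, 1, 6.6969),
    (40, 1, 6, 3, 1, 5.9894), (50, 1.5, 8, 4, 1, 8.7951),
    (48, 1.5, 8, 4, 1, 8.3607), (30, 1.5, 6, 3, 1, 12.0),
]
NAMES = ["Age", "#Bathrooms", "#Rooms", "#BedRooms", "#FirePlaces"]
T_CRIT = 2.074  # two-sided 5% critical value of Student's t, 22 df (full model)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(rows, cols):
    """Fit price ~ intercept + chosen predictor columns.

    Returns (coefficients, t statistics, R^2); index 0 is the intercept.
    """
    X = [[1.0] + [float(r[c]) for c in cols] for r in rows]
    y = [r[-1] for r in rows]
    n, k = len(X), len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    sse = sum((yi - sum(b * xi for b, xi in zip(beta, x))) ** 2
              for x, yi in zip(X, y))
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sigma2 = sse / (n - k)  # residual variance
    t = [b / (sigma2 * inv[i][i]) ** 0.5 for i, b in enumerate(beta)]
    return beta, t, 1 - sse / sst


beta, t_stats, r2 = ols(ROWS, [0, 1, 2, 3, 4])  # all five predictors
for name, b, t in zip(["Intercept"] + NAMES, beta, t_stats):
    flag = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name:12s} coef={b:9.4f}  t={t:7.3f}  {flag}")
print(f"R^2 = {r2:.4f}")
```

Calling `ols(ROWS, [0, 1])` refits the model with Age and #Bathrooms only; note that the critical value would then use 28 - 3 = 25 degrees of freedom rather than 22.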
Screenshots of the inputs and the output are shown below.
9. Identify the negative coefficient. What is its value and
what is the interpretation of this number? (Choose the most
appropriate answer. Note: numbers are truncated.)
b. -0.037; this is how much the predicted selling price, in hundred thousand dollars, decreases for each additional year of the house's age, with the number of bathrooms held fixed.
10. Which of the two variables has better P-value and
what is this P-value? (Note: numbers are truncated.)
c. #Bathrooms has the best P-value; P-value = 1.08E-09
11. Using this second table, predict the selling price of
a house that is 10 years old, has 2 bathrooms and 3
rooms.
b. 12.57 hundred thousand dollars
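Explanation:
Regression equation:
selling price = 0.98872665 - 0.0369(Age) + 5.9745(#Bathrooms)
selling price = 0.98872665 - 0.0369(10) + 5.9745(2) = 12.5684
#Rooms is not in the two-variable model, so the 3 rooms do not enter the prediction. The value 12.5684 comes from the unrounded coefficients; the rounded coefficients shown give 12.5687, and both round to 12.57.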
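The substitution can be checked with a few lines of code. This is a small sketch that hard-codes the rounded coefficients quoted in the equation above; the function name `predict_price` is illustrative only.

```python
# Rounded coefficients as quoted in the regression equation above.
INTERCEPT = 0.98872665
B_AGE = -0.0369
B_BATHROOMS = 5.9745


def predict_price(age_years, bathrooms):
    """Predicted selling price in units of $100,000."""
    return INTERCEPT + B_AGE * age_years + B_BATHROOMS * bathrooms


price = predict_price(10, 2)
print(f"{price:.2f} hundred thousand dollars")

# One extra year of age changes the prediction by the Age coefficient.
age_effect = predict_price(11, 2) - predict_price(10, 2)
print(f"change per extra year of age: {age_effect:.4f}")
```

The second print ties back to question 9: with the number of bathrooms held fixed, each additional year of age lowers the prediction by about 0.037 hundred thousand dollars.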
12. Based on the table would you characterize the
Regression fit and the prediction as Poor, Good, Very Good, or
Excellent?
c. Good
The fit of the regression is judged by the coefficient of determination.
Coefficient of determination (R-square) = 0.805635082
It measures the proportion of the variability in y that is explained by the predictors. Its value lies between 0 and 1; the greater the value, the better the model fits. In this case it is 80.56%, hence the model is good.
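To make the definition concrete, R-square can be recomputed as 1 - SSE/SST from the data and the fitted equation. The sketch below assumes the data listed in the question and the rounded two-variable coefficients quoted earlier, so it should land close to the reported 0.8056 rather than reproduce every digit. The names `AGE`, `BATHROOMS`, `PRICE` and `r_squared` are illustrative.

```python
AGE = [42, 62, 40, 54, 42, 56, 51, 32, 32, 30, 30, 32, 32, 50,
       17, 23, 22, 44, 3, 31, 42, 14, 46, 22, 40, 50, 48, 30]
BATHROOMS = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1.5,
             1.5, 1, 1.5, 1.5, 1, 1.5, 2.5, 2.5, 1, 1.5, 1, 1.5, 1.5, 1.5]
PRICE = [4.9176, 5.0208, 4.5429, 4.5573, 5.0597, 3.891, 5.898,
         5.6039, 5.8282, 5.3003, 6.2712, 5.9592, 5.6039, 8.2464,
         7.7841, 9.0384, 7.5422, 6.0931, 8.14, 9.1416, 16.4202,
         14.4598, 5.05, 6.6969, 5.9894, 8.7951, 8.3607, 12.0]


def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    sst = sum((a - mean_actual) ** 2 for a in actual)           # total
    return 1 - sse / sst


# Fitted values from the two-variable equation (rounded coefficients).
fitted = [0.98872665 - 0.0369 * a + 5.9745 * b
          for a, b in zip(AGE, BATHROOMS)]
r2 = r_squared(PRICE, fitted)
print(f"R^2 = {r2:.4f}")
```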