Question

rent rooms baths sqrfoot house campusclose pets new
875 1 1 655 0 0 0 0
1130 1 1 800 0 0 1 0
785 1 1 650 0 1 0 0
895 1 1 566 0 1 0 0
690 1 1 600 0 1 0 0
800 1 1 435 0 1 0 0
595 1 1 500 0 0 0 0
850 1 1 655 0 1 0 0
775 1 1 612 0 0 1 1
795 1 1 688 0 1 0 0
1050 1 1 700 0 1 1 0
870 1 1 655 0 1 0 0
1070 1 1 710 0 0 1 0
850 1 1 670 0 1 0 0
825 1 1 488 0 1 0 0
1300 2 1 781 1 1 1 0
1225 2 1 764 0 1 0 0
1300 2 1 800 1 0 0 0
1200 2 1 922 0 1 0 0
1345 2 1 856 0 0 1 1
1100 2 2 866 0 0 0 1
1350 2 2 1300 0 0 0 0
1450 2 1 700 0 1 1 1
1200 2 1 800 0 1 0 0
1195 2 1 795 0 1 0 0
1185 2 1 864 0 1 0 0
1100 2 1 1050 0 1 0 0
1125 2 2 986 0 0 1 1
1075 2 1 800 0 0 1 1
1210 2 2 890 0 1 0 0
1150 2 1 1200 0 0 1 0
1215 2 1 988 0 1 0 0
1270 2 1.5 995 0 1 0 0
995 2 1 864 0 1 0 0
1095 2 1 1050 0 0 0 0
995 2 1 800 0 1 0 0
1205 2 1 900 1 1 1 0
1560 3 2 1200 1 1 0 0
1800 3 2.5 1309 1 0 0 1
1740 3 1 1200 1 1 0 0
1795 3 2 1300 0 0 0 0
2067 3 4 1700 0 1 0 1
2695 3 2.5 1551 0 0 1 1
1815 3 2 1467 0 0 1 0
1900 3 2.5 1600 1 0 0 0
1395 3 2 1611 1 0 1 0
1194 3 1 1705 1 1 0 0
1699 3 3 1646 1 1 1 0
1700 3 2 1550 1 0 1 0
2700 4 3 2100 1 0 1 1
2956 4 4 1659 0 1 1 1
2400 4 2 2300 1 1 0 0
2250 4 2 1900 0 1 0 0
2099 4 4 2200 1 1 1 0
2720 4 3 2400 0 1 0 1
1700 4 1.5 1980 1 1 0 0
2200 4 1.5 2100 1 1 0 0
2600 5 1.5 3500 1 1 0 0
2600 5 2 1607 1 0 0 0
2300 5 2 2600 1 0 0 0

1. Remove the variable with the highest p-value and re-fit the model. Only remove one variable at a time.

2. Continue removing variables one-by-one until all variables in the model have a p-value less than 0.05.

3. Consider whether any of the variables in your model are related to each other. Check this with the scatterplot matrix and/or by finding the correlation between the two explanatory variables. If r <= 0.80, then keep both variables in the model. This is your final model. However, if r > 0.80, then one of the variables should be removed from the model. Re-fit two models, each model without one of the correlated variables. Select the model with the higher adjusted R-squared value.

a. (2 points) Provide a narrative for how you settled upon the final model. Example: “I first fit the full model and noticed the p-value for ____ was very high. I dropped it from the model and refit the data, then I checked the correlation between ___ and ___ to see if the relationship was too strong between the explanatory variables.”

b. (2 points) Provide the R output of your final model.

c. (2 points) State the least squares regression equation of your model.

d. (2 points) Compare the adjusted R-squared values from the full model to your final model. Is there much of a difference? What does this comparison tell us about the fit of the two models?

Solutions

1.

I loaded the data into a data frame named rooms, then fit the full model in R. The command and its output are shown below.
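The loading step itself is not shown in this answer. A minimal sketch of one way to do it, assuming the table from the question has been saved as a whitespace-separated text file (the file name rooms.txt and the helper load_rentals are hypothetical, not part of the original answer):

```r
# Hypothetical helper: read the whitespace-separated rental table into a data frame.
load_rentals <- function(path) {
  df <- read.table(path, header = TRUE)  # first line holds the column names
  expected <- c("rent", "rooms", "baths", "sqrfoot",
                "house", "campusclose", "pets", "new")
  # Fail early if the header does not match the columns used in the question.
  stopifnot(identical(names(df), expected))
  df
}

# Usage (assumes rooms.txt exists in the working directory):
# rooms <- load_rentals("rooms.txt")
# str(rooms)
```

The column check is a design choice: a typo in the header would otherwise only surface later as a confusing error inside lm().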

> model1 = lm(rent ~ ., data = rooms)
> summary(model1)

Call:
lm(formula = rent ~ ., data = rooms)

Residuals:
Min 1Q Median 3Q Max
-409.50556 -124.29555 12.24494 129.78501 639.47952

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 148.17524969 84.81491593 1.74704 0.0865341 .
rooms 408.69587824 66.75578899 6.12225 1.2337e-07 ***
baths 131.69500141 48.81992791 2.69757 0.0093944 **
sqrfoot 0.08225052 0.11247639 0.73127 0.4678973
house -146.45871377 82.30078617 -1.77955 0.0809932 .
campusclose 31.01998932 61.00682154 0.50847 0.6132760
pets 101.69598374 68.04734480 1.49449 0.1410920
new 122.75355017 87.42624629 1.40408 0.1662393
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 210.4259 on 52 degrees of freedom
Multiple R-squared: 0.8926609,   Adjusted R-squared: 0.8782114
F-statistic: 61.77799 on 7 and 52 DF, p-value: < 2.2204e-16

The highest p-value is for campusclose (0.613).

2.

Removing campusclose and refitting the model gives:

> model2 = lm(rent ~ rooms + baths + sqrfoot + house + pets + new, data = rooms)
> summary(model2)

Call:
lm(formula = rent ~ rooms + baths + sqrfoot + house + pets +
new, data = rooms)

Residuals:
Min 1Q Median 3Q Max
-417.16233 -116.79030 13.24609 115.20607 631.12915

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 170.69022050 71.83063524 2.37629 0.0211332 *
rooms 407.48249621 66.24482626 6.15116 1.0396e-07 ***
baths 133.39153878 48.36388388 2.75808 0.0079607 **
sqrfoot 0.08412433 0.11162689 0.75362 0.4544121
house -149.16521870 81.55197063 -1.82908 0.0730158 .
pets 92.87526516 65.33705783 1.42148 0.1610353
new 113.90218629 85.07422097 1.33886 0.1863319
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The highest p-value is for sqrfoot (0.454).

Removing sqrfoot and refitting the model gives:

> model3 = lm(rent ~ rooms + baths + house + pets + new, data = rooms)
> summary(model3)

Call:
lm(formula = rent ~ rooms + baths + house + pets + new, data = rooms)

Residuals:
Min 1Q Median 3Q Max
-416.14059 -114.07137 7.00428 123.27113 637.05859

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 169.58277 71.52773 2.37087 0.0213447 *
rooms 447.52688 39.40079 11.35832 6.2257e-16 ***
baths 138.54788 47.68554 2.90545 0.0053062 **
house -150.09609 81.21575 -1.84812 0.0700650 .
pets 95.33839 64.99368 1.46689 0.1482068
new 104.06992 83.73088 1.24291 0.2192710
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 208.1112 on 54 degrees of freedom
Multiple R-squared: 0.8909712,   Adjusted R-squared: 0.880876
F-statistic: 88.25643 on 5 and 54 DF, p-value: < 2.2204e-16

The highest p-value is for new (0.219).

Removing new and refitting the model gives:

> model4 = lm(rent ~ rooms + baths + house + pets , data = rooms)
> summary(model4)

Call:
lm(formula = rent ~ rooms + baths + house + pets, data = rooms)

Residuals:
Min 1Q Median 3Q Max
-432.02712 -121.34304 -5.14514 119.74665 668.79221

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 150.31946 70.17344 2.14211 0.0366277 *
rooms 452.39178 39.39960 11.48214 3.1402e-16 ***
baths 157.54211 45.39362 3.47058 0.0010179 **
house -183.88561 76.90869 -2.39096 0.0202537 *
pets 124.85771 60.79773 2.05366 0.0447754 *

Now all variables have a p-value less than 0.05.
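The manual drop-and-refit steps above can also be written as a loop. The following is only a sketch under stated assumptions, not the code used for this answer: backward_eliminate, response and alpha are illustrative names, and it assumes every predictor is numeric (as here), so that each row of the coefficient table corresponds to exactly one column of the data frame.

```r
# Sketch: backward elimination by p-value, dropping one variable per refit.
# Assumes numeric predictors only; factor predictors would need extra handling.
backward_eliminate <- function(data, response, alpha = 0.05) {
  predictors <- setdiff(names(data), response)
  while (length(predictors) > 0) {
    fit <- lm(reformulate(predictors, response = response), data = data)
    tab <- coef(summary(fit))                          # coefficient table
    tab <- tab[rownames(tab) != "(Intercept)", , drop = FALSE]
    pvals <- tab[, "Pr(>|t|)"]
    if (max(pvals) < alpha) return(fit)                # every p-value is below alpha
    # Otherwise drop the single predictor with the highest p-value and refit.
    predictors <- setdiff(predictors, rownames(tab)[which.max(pvals)])
  }
  # Nothing survived: fall back to the intercept-only model.
  lm(as.formula(paste(response, "~ 1")), data = data)
}

# Usage (assumes the rooms data frame from this answer):
# final <- backward_eliminate(rooms, "rent")
# summary(final)
```

Because it applies the same rule (remove the largest p-value, refit, stop once all are below 0.05), it is intended to follow the same sequence as model1 through model4, though that has not been checked here.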

3.

The variables in the above model are rooms, baths, house and pets.

Computing the correlation matrix for the rooms data frame gives:

> cor(rooms)
rent rooms baths sqrfoot house campusclose
rent 1.00000000000 0.908503713359 0.73964557320 0.84917696737 0.46943266938 -0.07598738507
rooms 0.90850371336 1.000000000000 0.63383078116 0.91693149229 0.62979395469 -0.04712222866
baths 0.73964557320 0.633830781157 1.00000000000 0.61000906432 0.28883391554 -0.09561758715
sqrfoot 0.84917696737 0.916931492295 0.61000906432 1.00000000000 0.58366829910 -0.02261255078
house 0.46943266938 0.629793954692 0.28883391554 0.58366829910 1.00000000000 -0.05281228359
campusclose -0.07598738507 -0.047122228656 -0.09561758715 -0.02261255078 -0.05281228359 1.00000000000
pets 0.13039491214 0.001047910074 0.20097569028 0.01015951996 0.07573812580 -0.34757851760
new 0.30682539073 0.131614960629 0.41380424027 0.08326810774 -0.16122923188 -0.29137624733
pets new
rent 0.130394912142 0.30682539073
rooms 0.001047910074 0.13161496063
baths 0.200975690280 0.41380424027
sqrfoot 0.010159519955 0.08326810774
house 0.075738125802 -0.16122923188
campusclose -0.347578517598 -0.29137624733
pets 1.000000000000 0.37620154105
new 0.376201541048 1.00000000000

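None of the pairs among the retained variables (rooms, baths, house and pets) has a correlation above 0.80. The largest is 0.634, between rooms and baths, so all four variables stay in the model and model4 is the final model.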
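Reading the pairs off a wrapped correlation matrix is error-prone, so the check can be restricted to the retained predictors. This is a sketch only: high_cor_pairs and its arguments are hypothetical names, and comparing the absolute value of r to the threshold is an assumption (the assignment states r > 0.80).

```r
# Sketch: list predictor pairs whose absolute correlation exceeds a threshold.
high_cor_pairs <- function(data, vars, threshold = 0.80) {
  r <- cor(data[, vars])
  r[lower.tri(r, diag = TRUE)] <- NA        # keep each pair once, skip the diagonal
  idx <- which(abs(r) > threshold, arr.ind = TRUE)
  data.frame(var1 = rownames(r)[idx[, "row"]],
             var2 = colnames(r)[idx[, "col"]],
             r = r[idx])
}

# Usage (assumes the rooms data frame from this answer):
# high_cor_pairs(rooms, c("rooms", "baths", "house", "pets"))
```

An empty result (zero rows) corresponds to the conclusion above that no pair exceeds 0.80.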

a.

I first fit the full model with all seven explanatory variables and noticed that campusclose had the highest p-value (0.613), so I dropped it and refit the model. In the refitted model sqrfoot had the highest p-value (0.454), so I dropped it and refit again. Next, new had the highest p-value (0.219), so I dropped it as well. After that refit, all remaining variables (rooms, baths, house and pets) had p-values below 0.05. I then checked the correlations among the remaining explanatory variables to see whether any relationship was too strong. None exceeded 0.80, so I kept all four variables in the final model.

b.

R output of the final model is,

> summary(model4)

Call:
lm(formula = rent ~ rooms + baths + house + pets, data = rooms)

Residuals:
Min 1Q Median 3Q Max
-432.02712 -121.34304 -5.14514 119.74665 668.79221

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 150.31946 70.17344 2.14211 0.0366277 *
rooms 452.39178 39.39960 11.48214 3.1402e-16 ***
baths 157.54211 45.39362 3.47058 0.0010179 **
house -183.88561 76.90869 -2.39096 0.0202537 *
pets 124.85771 60.79773 2.05366 0.0447754 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 209.1394 on 55 degrees of freedom
Multiple R-squared: 0.8878522,   Adjusted R-squared: 0.879696
F-statistic: 108.856 on 4 and 55 DF, p-value: < 2.2204e-16

c.

The least squares regression equation of the final model is:

Rent = 150.31946 + 452.39178 rooms + 157.54211 baths - 183.88561 house + 124.85771 pets
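To illustrate how the equation is used, the sketch below evaluates it for one made-up listing. The coefficients are copied from the summary(model4) output; the function predict_rent and the example values (2 rooms, 1 bath, house = 0, pets = 1) are illustrative assumptions, not part of the original answer.

```r
# Coefficients copied from the final model's summary output.
b <- c(intercept = 150.31946, rooms = 452.39178, baths = 157.54211,
       house = -183.88561, pets = 124.85771)

# Evaluate the fitted equation for given predictor values.
predict_rent <- function(rooms, baths, house, pets) {
  unname(b["intercept"] + b["rooms"] * rooms + b["baths"] * baths +
           b["house"] * house + b["pets"] * pets)
}

# Hypothetical listing: 2 rooms, 1 bath, house = 0, pets = 1
predict_rent(rooms = 2, baths = 1, house = 0, pets = 1)  # about 1337.50
```

With the fitted object available, predict(model4, newdata = data.frame(rooms = 2, baths = 1, house = 0, pets = 1)) should give the same value up to the rounding of the printed coefficients.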

d.

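From the summary(model1) output, the adjusted R-squared of the full model is 0.8782114.

From the summary(model4) output, the adjusted R-squared of the final model is 0.879696.

The difference is only about 0.0015, so there is not much difference between the full model and the final model.

The final model fits the data about as well as the full model while using four explanatory variables instead of seven, so the three dropped variables added essentially no explanatory power.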
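The adjusted values can be checked against the multiple R-squared values using the standard formula, adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - p - 1), where p is the number of explanatory variables. A small sketch, assuming n = 60 (52 residual degrees of freedom plus 8 coefficients in the full model); adj_r2 is an illustrative name:

```r
# Adjusted R-squared from multiple R-squared, sample size n, and p predictors.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

adj_r2(0.8926609, n = 60, p = 7)  # full model,  about 0.8782
adj_r2(0.8878522, n = 60, p = 4)  # final model, about 0.8797
```

The multiple R-squared drops slightly when variables are removed (0.8927 to 0.8879), but the penalty for extra predictors is smaller in the final model, which is why its adjusted value ends up marginally higher.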

