The questions involve the data set for asking prices of Richmond
townhouses obtained on 2014.11.03.
For your subset, the response variable is:
asking price divided by 10000:
askpr=c(55.8, 48.5, 57.8, 108.8, 50.5, 53.9, 33.7, 53.8, 78.8,
51.99, 47.9, 45.99, 77.8, 52.4, 58.68, 47.8, 59.8, 48.8, 57.8,
40.9, 68.5, 79.8, 68.8, 65.99, 40.8, 73.9, 54.98, 51.68, 73.8,
81.9, 86.8, 41.99, 50.8, 25.9, 58.8, 26.99, 62.9, 55.2, 54.8,
71.99, 56.8, 79.99, 74.8, 61.5, 68.8, 50.8, 53.8, 57.5, 44.8,
65.8)
The explanatory variables are:
(i) finished floor area divided by 100
ffarea=c(13.06, 14.8, 12.01, 23.98, 12.26, 11.84, 12, 12.22, 19.48,
12.09, 12.1, 16.01, 16.5, 16.22, 13.96, 13.34, 17.63, 14.8, 13.84,
16.06, 13.59, 15.25, 15.95, 22.78, 14, 15.15, 13.06, 15.1, 17.54,
20.95, 15.08, 12.9, 12.27, 6.1, 17.37, 10.5, 14, 15.3, 11.26,
15.05, 15.5, 22, 17.48, 14.5, 16.9, 16.6, 10.95, 13.46, 9.4,
13.45)
(ii) age
age=c(0, 24, 0, 16, 3, 15, 28, 9, 11, 7, 7, 25, 3, 25, 9, 32, 26,
50, 10, 25, 2, 3, 18, 35, 38, 0, 1, 20, 9, 19, 1, 44, 17, 11, 26,
37, 5, 9, 0, 8, 23, 20, 5, 7, 8, 23, 18, 10, 14, 1)
(iii) monthly maintenance fee divided by 10
mfee=c(18.6, 16.1, 14.2, 36.9, 18, 21, 25.9, 18.5, 20.4, 18.1, 18,
33.7, 25.4, 36.4, 22, 24.5, 32, 25, 16, 24.4, 17, 35, 23.6, 57.4,
23, 22.2, 19.6, 24.5, 18.2, 34.8, 48.8, 23.2, 25.2, 17.1, 31, 28,
19.6, 16.9, 24.8, 22.3, 17.4, 26.7, 29.7, 18.7, 19.4, 19.9, 24.7,
22.1, 23.3, 18.2)
(iv) number of bedrooms
beds=c(3, 3, 3, 3, 3, 2, 2, 3, 3, 3, 3, 3, 4, 3, 3, 3, 5, 3, 3, 2,
3, 2, 3, 2, 3, 4, 3, 3, 4, 1, 3, 3, 2, 1, 3, 2, 3, 3, 2, 3, 3, 3,
4, 3, 4, 4, 2, 3, 2, 3)
You are to make a prediction of the response variable when
ffarea=18, age=10, mfee=30, beds=4.
You are to fit three multiple regression models with the response
variable askpr:
(i) 2 explanatory variables ffarea, age
(ii) 3 explanatory variables ffarea, age, mfee
(iii) 4 explanatory variables ffarea, age, mfee, beds
After you have copied the above R vectors into your R session, you
can get a dataframe with
richmondtownh=data.frame(cbind(askpr,ffarea,age,mfee,beds))
Please use 3 decimal places for the answers below that are not
integer-valued.
Part a)
The values of adjusted R^2 for the above models with 2, 3 and 4
explanatory variables are respectively:
2 explanatory:
3 explanatory:
4 explanatory:
Part b)
For the best of these 3 models based on adjusted R^2, the number
of explanatory variables is:
Part c)
For the best of these 3 models based on adjusted R^2, the least
squares coefficient for ffarea is
and a 95% confidence interval for β_ffarea is
to
Part d)
For the best of these 3 models based on adjusted R^2, get the
prediction, SE and 95% prediction interval when the future values
of the explanatory variables are: ffarea=18, age=10, mfee=30,
beds=4.
prediction: and its SE ,
and the upper endpoint of the 95% prediction interval is
> askpr=c(55.8, 48.5, 57.8, 108.8, 50.5, 53.9, 33.7, 53.8,
78.8, 51.99, 47.9, 45.99,
+ 77.8, 52.4, 58.68, 47.8, 59.8, 48.8, 57.8, 40.9, 68.5, 79.8,
68.8, 65.99, 40.8,
+ 73.9, 54.98, 51.68, 73.8, 81.9, 86.8, 41.99, 50.8, 25.9, 58.8,
26.99, 62.9,
+ 55.2, 54.8, 71.99, 56.8, 79.99, 74.8, 61.5, 68.8, 50.8, 53.8,
57.5, 44.8, 65.8)
> length(askpr)
[1] 50
>
> #The explanatory variables are:
> #(i) finished floor area divided by 100
> ffarea=c(13.06, 14.8, 12.01, 23.98, 12.26, 11.84, 12, 12.22,
19.48, 12.09, 12.1, 16.01, 16.5, 16.22, 13.96, 13.34, 17.63, 14.8,
13.84, 16.06, 13.59, 15.25, 15.95, 22.78, 14, 15.15, 13.06, 15.1,
17.54, 20.95, 15.08, 12.9, 12.27, 6.1, 17.37, 10.5, 14, 15.3,
11.26, 15.05, 15.5, 22, 17.48, 14.5, 16.9, 16.6, 10.95, 13.46, 9.4,
13.45)
> length(ffarea)
[1] 50
>
> #(ii) age
> age=c(0, 24, 0, 16, 3, 15, 28, 9, 11, 7, 7, 25, 3, 25, 9, 32,
26, 50, 10, 25, 2, 3, 18, 35, 38, 0, 1, 20, 9, 19, 1, 44, 17, 11,
26, 37, 5, 9, 0, 8, 23, 20, 5, 7, 8, 23, 18, 10, 14, 1)
>
> #(iii) monthly maintenance fee divided by 10
> mfee=c(18.6, 16.1, 14.2, 36.9, 18, 21, 25.9, 18.5, 20.4, 18.1,
18, 33.7, 25.4, 36.4, 22, 24.5, 32, 25, 16, 24.4, 17, 35, 23.6,
57.4, 23, 22.2, 19.6, 24.5, 18.2, 34.8, 48.8, 23.2, 25.2, 17.1, 31,
28, 19.6, 16.9, 24.8, 22.3, 17.4, 26.7, 29.7, 18.7, 19.4, 19.9,
24.7, 22.1, 23.3, 18.2)
>
> #(iv) number of bedrooms
> beds=c(3, 3, 3, 3, 3, 2, 2, 3, 3, 3, 3, 3, 4, 3, 3, 3, 5, 3,
3, 2, 3, 2, 3, 2, 3, 4, 3, 3, 4, 1, 3, 3, 2, 1, 3, 2, 3, 3, 2, 3,
3, 3, 4, 3, 4, 4, 2, 3, 2, 3)
>
> richmondtownh=data.frame(cbind(askpr,ffarea,age,mfee,beds))
> dim(richmondtownh)
[1] 50 5
> head(richmondtownh)
askpr ffarea age mfee beds
1 55.8 13.06 0 18.6 3
2 48.5 14.80 24 16.1 3
3 57.8 12.01 0 14.2 3
4 108.8 23.98 16 36.9 3
5 50.5 12.26 3 18.0 3
6 53.9 11.84 15 21.0 2
>
> ###########Multiple regression Model############
>
> #i) 2 explanatory variables ffarea, age
> Model_1=lm(askpr~ffarea+age,richmondtownh)
> summary(Model_1)
Call:
lm(formula = askpr ~ ffarea + age, data = richmondtownh)
Residuals:
Min 1Q Median 3Q Max
-16.188 -4.230 -1.044 3.985 17.174
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.76197 4.65225 2.958 0.00483 **
ffarea 3.74931 0.31014 12.089 4.98e-16 ***
age -0.67552 0.08246 -8.192 1.32e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.085 on 47 degrees of freedom
Multiple R-squared: 0.7983, Adjusted R-squared: 0.7897
F-statistic: 93.01 on 2 and 47 DF, p-value: < 2.2e-16
>
> #ii)3 explanatory variables ffarea, age, mfee
> Model_2=lm(askpr~ffarea+age+mfee,richmondtownh)
> summary(Model_2)
Call:
lm(formula = askpr ~ ffarea + age + mfee, data = richmondtownh)
Residuals:
Min 1Q Median 3Q Max
-15.512 -3.891 0.029 3.849 15.644
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.78694 4.62801 2.763 0.00821 **
ffarea 3.47668 0.35284 9.853 6.50e-13 ***
age -0.70910 0.08412 -8.430 6.93e-11 ***
mfee 0.22612 0.14621 1.547 0.12883
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.983 on 46 degrees of freedom
Multiple R-squared: 0.8083, Adjusted R-squared: 0.7958
F-statistic: 64.64 on 3 and 46 DF, p-value: < 2.2e-16
>
> #iii) 4 explanatory variables ffarea, age, mfee, beds
> Model_3=lm(askpr~ffarea+age+mfee+beds,richmondtownh)
> summary(Model_3)
Call:
lm(formula = askpr ~ ffarea + age + mfee + beds, data = richmondtownh)
Residuals:
Min 1Q Median 3Q Max
-15.1009 -3.8995 -0.0028 3.5089 15.8676
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.91489 5.68421 2.096 0.0417 *
ffarea 3.42554 0.40372 8.485 6.86e-11 ***
age -0.70664 0.08547 -8.268 1.41e-10 ***
mfee 0.24173 0.15864 1.524 0.1346
beds 0.41984 1.55640 0.270 0.7886
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.054 on 45 degrees of freedom
Multiple R-squared: 0.8086, Adjusted R-squared: 0.7916
F-statistic: 47.52 on 4 and 45 DF, p-value: 1.343e-15
>
> ####part a) ###
> #Adjusted R-squared
>
> #2 explanatory variables= 0.790
> #3 explanatory variables=0.796
> #4 explanatory variables= 0.792
>
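As a cross-check on the values read off the summaries, adjusted R-squared can be recomputed from the printed multiple R-squared as 1 - (1 - R^2)(n - 1)/(n - p - 1), with n = 50 observations and p explanatory variables. The sketch below assumes only the rounded R-squared values printed above, so agreement is limited to about four decimals. The names r2, p, adj_r2 and best_p are illustrative and do not appear in the original session.

```r
# Multiple R-squared as printed by summary() for the 2-, 3- and 4-variable models
r2 <- c(0.7983, 0.8083, 0.8086)
p  <- c(2, 3, 4)   # number of explanatory variables in each model
n  <- 50           # number of townhouses in the subset

# Adjusted R-squared penalises each additional explanatory variable
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 3))

# Number of explanatory variables in the model with the largest adjusted R-squared
best_p <- p[which.max(adj_r2)]
print(best_p)
```

The recomputed values round to 0.790, 0.796 and 0.792, and the maximum is attained by the model with 3 explanatory variables.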
> ### part b) ###
> #For the best of these 3 models based on adjusted R^2,
> #the number of explanatory variables is: 3,
> #because the 3-variable model has the largest adjusted R-squared.
>
> ###Part c)
> #The best Model is Model_2 : 3 explanatory variables
> #the least squares coefficient for ffarea : 3.47668
> # 95% confidence interval for β_ffarea
> confint(Model_2,'ffarea',level=0.95)
2.5 % 97.5 %
ffarea 2.766445 4.186911
>
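The interval returned by confint() is the usual estimate plus or minus a t quantile times the standard error. A minimal sketch of that arithmetic, assuming the rounded estimate and standard error printed in summary(Model_2) and 46 residual degrees of freedom; the variable names are illustrative:

```r
# Values printed in summary(Model_2) for the ffarea coefficient
est <- 3.47668   # least squares estimate
se  <- 0.35284   # standard error of the estimate
df  <- 46        # residual degrees of freedom (50 - 3 - 1)

# Two-sided 95% interval: estimate +/- t quantile * standard error
t_crit <- qt(0.975, df)
ci <- est + c(-1, 1) * t_crit * se
print(round(ci, 3))
```

To three decimals this gives the same endpoints as the confint() output, 2.766 and 4.187.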
> ###Part d)
> #explanatory variables are:
> #build the new observation directly, so the data vectors ffarea, age,
> #mfee and beds are not overwritten by scalars
> test_data=data.frame(ffarea=18,age=10,mfee=30,beds=4)
> test_data
ffarea age mfee beds
1 18 10 30 4
> #regression model:Model_2
> predicted_askpr=predict(Model_2,test_data)
> predicted_askpr
1
75.0597
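The point prediction is just the fitted linear combination evaluated at the new values. As a rough check, assuming the rounded coefficients printed in summary(Model_2) (beds is not part of that model, so it is not used), the arithmetic can be sketched as follows; b, x_new and pred are illustrative names:

```r
# Coefficients printed in summary(Model_2): intercept, ffarea, age, mfee
b <- c(12.78694, 3.47668, -0.70910, 0.22612)

# New observation, with a leading 1 for the intercept
x_new <- c(1, 18, 10, 30)

# Prediction = intercept + sum of coefficient * value
pred <- sum(b * x_new)
print(pred)
```

Because the coefficients are rounded, the result matches predict() only to about four decimals.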
>
> confidence_askpr=predict(Model_2,test_data,interval='confidence')
> confidence_askpr
fit lwr upr
1 75.0597 71.93949 78.1799
>
> prediction_askpr=predict(Model_2,test_data,interval='prediction')
> prediction_askpr
fit lwr upr
1 75.0597 60.66212 89.45727
>
> n=length(askpr)
> n
[1] 50
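>
> #SE of the prediction for a new townhouse: sqrt(s^2 + se.fit^2),
> #where s is the residual standard error of Model_2 and se.fit is
> #the SE of the fitted mean at the new values
> fit_se=predict(Model_2,test_data,se.fit=TRUE)
> SE=sqrt(fit_se$residual.scale^2+fit_se$se.fit^2)
> #SE is about 7.153, consistent with the prediction interval above:
> #(89.45727-75.0597)/qt(0.975,46)
> #(the SE of the fitted mean alone, se.fit, is about 1.550)
> #upper endpoint of the 95% prediction interval: 89.457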
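The standard error of a prediction for a new observation combines the residual standard error s with the standard error of the fitted mean: sqrt(s^2 + se.fit^2). A hedged sketch of that relationship, assuming only the rounded numbers printed above (the confidence interval endpoints, s = 6.983 and 46 degrees of freedom); the variable names are illustrative, and rounding limits agreement with the prediction interval to about two decimals:

```r
fit    <- 75.0597    # point prediction from Model_2
ci_lwr <- 71.93949   # lower endpoint, interval='confidence'
ci_upr <- 78.1799    # upper endpoint, interval='confidence'
s      <- 6.983      # residual standard error of Model_2
df     <- 46         # residual degrees of freedom

t_crit <- qt(0.975, df)

# SE of the fitted mean, backed out of the confidence interval half-width
se_fit <- (ci_upr - ci_lwr) / (2 * t_crit)

# SE of the prediction adds the residual variance for a new observation
se_pred <- sqrt(s^2 + se_fit^2)

# Upper endpoint of the 95% prediction interval
pi_upr <- fit + t_crit * se_pred
print(c(se_fit = se_fit, se_pred = se_pred, pi_upr = pi_upr))
```

With the fitted model available, predict(Model_2, test_data, se.fit=TRUE) returns se.fit and residual.scale directly, which avoids the rounding in this back-calculation.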