The questions involve the data set for asking prices of Richmond
townhouses obtained on 2014.11.03.
For your subset, the response variable is:
asking price divided by 10000:
askpr=c(65.8, 41.99, 54.8, 44.8, 50.8, 50.5, 54.98, 81.9, 48.5,
51.99, 26.99, 108.8, 57.8, 79.99, 33.7, 55.8, 40.8, 56.88, 46.8,
79.8, 53.8, 45.99, 40.9, 62.9, 48.8, 65.99, 58.39, 57.8, 50.8,
78.8, 68.8, 86.8, 54.8, 68.5, 58.68, 52.4, 51.68, 68.5, 59.8, 57.5,
68.8, 58.8, 53.9, 61.5, 47.9, 47.8, 77.8, 25.9, 60.8, 74.8)
The explanatory variables are:
(i) finished floor area divided by 100
ffarea=c(13.45, 12.9, 11.26, 9.4, 12.27, 12.26, 13.06, 20.95, 14.8,
12.09, 10.5, 23.98, 12.01, 22, 12, 13.06, 12.26, 15.78, 16.2,
15.25, 10.95, 16.01, 16.06, 14, 14.8, 22.78, 15.09, 13.84, 16.6,
19.48, 15.95, 15.08, 15.46, 13.59, 13.96, 16.22, 15.1, 15.76,
17.63, 13.46, 16.9, 17.37, 11.84, 14.5, 12.1, 13.34, 16.5, 6.1,
13.2, 17.48)
(ii) age
age=c(1, 44, 0, 14, 17, 3, 1, 19, 24, 7, 37, 16, 0, 20, 28, 0, 29,
17, 30, 3, 18, 25, 25, 5, 50, 35, 8, 10, 23, 11, 18, 1, 41, 2, 9,
25, 20, 4, 26, 10, 8, 26, 15, 7, 7, 32, 3, 11, 3, 5)
(iii) monthly maintenance fee divided by 10
mfee=c(18.2, 23.2, 24.8, 23.3, 25.2, 18, 19.6, 34.8, 16.1, 18.1,
28, 36.9, 14.2, 26.7, 25.9, 18.6, 19.8, 17.3, 16, 35, 24.7, 33.7,
24.4, 19.6, 25, 57.4, 20.3, 16, 19.9, 20.4, 23.6, 48.8, 31, 17, 22,
36.4, 24.5, 22.1, 32, 22.1, 19.4, 31, 21, 18.7, 18, 24.5, 25.4,
17.1, 18.9, 29.7)
(iv) number of bedrooms
beds=c(3, 3, 2, 2, 2, 3, 3, 1, 3, 3, 2, 3, 3, 3, 2, 3, 3, 4, 4, 2,
2, 3, 2, 3, 3, 2, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 3, 4, 3,
2, 3, 3, 3, 4, 1, 3, 4)
You are to make a prediction of the response variable when
ffarea=18, age=11, mfee=27, beds=3.
You are to fit three multiple regression models with the response
variable askpr:
(i) 2 explanatory variables ffarea, age
(ii) 3 explanatory variables ffarea, age, mfee
(iii) 4 explanatory variables ffarea, age, mfee, beds
After you have copied the above R vectors into your R session, you
can get a dataframe with
richmondtownh=data.frame(cbind(askpr,ffarea,age,mfee,beds))
Please use 3 decimal places for the answers below which are not
integer-valued
Part a)
The values of adjusted R^2 for the above models with 2, 3 and 4
explanatory variables are respectively:
2 explanatory:
3 explanatory:
4 explanatory:
Part b)
For the best of these 3 models based on adjusted R^2, the number
of explanatory variables is:
Part c)
For the best of these 3 models based on adjusted R^2, the least
squares coefficient for ffarea is
and a 95% confidence interval for β_ffarea is
to
Part d)
For the best of these 3 models based on adjusted R^2, get the
prediction, SE and 95% prediction interval when the future values
of the explanatory variables are: ffarea=18, age=11, mfee=27,
beds=3.
prediction: and its SE ,
and the upper endpoint of the 95% prediction interval is
Sol:
Entire R code:
askpr=c(65.8, 41.99, 54.8, 44.8, 50.8, 50.5, 54.98, 81.9, 48.5,
51.99, 26.99, 108.8, 57.8, 79.99, 33.7, 55.8, 40.8, 56.88, 46.8,
79.8, 53.8, 45.99, 40.9, 62.9, 48.8, 65.99, 58.39, 57.8, 50.8,
78.8, 68.8, 86.8, 54.8, 68.5, 58.68, 52.4, 51.68, 68.5, 59.8, 57.5,
68.8, 58.8, 53.9, 61.5, 47.9, 47.8, 77.8, 25.9, 60.8, 74.8)
ffarea=c(13.45, 12.9, 11.26, 9.4, 12.27, 12.26, 13.06, 20.95, 14.8,
12.09, 10.5, 23.98, 12.01, 22, 12, 13.06, 12.26, 15.78, 16.2,
15.25, 10.95, 16.01, 16.06, 14, 14.8, 22.78, 15.09, 13.84, 16.6,
19.48, 15.95, 15.08, 15.46, 13.59, 13.96, 16.22, 15.1, 15.76,
17.63, 13.46, 16.9, 17.37, 11.84, 14.5, 12.1, 13.34, 16.5, 6.1,
13.2, 17.48)
age=c(1, 44, 0, 14, 17, 3, 1, 19, 24, 7, 37, 16, 0, 20, 28, 0, 29,
17, 30, 3, 18, 25, 25, 5, 50, 35, 8, 10, 23, 11, 18, 1, 41, 2, 9,
25, 20, 4, 26, 10, 8, 26, 15, 7, 7, 32, 3, 11, 3, 5)
mfee=c(18.2, 23.2, 24.8, 23.3, 25.2, 18, 19.6, 34.8, 16.1, 18.1,
28, 36.9, 14.2, 26.7, 25.9, 18.6, 19.8, 17.3, 16, 35, 24.7, 33.7,
24.4, 19.6, 25, 57.4, 20.3, 16, 19.9, 20.4, 23.6, 48.8, 31, 17, 22,
36.4, 24.5, 22.1, 32, 22.1, 19.4, 31, 21, 18.7, 18, 24.5, 25.4,
17.1, 18.9, 29.7)
beds=c(3, 3, 2, 2, 2, 3, 3, 1, 3, 3, 2, 3, 3, 3, 2, 3, 3, 4, 4, 2,
2, 3, 2, 3, 3, 2, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 3, 4, 3,
2, 3, 3, 3, 4, 1, 3, 4)
richmondtownh=data.frame(cbind(askpr,ffarea,age,mfee,beds))
richmondtownh
library(leaps)
library(car)
bestsubsets = regsubsets(askpr ~ ., data = richmondtownh, nbest = 1)
subsets(bestsubsets, statistic = "adjr2")
colnames(richmondtownh)
model1 <- lm(askpr ~ ffarea + age, data = richmondtownh)
model2 <- lm(askpr ~ ffarea + age + mfee, data = richmondtownh)
model3 <- lm(askpr ~ ffarea + age + mfee + beds, data = richmondtownh)
summary(model1)
summary(model2)
summary(model3)
# adjusted R^2 for each model, rounded to 3 decimal places
round(summary(model1)$adj.r.squared, 3)
round(summary(model2)$adj.r.squared, 3)
round(summary(model3)$adj.r.squared, 3)
# model 2 has the highest adjusted R^2, so it is the best of the three
# number of explanatory variables = 3
coefficients(model2)
confint(model2)
# new observation; beds is omitted because model 2 does not use it
newdata = data.frame(ffarea = 18, age = 11, mfee = 27)
predict(model2, newdata, interval = "prediction")
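Part d) also asks for an SE, which predict() with only the interval argument does not print. The following is a minimal sketch of one way to get it, assuming the 3-variable model is the chosen one. The names se_mean, se_pred, pi_manual and pi_builtin are illustrative and not from the original solution. Whether the question wants the SE of the fitted mean or the SE for a single future observation is not stated, so the sketch prints both.

```r
# Data copied from the question (beds is left out: the 3-variable model does not use it)
askpr <- c(65.8, 41.99, 54.8, 44.8, 50.8, 50.5, 54.98, 81.9, 48.5,
  51.99, 26.99, 108.8, 57.8, 79.99, 33.7, 55.8, 40.8, 56.88, 46.8,
  79.8, 53.8, 45.99, 40.9, 62.9, 48.8, 65.99, 58.39, 57.8, 50.8,
  78.8, 68.8, 86.8, 54.8, 68.5, 58.68, 52.4, 51.68, 68.5, 59.8, 57.5,
  68.8, 58.8, 53.9, 61.5, 47.9, 47.8, 77.8, 25.9, 60.8, 74.8)
ffarea <- c(13.45, 12.9, 11.26, 9.4, 12.27, 12.26, 13.06, 20.95, 14.8,
  12.09, 10.5, 23.98, 12.01, 22, 12, 13.06, 12.26, 15.78, 16.2,
  15.25, 10.95, 16.01, 16.06, 14, 14.8, 22.78, 15.09, 13.84, 16.6,
  19.48, 15.95, 15.08, 15.46, 13.59, 13.96, 16.22, 15.1, 15.76,
  17.63, 13.46, 16.9, 17.37, 11.84, 14.5, 12.1, 13.34, 16.5, 6.1,
  13.2, 17.48)
age <- c(1, 44, 0, 14, 17, 3, 1, 19, 24, 7, 37, 16, 0, 20, 28, 0, 29,
  17, 30, 3, 18, 25, 25, 5, 50, 35, 8, 10, 23, 11, 18, 1, 41, 2, 9,
  25, 20, 4, 26, 10, 8, 26, 15, 7, 7, 32, 3, 11, 3, 5)
mfee <- c(18.2, 23.2, 24.8, 23.3, 25.2, 18, 19.6, 34.8, 16.1, 18.1,
  28, 36.9, 14.2, 26.7, 25.9, 18.6, 19.8, 17.3, 16, 35, 24.7, 33.7,
  24.4, 19.6, 25, 57.4, 20.3, 16, 19.9, 20.4, 23.6, 48.8, 31, 17, 22,
  36.4, 24.5, 22.1, 32, 22.1, 19.4, 31, 21, 18.7, 18, 24.5, 25.4,
  17.1, 18.9, 29.7)
richmondtownh <- data.frame(askpr, ffarea, age, mfee)

model2 <- lm(askpr ~ ffarea + age + mfee, data = richmondtownh)
newdata <- data.frame(ffarea = 18, age = 11, mfee = 27)

# se.fit = TRUE adds the SE of the fitted mean, the residual SD and the residual df
pred <- predict(model2, newdata, se.fit = TRUE)
se_mean <- unname(pred$se.fit)        # SE of the estimated mean response
s <- pred$residual.scale              # residual standard deviation
se_pred <- sqrt(se_mean^2 + s^2)      # SE for one future observation

# Rebuild the 95% prediction interval by hand and compare with predict()
tcrit <- qt(0.975, df = pred$df)
pi_manual <- unname(pred$fit) + c(-1, 1) * tcrit * se_pred
pi_builtin <- predict(model2, newdata, interval = "prediction")

print(round(c(fit = unname(pred$fit), se_mean = se_mean, se_pred = se_pred), 3))
print(round(pi_manual, 3))
print(round(pi_builtin, 3))
```

The hand-built interval should agree with the lwr and upr columns from predict(), which is a quick check that se_pred is the SE behind the prediction interval.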
ANSWERS:
Part a)
2 explanatory: 0.776
3 explanatory: 0.788
4 explanatory: 0.783
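For reference, the adjusted R^2 reported by summary() follows the standard definition below, with n = 50 observations here and p the number of explanatory variables (2, 3 or 4):

```latex
\bar{R}^2 \;=\; 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

Because of the factor (n - 1)/(n - p - 1), an extra variable raises adjusted R^2 only when its gain in fit outweighs the lost degree of freedom. That is consistent with the 4-variable value (0.783) falling below the 3-variable value (0.788).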
Part b)
For the best of these 3 models based on adjusted R^2, the number
of explanatory variables is 3 (model 2, adjusted R^2 = 0.788).
Part c)
From model 2, the fitted equation is
askpr = 12.4793027 + 3.3574769*ffarea - 0.6753720*age + 0.2788832*mfee
The least squares coefficient for ffarea is 3.3574769 (3.357 to 3 decimal places).
The 95% confidence interval for β_ffarea is 2.641 to 4.074.
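As a sanity check on the interval, confint() uses the usual t-based form. With n = 50 and 3 explanatory variables, the residual degrees of freedom are 50 - 3 - 1 = 46:

```latex
\hat{\beta}_{\text{ffarea}} \;\pm\; t_{0.975,\,46}\;\mathrm{SE}\!\left(\hat{\beta}_{\text{ffarea}}\right),
\qquad
\frac{2.641 + 4.074}{2} = 3.3575 \approx \hat{\beta}_{\text{ffarea}}
```

The interval is symmetric about the estimate, so its midpoint matching the reported coefficient confirms the two endpoints are consistent with it.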
Part d)
prediction: 73.01464 (73.015 to 3 decimal places)
The upper endpoint of the 95% prediction interval is 87.39833 (87.398 to 3 decimal places).
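The prediction can be checked by hand from the model 2 coefficients in Part c), and the interval has the standard form for a single future observation. Here s is the residual standard deviation and SE(ŷ_0) the standard error of the fitted mean (symbols introduced for this check, not taken from the original solution):

```latex
\hat{y}_0 = 12.4793027 + 3.3574769(18) - 0.6753720(11) + 0.2788832(27)
          = 12.4793027 + 60.4345842 - 7.4290920 + 7.5298464
          \approx 73.0146

\hat{y}_0 \;\pm\; t_{0.975,\,46}\,\sqrt{s^2 + \mathrm{SE}(\hat{y}_0)^2},
\qquad
\text{half-width} = 87.39833 - 73.01464 = 14.38369
```

The hand calculation agrees with the value 73.01464 returned by predict().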