Question


The questions involve the data set for asking prices of Richmond townhouses obtained on 2014.11.03.
For your subset, the response variable is:
asking price divided by 10000:
askpr=c(65.8, 41.99, 54.8, 44.8, 50.8, 50.5, 54.98, 81.9, 48.5, 51.99, 26.99, 108.8, 57.8, 79.99, 33.7, 55.8, 40.8, 56.88, 46.8, 79.8, 53.8, 45.99, 40.9, 62.9, 48.8, 65.99, 58.39, 57.8, 50.8, 78.8, 68.8, 86.8, 54.8, 68.5, 58.68, 52.4, 51.68, 68.5, 59.8, 57.5, 68.8, 58.8, 53.9, 61.5, 47.9, 47.8, 77.8, 25.9, 60.8, 74.8)
The explanatory variables are:
(i) finished floor area divided by 100
ffarea=c(13.45, 12.9, 11.26, 9.4, 12.27, 12.26, 13.06, 20.95, 14.8, 12.09, 10.5, 23.98, 12.01, 22, 12, 13.06, 12.26, 15.78, 16.2, 15.25, 10.95, 16.01, 16.06, 14, 14.8, 22.78, 15.09, 13.84, 16.6, 19.48, 15.95, 15.08, 15.46, 13.59, 13.96, 16.22, 15.1, 15.76, 17.63, 13.46, 16.9, 17.37, 11.84, 14.5, 12.1, 13.34, 16.5, 6.1, 13.2, 17.48)
(ii) age
age=c(1, 44, 0, 14, 17, 3, 1, 19, 24, 7, 37, 16, 0, 20, 28, 0, 29, 17, 30, 3, 18, 25, 25, 5, 50, 35, 8, 10, 23, 11, 18, 1, 41, 2, 9, 25, 20, 4, 26, 10, 8, 26, 15, 7, 7, 32, 3, 11, 3, 5)
(iii) monthly maintenance fee divided by 10
mfee=c(18.2, 23.2, 24.8, 23.3, 25.2, 18, 19.6, 34.8, 16.1, 18.1, 28, 36.9, 14.2, 26.7, 25.9, 18.6, 19.8, 17.3, 16, 35, 24.7, 33.7, 24.4, 19.6, 25, 57.4, 20.3, 16, 19.9, 20.4, 23.6, 48.8, 31, 17, 22, 36.4, 24.5, 22.1, 32, 22.1, 19.4, 31, 21, 18.7, 18, 24.5, 25.4, 17.1, 18.9, 29.7)
(iv) number of bedrooms
beds=c(3, 3, 2, 2, 2, 3, 3, 1, 3, 3, 2, 3, 3, 3, 2, 3, 3, 4, 4, 2, 2, 3, 2, 3, 3, 2, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 3, 4, 3, 2, 3, 3, 3, 4, 1, 3, 4)
You are to make a prediction of the response variable when ffarea=18, age=11, mfee=27, beds=3.

You are to fit three multiple regression models with the response variable askpr:
(i) 2 explanatory variables ffarea, age
(ii) 3 explanatory variables ffarea, age, mfee
(iii) 4 explanatory variables ffarea, age, mfee, beds
After you have copied the above R vectors into your R session, you can get a dataframe with
richmondtownh=data.frame(cbind(askpr,ffarea,age,mfee,beds))

Please use 3 decimal places for the answers below which are not integer-valued
Part a)
The values of adjusted R² for the above models with 2, 3 and 4 explanatory variables are respectively:
2 explanatory:
3 explanatory:
4 explanatory:


Part b)
For the best of these 3 models based on adjusted R², the number of explanatory variables is:


Part c)
For the best of these 3 models based on adjusted R², the least squares coefficient for ffarea is

and a 95% confidence interval for β_ffarea is
to

Part d)
For the best of these 3 models based on adjusted R², get the prediction, SE and 95% prediction interval when the future values of the explanatory variables are: ffarea=18, age=11, mfee=27, beds=3.
prediction:  and its SE  ,
and the upper endpoint of the 95% prediction interval is

Solutions

Expert Solution

Entire R code:

askpr=c(65.8, 41.99, 54.8, 44.8, 50.8, 50.5, 54.98, 81.9, 48.5, 51.99, 26.99, 108.8, 57.8, 79.99, 33.7, 55.8, 40.8, 56.88, 46.8, 79.8, 53.8, 45.99, 40.9, 62.9, 48.8, 65.99, 58.39, 57.8, 50.8, 78.8, 68.8, 86.8, 54.8, 68.5, 58.68, 52.4, 51.68, 68.5, 59.8, 57.5, 68.8, 58.8, 53.9, 61.5, 47.9, 47.8, 77.8, 25.9, 60.8, 74.8)
ffarea=c(13.45, 12.9, 11.26, 9.4, 12.27, 12.26, 13.06, 20.95, 14.8, 12.09, 10.5, 23.98, 12.01, 22, 12, 13.06, 12.26, 15.78, 16.2, 15.25, 10.95, 16.01, 16.06, 14, 14.8, 22.78, 15.09, 13.84, 16.6, 19.48, 15.95, 15.08, 15.46, 13.59, 13.96, 16.22, 15.1, 15.76, 17.63, 13.46, 16.9, 17.37, 11.84, 14.5, 12.1, 13.34, 16.5, 6.1, 13.2, 17.48)
age=c(1, 44, 0, 14, 17, 3, 1, 19, 24, 7, 37, 16, 0, 20, 28, 0, 29, 17, 30, 3, 18, 25, 25, 5, 50, 35, 8, 10, 23, 11, 18, 1, 41, 2, 9, 25, 20, 4, 26, 10, 8, 26, 15, 7, 7, 32, 3, 11, 3, 5)
mfee=c(18.2, 23.2, 24.8, 23.3, 25.2, 18, 19.6, 34.8, 16.1, 18.1, 28, 36.9, 14.2, 26.7, 25.9, 18.6, 19.8, 17.3, 16, 35, 24.7, 33.7, 24.4, 19.6, 25, 57.4, 20.3, 16, 19.9, 20.4, 23.6, 48.8, 31, 17, 22, 36.4, 24.5, 22.1, 32, 22.1, 19.4, 31, 21, 18.7, 18, 24.5, 25.4, 17.1, 18.9, 29.7)
beds=c(3, 3, 2, 2, 2, 3, 3, 1, 3, 3, 2, 3, 3, 3, 2, 3, 3, 4, 4, 2, 2, 3, 2, 3, 3, 2, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 3, 4, 3, 2, 3, 3, 3, 4, 1, 3, 4)
richmondtownh=data.frame(cbind(askpr,ffarea,age,mfee,beds))
richmondtownh
library(leaps)
library(car)
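# Best-subsets search: the best model of each size, compared on adjusted R^2
bestsubsets = regsubsets(askpr ~ ., data = richmondtownh, nbest = 1)
subsets(bestsubsets, statistic = "adjr2")
colnames(richmondtownh)
# Fit the three models named in the question
model1 <- lm(askpr~ffarea+age,data=richmondtownh)
model2 <- lm(askpr~ffarea+age+mfee,data=richmondtownh)
model3 <- lm(askpr~ffarea+age+mfee+beds,data=richmondtownh)
summary(model1)
summary(model2)
summary(model3)

# Part a: adjusted R^2 of each model, to 3 decimal places
round(summary(model1)$adj.r.squared,3)
round(summary(model2)$adj.r.squared,3)
round(summary(model3)$adj.r.squared,3)

# Part b: model 2 is best because it has the highest adjusted R^2 of the three

# Number of explanatory variables = 3

# Part c: coefficients and 95% confidence intervals for model 2
coefficients(model2)
confint(model2)
# Part d: prediction and 95% prediction interval (beds is not in model 2)
newdata=data.frame(ffarea=18,age=11,mfee=27)
predict(model2,newdata,interval = "prediction")
# se.fit is the SE of the fitted mean; the SE of prediction is sqrt(se.fit^2 + sigma^2)
predict(model2,newdata,se.fit = TRUE)$se.fit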


Answers:

Part a)

2 explanatory: 0.776
3 explanatory: 0.788
4 explanatory: 0.783
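
The adjusted R² values above penalise R² for the number of predictors. As a minimal sketch of that relationship, the helper below (a hypothetical name, `adj_r2`, not part of the original solution) applies the usual formula 1 - (1 - R²)(n - 1)/(n - p - 1) and checks it against `summary()` on R's built-in `mtcars` data, which is only a stand-in for the townhouse data.

```r
# Adjusted R^2 from ordinary R^2, sample size n, and number of predictors p.
# adj_r2 is an illustrative helper, not a function from the original solution.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Stand-in example on the built-in mtcars data (NOT the townhouse data)
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)

by_hand <- adj_r2(s$r.squared, n = nrow(mtcars), p = 2)

# The two numbers should agree
c(by_hand = by_hand, from_summary = s$adj.r.squared)
```

Because the penalty grows with p, adding beds to the 3-variable model can lower adjusted R² even though ordinary R² never decreases when a predictor is added.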

Part b)
For the best of these 3 models based on adjusted R², the number of explanatory variables is:

The 3-variable model has the highest adjusted R² (0.788), so the number of explanatory variables is 3.

Part c)

From model 2 (3 explanatory variables), the fitted equation is

askpr = 12.4793027 + 3.3574769*ffarea - 0.6753720*age + 0.2788832*mfee

The least squares coefficient for ffarea is 3.3574769 (3.357 to 3 decimal places).

A 95% confidence interval for β_ffarea is 2.641 to 4.074.
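
The interval above has the form estimate ± t × SE, with the t quantile taken on the residual degrees of freedom (50 - 3 - 1 = 46 for the 3-variable model here). The sketch below rebuilds such an interval by hand and compares it with `confint()`. The helper name `ci_by_hand` is hypothetical, and the built-in `mtcars` data is used only as a stand-in so the block runs on its own.

```r
# Confidence interval for one coefficient: estimate +/- t * SE.
# ci_by_hand is an illustrative helper, not part of the original solution.
ci_by_hand <- function(fit, term, level = 0.95) {
  est   <- coef(fit)[[term]]                 # least squares estimate
  se    <- sqrt(diag(vcov(fit)))[[term]]     # its standard error
  tcrit <- qt(1 - (1 - level) / 2, df = df.residual(fit))
  c(lower = est - tcrit * se, upper = est + tcrit * se)
}

# Stand-in example on mtcars (NOT the townhouse data)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

ci_by_hand(fit, "wt")
confint(fit, "wt")   # should give the same two endpoints
```

With the townhouse data loaded, the same call on the 3-variable fit with term "ffarea" should reproduce the interval reported in Part c.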

Part d)

prediction: 73.01464 (73.015 to 3 decimal places)

The upper endpoint of the 95% prediction interval is 87.39833 (87.398 to 3 decimal places).
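
The solution does not report the SE that Part d asks for. As a hedged sketch, the first half of the block below backs out the standard error of prediction implied by the two reported numbers, assuming the interval is prediction ± t × SE with 46 residual degrees of freedom; this comes out at about 7.15. Whether the question means this SE (for a new observation) or the smaller SE of the fitted mean is not stated in the source, so that reading is an assumption. The second half shows how both SEs come out of `predict()`, using the built-in `mtcars` data as a stand-in.

```r
# --- Implied SE from the reported answers (assumes upper = pred + t * SE) ---
pred  <- 73.01464            # reported prediction
upper <- 87.39833            # reported upper endpoint of the 95% PI
df    <- 50 - 3 - 1          # n = 50 observations, 3 predictors plus intercept
se_pred_implied <- (upper - pred) / qt(0.975, df)
se_pred_implied

# --- General recipe, shown on stand-in data (NOT the townhouse data) ---
fit <- lm(mpg ~ wt + hp, data = mtcars)
new <- data.frame(wt = 3, hp = 120)

p <- predict(fit, new, se.fit = TRUE)
se_mean <- as.numeric(p$se.fit)                    # SE of the fitted mean
se_pred <- sqrt(se_mean^2 + p$residual.scale^2)    # SE for a new observation

# The prediction interval from predict() uses se_pred
pint <- predict(fit, new, interval = "prediction")
upper_by_hand <- as.numeric(p$fit) + qt(0.975, p$df) * se_pred
c(from_predict = unname(pint[1, "upr"]), by_hand = upper_by_hand)
```

The SE of prediction is always larger than the SE of the fitted mean because it adds the residual variance of a single new townhouse to the uncertainty in the fitted regression surface.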

