Question

The questions involve the data set for asking prices of Richmond townhouses obtained on 2014.11.03.
For your subset, the response variable is:
asking price divided by 10000:
askpr=c(55.8, 48.5, 57.8, 108.8, 50.5, 53.9, 33.7, 53.8, 78.8, 51.99, 47.9, 45.99, 77.8, 52.4, 58.68, 47.8, 59.8, 48.8, 57.8, 40.9, 68.5, 79.8, 68.8, 65.99, 40.8, 73.9, 54.98, 51.68, 73.8, 81.9, 86.8, 41.99, 50.8, 25.9, 58.8, 26.99, 62.9, 55.2, 54.8, 71.99, 56.8, 79.99, 74.8, 61.5, 68.8, 50.8, 53.8, 57.5, 44.8, 65.8)
The explanatory variables are:
(i) finished floor area divided by 100
ffarea=c(13.06, 14.8, 12.01, 23.98, 12.26, 11.84, 12, 12.22, 19.48, 12.09, 12.1, 16.01, 16.5, 16.22, 13.96, 13.34, 17.63, 14.8, 13.84, 16.06, 13.59, 15.25, 15.95, 22.78, 14, 15.15, 13.06, 15.1, 17.54, 20.95, 15.08, 12.9, 12.27, 6.1, 17.37, 10.5, 14, 15.3, 11.26, 15.05, 15.5, 22, 17.48, 14.5, 16.9, 16.6, 10.95, 13.46, 9.4, 13.45)
(ii) age
age=c(0, 24, 0, 16, 3, 15, 28, 9, 11, 7, 7, 25, 3, 25, 9, 32, 26, 50, 10, 25, 2, 3, 18, 35, 38, 0, 1, 20, 9, 19, 1, 44, 17, 11, 26, 37, 5, 9, 0, 8, 23, 20, 5, 7, 8, 23, 18, 10, 14, 1)
(iii) monthly maintenance fee divided by 10
mfee=c(18.6, 16.1, 14.2, 36.9, 18, 21, 25.9, 18.5, 20.4, 18.1, 18, 33.7, 25.4, 36.4, 22, 24.5, 32, 25, 16, 24.4, 17, 35, 23.6, 57.4, 23, 22.2, 19.6, 24.5, 18.2, 34.8, 48.8, 23.2, 25.2, 17.1, 31, 28, 19.6, 16.9, 24.8, 22.3, 17.4, 26.7, 29.7, 18.7, 19.4, 19.9, 24.7, 22.1, 23.3, 18.2)
(iv) number of bedrooms
beds=c(3, 3, 3, 3, 3, 2, 2, 3, 3, 3, 3, 3, 4, 3, 3, 3, 5, 3, 3, 2, 3, 2, 3, 2, 3, 4, 3, 3, 4, 1, 3, 3, 2, 1, 3, 2, 3, 3, 2, 3, 3, 3, 4, 3, 4, 4, 2, 3, 2, 3)
You are to make a prediction of the response variable when ffarea=18, age=10, mfee=30, beds=4.

You are to fit three multiple regression models with the response variable askpr:
(i) 2 explanatory variables ffarea, age
(ii) 3 explanatory variables ffarea, age, mfee
(iii) 4 explanatory variables ffarea, age, mfee, beds
After you have copied the above R vectors into your R session, you can get a dataframe with
richmondtownh=data.frame(cbind(askpr,ffarea,age,mfee,beds))

Please use 3 decimal places for the answers below which are not integer-valued
Part a)
The values of adjusted R^2 for the above models with 2, 3 and 4 explanatory variables are respectively:
2 explanatory:
3 explanatory:
4 explanatory:


Part b)
For the best of these 3 models based on adjusted R^2, the number of explanatory variables is:


Part c)
For the best of these 3 models based on adjusted R^2, the least squares coefficient for ffarea is

and a 95% confidence interval for β_ffarea is
to

Part d)
For the best of these 3 models based on adjusted R^2, get the prediction, SE and 95% prediction interval when the future values of the explanatory variables are: ffarea=18, age=10, mfee=30, beds=4.
prediction:  and its SE  ,
and the upper endpoint of the 95% prediction interval is

Solutions

Expert Solution

> askpr=c(55.8, 48.5, 57.8, 108.8, 50.5, 53.9, 33.7, 53.8, 78.8, 51.99, 47.9, 45.99,
+ 77.8, 52.4, 58.68, 47.8, 59.8, 48.8, 57.8, 40.9, 68.5, 79.8, 68.8, 65.99, 40.8,
+ 73.9, 54.98, 51.68, 73.8, 81.9, 86.8, 41.99, 50.8, 25.9, 58.8, 26.99, 62.9,
+ 55.2, 54.8, 71.99, 56.8, 79.99, 74.8, 61.5, 68.8, 50.8, 53.8, 57.5, 44.8, 65.8)
> length(askpr)
[1] 50
>
> #The explanatory variables are:
> #(i) finished floor area divided by 100
> ffarea=c(13.06, 14.8, 12.01, 23.98, 12.26, 11.84, 12, 12.22, 19.48, 12.09, 12.1, 16.01, 16.5, 16.22, 13.96, 13.34, 17.63, 14.8, 13.84, 16.06, 13.59, 15.25, 15.95, 22.78, 14, 15.15, 13.06, 15.1, 17.54, 20.95, 15.08, 12.9, 12.27, 6.1, 17.37, 10.5, 14, 15.3, 11.26, 15.05, 15.5, 22, 17.48, 14.5, 16.9, 16.6, 10.95, 13.46, 9.4, 13.45)
> length(ffarea)
[1] 50
>
> #(ii) age
> age=c(0, 24, 0, 16, 3, 15, 28, 9, 11, 7, 7, 25, 3, 25, 9, 32, 26, 50, 10, 25, 2, 3, 18, 35, 38, 0, 1, 20, 9, 19, 1, 44, 17, 11, 26, 37, 5, 9, 0, 8, 23, 20, 5, 7, 8, 23, 18, 10, 14, 1)
>
> #(iii) monthly maintenance fee divided by 10
> mfee=c(18.6, 16.1, 14.2, 36.9, 18, 21, 25.9, 18.5, 20.4, 18.1, 18, 33.7, 25.4, 36.4, 22, 24.5, 32, 25, 16, 24.4, 17, 35, 23.6, 57.4, 23, 22.2, 19.6, 24.5, 18.2, 34.8, 48.8, 23.2, 25.2, 17.1, 31, 28, 19.6, 16.9, 24.8, 22.3, 17.4, 26.7, 29.7, 18.7, 19.4, 19.9, 24.7, 22.1, 23.3, 18.2)
>
> #(iv) number of bedrooms
> beds=c(3, 3, 3, 3, 3, 2, 2, 3, 3, 3, 3, 3, 4, 3, 3, 3, 5, 3, 3, 2, 3, 2, 3, 2, 3, 4, 3, 3, 4, 1, 3, 3, 2, 1, 3, 2, 3, 3, 2, 3, 3, 3, 4, 3, 4, 4, 2, 3, 2, 3)
>
>
> richmondtownh=data.frame(cbind(askpr,ffarea,age,mfee,beds))
> dim(richmondtownh)
[1] 50 5
> head(richmondtownh)
  askpr ffarea age mfee beds
1  55.8  13.06   0 18.6    3
2  48.5  14.80  24 16.1    3
3  57.8  12.01   0 14.2    3
4 108.8  23.98  16 36.9    3
5  50.5  12.26   3 18.0    3
6  53.9  11.84  15 21.0    2
>
> ###########Multiple regression Model############
>
> #i) 2 explanatory variables ffarea, age
> Model_1=lm(askpr~ffarea+age,richmondtownh)
> summary(Model_1)

Call:
lm(formula = askpr ~ ffarea + age, data = richmondtownh)

Residuals:
    Min      1Q  Median      3Q     Max
-16.188  -4.230  -1.044   3.985  17.174

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.76197    4.65225   2.958  0.00483 **
ffarea       3.74931    0.31014  12.089 4.98e-16 ***
age         -0.67552    0.08246  -8.192 1.32e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.085 on 47 degrees of freedom
Multiple R-squared: 0.7983, Adjusted R-squared: 0.7897
F-statistic: 93.01 on 2 and 47 DF, p-value: < 2.2e-16

>
> #ii)3 explanatory variables ffarea, age, mfee
> Model_2=lm(askpr~ffarea+age+mfee,richmondtownh)
> summary(Model_2)

Call:
lm(formula = askpr ~ ffarea + age + mfee, data = richmondtownh)

Residuals:
    Min      1Q  Median      3Q     Max
-15.512  -3.891   0.029   3.849  15.644

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.78694    4.62801   2.763  0.00821 **
ffarea       3.47668    0.35284   9.853 6.50e-13 ***
age         -0.70910    0.08412  -8.430 6.93e-11 ***
mfee         0.22612    0.14621   1.547  0.12883
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.983 on 46 degrees of freedom
Multiple R-squared: 0.8083, Adjusted R-squared: 0.7958
F-statistic: 64.64 on 3 and 46 DF, p-value: < 2.2e-16

>
> #iii) 4 explanatory variables ffarea, age, mfee, beds
> Model_3=lm(askpr~ffarea+age+mfee+beds,richmondtownh)
> summary(Model_3)

Call:
lm(formula = askpr ~ ffarea + age + mfee + beds, data = richmondtownh)

Residuals:
     Min       1Q   Median       3Q      Max
-15.1009  -3.8995  -0.0028   3.5089  15.8676

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.91489    5.68421   2.096   0.0417 *
ffarea       3.42554    0.40372   8.485 6.86e-11 ***
age         -0.70664    0.08547  -8.268 1.41e-10 ***
mfee         0.24173    0.15864   1.524   0.1346
beds         0.41984    1.55640   0.270   0.7886
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.054 on 45 degrees of freedom
Multiple R-squared: 0.8086, Adjusted R-squared: 0.7916
F-statistic: 47.52 on 4 and 45 DF, p-value: 1.343e-15

>
> ####part a) ###
> #Adjusted R-squared
>
> #2 explanatory variables= 0.790
> #3 explanatory variables=0.796
> #4 explanatory variables= 0.792
>
> ### part b) ###
> #For the best of these 3 models based on adjusted R^2,
> #the number of explanatory variables is: 3
> #because 3 explanatory variables gives maximum Adjusted R-squared.
>
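The adjusted R^2 values in parts a) and b) can be cross-checked by hand from the multiple R^2 values printed in the three summaries. The sketch below assumes the usual formula, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), with n = 50 observations and p explanatory variables. The helper name `adj_r2` is made up for illustration and is not part of the original solution.

```r
# Adjusted R-squared from multiple R-squared, sample size n and
# number of explanatory variables p (hypothetical helper).
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Multiple R-squared values copied from the summary() output above
r2 <- c(0.7983, 0.8083, 0.8086)
p  <- c(2, 3, 4)

adj <- adj_r2(r2, n = 50, p = p)
print(round(adj, 3))

# Number of explanatory variables in the model with the largest adjusted R-squared
best_p <- p[which.max(adj)]
print(best_p)
```

Because the inputs are rounded to four decimals, the results agree with the printed adjusted R^2 values only to about that precision. The point is that R^2 rises slightly from 3 to 4 variables while the penalty for the extra parameter pulls adjusted R^2 back down.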
> ###Part c)
> #The best Model is Model_2 : 3 explanatory variables
> #the least squares coefficient for ffarea : 3.47668
> # 95% confidence interval for beta_ffarea
> confint(Model_2,'ffarea',level=0.95)
2.5 % 97.5 %
ffarea 2.766445 4.186911
>
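The interval returned by confint() can be reproduced from the coefficient table as estimate ± t × standard error, where t is the 0.975 quantile of the t distribution on the residual degrees of freedom (46 for Model_2). This is a minimal sketch using only the printed estimate and standard error. The variable names are illustrative.

```r
# Estimate and standard error for ffarea, copied from summary(Model_2)
est <- 3.47668
se  <- 0.35284
df  <- 46                      # residual degrees of freedom: 50 - 3 - 1

# Two-sided 95% critical value from the t distribution
tcrit <- qt(0.975, df)

# Confidence interval: estimate -/+ tcrit * se
ci <- est + c(-1, 1) * tcrit * se
print(round(ci, 3))
```

The endpoints should agree with the confint() output above to roughly three or four decimals, the difference being rounding in the printed inputs.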
> ###Part d)
> #explanatory variables are:
> ffarea=18; age=10; mfee=30; beds=4.
> test_data=data.frame(ffarea,age,mfee,beds)
> test_data
ffarea age mfee beds
1 18 10 30 4
> #regression model:Model_2
> predicted_askpr=predict(Model_2,test_data)
> predicted_askpr
1
75.0597
>
> confidence_askpr=predict(Model_2,test_data,interval='confidence')
> confidence_askpr
fit lwr upr
1 75.0597 71.93949 78.1799
>
> prediction_askpr=predict(Model_2,test_data,interval='prediction')
> prediction_askpr
fit lwr upr
1 75.0597 60.66212 89.45727
>
> n=length(askpr)
> n
[1] 50
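> #SE of prediction: sqrt(se.fit^2 + residual variance)
> fit_se=predict(Model_2,test_data,se.fit=TRUE)
> SE=sqrt(fit_se$se.fit^2+fit_se$residual.scale^2)
> #SE is about 7.153, which matches (89.45727-75.0597)/qt(0.975,46)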
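For part d), the standard error of a prediction combines the uncertainty in the fitted mean with the residual variance: SE_pred = sqrt(se_fit^2 + sigma^2). The sketch below is a check built only from numbers printed in the transcript: the residual standard error of Model_2 (6.983 on 46 degrees of freedom) and the confidence interval for the mean at the new point. Backing se_fit out of the confidence interval is a convenience assumed here, and the variable names are illustrative.

```r
sigma <- 6.983                 # residual standard error of Model_2
df    <- 46                    # residual degrees of freedom
fit   <- 75.0597               # predicted askpr at the new point
tcrit <- qt(0.975, df)

# Standard error of the fitted mean, recovered from the 95% confidence interval
se_fit <- (78.1799 - fit) / tcrit

# Standard error for predicting a new observation
se_pred <- sqrt(se_fit^2 + sigma^2)
print(round(se_pred, 3))

# 95% prediction interval: fit -/+ tcrit * se_pred
pred_int <- fit + c(-1, 1) * tcrit * se_pred
print(round(pred_int, 2))
```

The reconstructed interval should match the output of predict(..., interval='prediction') above to about two decimals, which confirms that the prediction SE is close to 7.15 on the askpr scale.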

