Question 3: Refer to dataset ‘Rails.csv’ on Canvas. Consider a model with adj2007 (estimated 2007 price in thousands of 2014 dollars) as the response variable and distance (distance to the closest bike trail in km) and squarefeet (square footage of interior finished space in thousands of square feet) as explanatory variables.
(a) (10 points) Assess the significance of this model using R. Show and label all steps and show your work. Don’t forget to verify the conditions in the proper step.
(b) (2 points) Suppose that the model is significant. What is the expected estimated 2007 price in thousands of 2014 dollars when the distance to the closest trail is half a kilometer and the interior finished space is 2 thousand square feet? Show work in R.
(c) (5 points) Suppose the model is significant and that the required conditions for regression are satisfied. Obtain a 90% confidence interval for the mean response when distance = 0.5 and squarefeet = 2. Also calculate the standard error of the point estimate. No need to show the four-step process for the interval but do show all your work in R.
(d) (5 points) Suppose the model is significant and that the required conditions for regression are satisfied. Obtain a 99% prediction interval for a new response value when distance = 1 and squarefeet = 1. Also calculate the standard error of the point estimate. No need to show the four-step process for the interval but do show all your work in R.
This is the file: https://ufile.io/fce2t
The R script is given first. The console output is pasted after it.
rails <- read.csv("C:/Users/Asus/Desktop/rails.csv", header = TRUE)
# attach() is not needed because every call below passes data = rails
rails.lm <- lm(adj2007 ~ distance + squarefeet, data = rails) # fit the regression model with the given variables
rails.lm
a)
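summary(rails.lm) # coefficient t tests, R-squared, and the overall F test
# Overall significance is judged by the F test (H0: both slopes are zero): F = 164.8 on 2 and 101 df, p-value < 2.2e-16, so the model is significant at the 0.05 level.
# The individual t tests agree: distance (p = 0.00659) and squarefeet (p < 2e-16) are each significant.
# The conditions (linearity, constant variance, normal residuals) still have to be checked with residual plots, e.g. plot(rails.lm, which = 1:2).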
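Part (a) asks for the conditions to be verified, which summary() alone does not do. Below is a minimal sketch of the overall F test and the usual residual checks. It runs on simulated stand-in data: the data frame sim and its generating coefficients are assumptions, not the Rails data. With the real data, fit would be rails.lm.

```r
# Hypothetical stand-in data with the same column names as rails.csv
set.seed(1)
n <- 104
sim <- data.frame(distance = runif(n, 0, 4), squarefeet = runif(n, 0.5, 3))
sim$adj2007 <- 110 - 16 * sim$distance + 150 * sim$squarefeet + rnorm(n, sd = 50)
fit <- lm(adj2007 ~ distance + squarefeet, data = sim)

# Overall F test, H0: both slopes are zero
fs <- summary(fit)$fstatistic
p_overall <- pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]], lower.tail = FALSE)
p_overall

# Residual checks for the regression conditions
res <- resid(fit)
sw <- shapiro.test(res)            # formal check of normality of the residuals
if (interactive()) {
  plot(fitted(fit), res)           # linearity and constant variance
  abline(h = 0, lty = 2)
  qqnorm(res); qqline(res)         # normality
}
```

A small p_overall rejects H0, and the plots should show no curvature, no funnel shape, and points close to the Q-Q line.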
b)
# The model is significant, so it can be used for estimation.
newdata=data.frame(distance=0.5,squarefeet=2)
predict(rails.lm,newdata) # point estimate of adj2007 from the fitted regression model
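The point estimate in part (b) can be checked by hand from the coefficients reported by summary(rails.lm). This is only a sketch: the coefficients are the rounded values from the output, and the names b, x and y_hat are illustrative.

```r
# Rounded coefficient estimates copied from summary(rails.lm)
b <- c(intercept = 109.742, distance = -16.486, squarefeet = 150.780)

# Leading 1 for the intercept, then distance = 0.5 and squarefeet = 2
x <- c(1, 0.5, 2)

y_hat <- sum(b * x)  # 109.742 - 16.486 * 0.5 + 150.780 * 2
y_hat
```

The result agrees with the predict() value of 403.0597 up to rounding of the coefficients.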
c)
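ci <- predict(rails.lm, newdata, interval = "confidence", level = 0.90, se.fit = TRUE) # 90% confidence interval for the mean response at distance = 0.5, squarefeet = 2
ci$fit    # point estimate with lower and upper limits
ci$se.fit # standard error of the estimated mean response
# check: half-width divided by the t critical value gives the same standard error (residual df = n - 3 = 101; the upper quantile keeps the result positive)
(ci$fit[, "upr"] - ci$fit[, "fit"]) / qt(1 - 0.10/2, df.residual(rails.lm))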
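A prediction interval and a confidence interval for the mean are built from different standard errors. For a linear model they are linked by SE_pred^2 = s^2 + SE_mean^2, where s is the residual standard error. The sketch below applies that relation to the 90% prediction interval from the recorded run (317.1182, 489.0011) and to s = 51.34 on 101 degrees of freedom. Because s is rounded, the recovered standard error of the mean is only approximate. The variable names are illustrative.

```r
fit <- 403.0597          # point estimate from the recorded output
lwr <- 317.1182          # lower limit of the recorded 90% prediction interval
s   <- 51.34             # residual standard error from summary(rails.lm), rounded
df  <- 101               # residual degrees of freedom: n - 3

t_star  <- qt(1 - 0.10 / 2, df)       # upper critical value, positive
se_pred <- (fit - lwr) / t_star       # standard error for predicting a new response
se_mean <- sqrt(se_pred^2 - s^2)      # approximate standard error of the mean response

# Approximate 90% confidence interval for the mean response
ci_approx <- fit + c(-1, 1) * t_star * se_mean
se_pred; se_mean; ci_approx
```

With the data available, predict(..., interval = "confidence", se.fit = TRUE) gives the exact values and should be preferred.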
d)
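newdata2 <- data.frame(distance = 1, squarefeet = 1) # the column must be spelled squarefeet, otherwise predict() does not use this value
pred <- predict(rails.lm, newdata2, interval = "prediction", level = 0.99, se.fit = TRUE) # 99% prediction interval for a new response
pred$fit # point estimate with lower and upper limits
se.pred <- sqrt(pred$se.fit^2 + pred$residual.scale^2); se.pred # standard error for predicting a new response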
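As a check on part (d), the point estimate at distance = 1 and squarefeet = 1 can be computed by hand from the rounded coefficients in the summary. The sketch also backs out which squarefeet value would reproduce the fit of 394.8167 that appears in the recorded output. The variable names are illustrative.

```r
# Rounded coefficient estimates copied from summary(rails.lm)
b0 <- 109.742   # intercept
b1 <- -16.486   # distance
b2 <- 150.780   # squarefeet

# Point estimate for distance = 1, squarefeet = 1
y_hat_d <- b0 + b1 * 1 + b2 * 1
y_hat_d

# squarefeet value implied by the recorded fit, holding distance = 1
recorded_fit <- 394.8167
implied_sqft <- (recorded_fit - b0 - b1 * 1) / b2
implied_sqft
```

The implied value comes out at about 2 rather than 1, which suggests that the misspelled column squarefee was ignored and predict() took squarefeet from elsewhere in the workspace.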
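OUTPUT (console log of the original run, which used interval="predict" in part (c) and the misspelled column squarefee in part (d)):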
> rails=read.csv("C:/Users/Asus/Desktop/rails.csv",header=T)
> attach(rails)
The following objects are masked _by_ .GlobalEnv:
    adj2007, distance, squarefeet
> rails.lm<-lm(adj2007~distance+squarefeet,data=rails);rails.lm #fitting a regression model using the given variables
Call:
lm(formula = adj2007 ~ distance + squarefeet, data = rails)
Coefficients:
(Intercept)     distance   squarefeet
     109.74       -16.49       150.78
> summary(rails.lm) # this gives the summary of the model and also tests for the hypothesis
Call:
lm(formula = adj2007 ~ distance + squarefeet, data = rails)
Residuals:
     Min       1Q   Median       3Q      Max
-138.835  -32.621   -1.903   27.369  145.504
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  109.742     20.057   5.472 3.25e-07 ***
distance     -16.486      5.942  -2.775  0.00659 **
squarefeet   150.780      9.998  15.080  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 51.34 on 101 degrees of freedom
Multiple R-squared: 0.7655, Adjusted R-squared: 0.7608
F-statistic: 164.8 on 2 and 101 DF, p-value: < 2.2e-16
> #from the summary, we can see that p values of both distance and squarefeet are less than 0.05 and hence the model is significant
> # now we know that the model is significant and we go for further analysis.
> newdata=data.frame(distance=0.5,squarefeet=2)
> predict(rails.lm,newdata) # predicts the value of adj2007 using the fitted regression model
       1
403.0597
> predict(rails.lm,newdata,interval="predict",level=0.90) #90% prediction interval for the mean response when distance =0.5 and squarefeet=2
       fit      lwr      upr
1 403.0597 317.1182 489.0011
> std.error=(403.0597-317.1182)/qt(0.1/2,104-2);std.error # derived from the formula of prediction interval for mean response
[1] -51.77417
> #alternate method
> std.error=(489.0011-403.0597)/qt(0.1/2,104-2);std.error
[1] -51.77411
> newdata=data.frame(distance=1,squarefee=1)
> predict(rails.lm,newdata,interval="predict",level=0.99) #99% prediction interval of the new response variable
       fit      lwr      upr
1 394.8167 258.9515 530.6819
> std.error2=(394.8167-258.9515)/qt(0.01/2,104-2);std.error
[1] -51.77411