Question 3: Refer to dataset ‘Rails.csv’ on Canvas. Consider a
model with adj2007 (estimated 2007 price in thousands of 2014
dollars) as a response variable and distance (distance to the
closest bike trail in km) and squarefeet (square footage of
interior finished space in thousands of square feet) as explanatory
variables.
(a) (10 points) Assess the significance of this model using R. Show
and label all steps and show your work. Don’t forget to verify the
conditions in the proper step.
(b) (2 points) Suppose that the model is significant. What is the
expected estimated 2007 price in thousands of 2014 dollars when the
distance to the closest trail is half a kilometer and the interior
finished space is 2 thousand square feet? Show work in R.
(c) (5 points) Suppose the model is significant and that the
required conditions for regression are satisfied. Obtain a 90%
confidence interval for the mean response when distance = 0.5 and
squarefeet = 2. Also calculate $SE_{\hat{\mu}}$, the standard error of the
point estimate $\hat{\mu}$. No need to show the four-step process for the
interval but do show all your work in R.
(d) (5 points) Suppose the model is significant and that the
required conditions for regression are satisfied. Obtain a 99%
prediction interval for a new response value when distance = 1 and
squarefeet = 1. Also calculate $SE_{\hat{y}}$, the standard error of the point
estimate $\hat{y}$. No need to show the four-step process for the interval
but do show all your work in R.
This is the file: https://ufile.io/nsi19
ANSWER
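rails=read.csv("C:/Users/Asus/Desktop/rails.csv",header=TRUE) # adjust the path to wherever Rails.csv is saved
# attach(rails) is not needed: every call below passes data=rails or newdata explicitly
rails.lm<-lm(adj2007~distance+squarefeet,data=rails) # fitting a regression model using the given variables
rails.lm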
a)
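summary(rails.lm) # coefficient table, residual standard error, R-squared and the overall F-test
#model significance is judged by the overall F-test (H0: both slopes are 0): F = 164.8 on 2 and 101 DF, p-value < 2.2e-16, which is less than 0.05, hence the model is significant
#the individual t-tests for distance (p = 0.00659) and squarefeet (p < 2e-16) are significant as well
plot(rails.lm,which=1:2) # residuals vs fitted and normal Q-Q plots, used to verify the regression conditions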
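As a quick cross-check of the overall F-test in a), the F-statistic can be rebuilt from the reported R-squared. This is only a sketch that plugs in the rounded values printed by summary(rails.lm) (R-squared 0.7655, 2 explanatory variables, 101 residual degrees of freedom); the names r2, k, df_resid, f_stat and p_value are illustrative and not part of the original answer.

```r
# Values copied (rounded) from the summary(rails.lm) output
r2 <- 0.7655        # Multiple R-squared
k <- 2              # number of explanatory variables
df_resid <- 101     # residual degrees of freedom (n - k - 1 with n = 104)

# Overall F-statistic: explained variation per predictor divided by
# unexplained variation per residual degree of freedom
f_stat <- (r2 / k) / ((1 - r2) / df_resid)

# Upper-tail p-value of the F distribution with (k, df_resid) degrees of freedom
p_value <- pf(f_stat, df1 = k, df2 = df_resid, lower.tail = FALSE)

print(f_stat)
print(p_value)
```

The result should land close to the reported 164.8 (small differences come from rounding R-squared), with a p-value far below 0.05.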
b)
# now we know that the model is significant and we go for further analysis.
newdata=data.frame(distance=0.5,squarefeet=2)
predict(rails.lm,newdata) # predicts the value of adj2007 using the fitted regression model
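The prediction in b) can also be checked by hand from the fitted equation. The sketch below uses the rounded coefficients printed in the summary output, so it should agree with predict() only up to rounding; the names b, x and y_hat are illustrative.

```r
# Rounded coefficients from summary(rails.lm): intercept, distance, squarefeet
b <- c(intercept = 109.742, distance = -16.486, squarefeet = 150.780)

# Design vector for distance = 0.5 km and squarefeet = 2 (thousand sq ft);
# the leading 1 multiplies the intercept
x <- c(1, 0.5, 2)

# Fitted value: intercept + b_distance * 0.5 + b_squarefeet * 2
y_hat <- sum(b * x)
print(y_hat)
```

This gives about 403.06 (thousands of 2014 dollars), in line with the 403.0597 that predict() reports from the unrounded coefficients.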
c)
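predict(rails.lm,newdata,interval="confidence",level=0.90) #90% confidence interval for the mean response when distance=0.5 and squarefeet=2
std.error=predict(rails.lm,newdata,se.fit=TRUE)$se.fit;std.error # standard error of the estimated mean response
#alternate method: half-width of the confidence interval divided by the t critical value with n-3 = 101 residual degrees of freedom
ci=predict(rails.lm,newdata,interval="confidence",level=0.90)
std.error=(ci[,"upr"]-ci[,"fit"])/qt(1-0.1/2,101);std.error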
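To show how the interval in c) relates to the standard error of the mean response, here is a minimal sketch on simulated data. The data frame sim is a hypothetical stand-in for Rails.csv (same column names, made-up values), so the numbers it prints are not the answer to the question; only the relationship between se.fit and the confidence interval carries over.

```r
# Hypothetical stand-in for Rails.csv: simulated data with the same column names
set.seed(1)
n <- 104
sim <- data.frame(distance = runif(n, 0, 4), squarefeet = runif(n, 0.5, 4))
sim$adj2007 <- 110 - 16 * sim$distance + 150 * sim$squarefeet + rnorm(n, sd = 50)

sim.lm <- lm(adj2007 ~ distance + squarefeet, data = sim)
new_point <- data.frame(distance = 0.5, squarefeet = 2)

# se.fit = TRUE also returns the standard error of the estimated mean response
pr <- predict(sim.lm, new_point, se.fit = TRUE)
se_mean <- unname(pr$se.fit)

# 90% interval: point estimate +/- t quantile (residual df) * standard error
t_crit <- qt(1 - 0.10 / 2, df = sim.lm$df.residual)
manual_ci <- unname(pr$fit) + c(-1, 1) * t_crit * se_mean

# Built-in confidence interval for comparison
builtin_ci <- predict(sim.lm, new_point, interval = "confidence", level = 0.90)

print(se_mean)
print(manual_ci)
print(builtin_ci)
```

The manual bounds match the lwr and upr columns returned by interval="confidence", which is why se.fit is the standard error asked for in c).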
d)
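newdata=data.frame(distance=1,squarefeet=1) # the column name must be squarefeet, not squarefee
predict(rails.lm,newdata,interval="predict",level=0.99) #99% prediction interval for a new response value; the fit should be about 109.742-16.486+150.780 = 244.04
se.mean=predict(rails.lm,newdata,se.fit=TRUE)$se.fit # standard error of the estimated mean at this point
std.error2=sqrt(se.mean^2+summary(rails.lm)$sigma^2);std.error2 # standard error of prediction for a new response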
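For d), the standard error of a new response combines the uncertainty in the estimated mean with the residual variation. The sketch below again uses simulated data as a hypothetical stand-in for Rails.csv (sim2 and all its values are made up), so only the structure of the calculation applies to the real data set.

```r
# Hypothetical stand-in for Rails.csv: simulated data with the same column names
set.seed(2)
n <- 104
sim2 <- data.frame(distance = runif(n, 0, 4), squarefeet = runif(n, 0.5, 4))
sim2$adj2007 <- 110 - 16 * sim2$distance + 150 * sim2$squarefeet + rnorm(n, sd = 50)

sim2.lm <- lm(adj2007 ~ distance + squarefeet, data = sim2)
new_point <- data.frame(distance = 1, squarefeet = 1)

pr <- predict(sim2.lm, new_point, se.fit = TRUE)
se_mean <- unname(pr$se.fit)   # standard error of the estimated mean response
s <- pr$residual.scale         # residual standard error of the model

# Standard error of prediction: mean uncertainty plus residual variation
se_pred <- sqrt(se_mean^2 + s^2)

# 99% prediction interval: point estimate +/- t quantile * se_pred
t_crit <- qt(1 - 0.01 / 2, df = pr$df)
manual_pi <- unname(pr$fit) + c(-1, 1) * t_crit * se_pred

# Built-in prediction interval for comparison
builtin_pi <- predict(sim2.lm, new_point, interval = "prediction", level = 0.99)

print(se_pred)
print(manual_pi)
print(builtin_pi)
```

Because se_pred includes the residual standard error, it is always larger than se_mean, and the prediction interval is wider than the confidence interval at the same point.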
OUTPUT:
> rails=read.csv("C:/Users/Asus/Desktop/rails.csv",header=T)
> attach(rails)
The following objects are masked _by_ .GlobalEnv:

    adj2007, distance, squarefeet

>
> rails.lm<-lm(adj2007~distance+squarefeet,data=rails);rails.lm #fitting a regression model using the given variables
Call:
lm(formula = adj2007 ~ distance + squarefeet, data = rails)
Coefficients:
(Intercept)     distance   squarefeet
     109.74       -16.49       150.78
>
>
> summary(rails.lm) # this gives the summary of the model and also tests for the hypothesis
Call:
lm(formula = adj2007 ~ distance + squarefeet, data = rails)
Residuals:
     Min       1Q   Median       3Q      Max
-138.835  -32.621   -1.903   27.369  145.504

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  109.742     20.057   5.472 3.25e-07 ***
distance     -16.486      5.942  -2.775  0.00659 **
squarefeet   150.780      9.998  15.080  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 51.34 on 101 degrees of freedom
Multiple R-squared:  0.7655,    Adjusted R-squared:  0.7608
F-statistic: 164.8 on 2 and 101 DF,  p-value: < 2.2e-16
>
> #from the summary, the overall F-test (F = 164.8 on 2 and 101 DF, p-value < 2.2e-16) shows that the model is significant; the p values of both distance and squarefeet are also less than 0.05
>
>
> # now we know that the model is significant and we go for further analysis.
>
> newdata=data.frame(distance=0.5,squarefeet=2)
> predict(rails.lm,newdata) # predicts the value of adj2007 using the fitted regression model
       1
403.0597
>
>
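> predict(rails.lm,newdata,interval="predict",level=0.90) #90% prediction interval for a new response when distance=0.5 and squarefeet=2 (interval="confidence" is what gives the interval for the mean response)
       fit      lwr      upr
1 403.0597 317.1182 489.0011
> std.error=(403.0597-317.1182)/qt(0.1/2,104-2);std.error # half-width of the prediction interval divided by a negative t quantile, so this is minus the standard error of prediction, not the standard error of the mean response; the residual degrees of freedom are 101, not 104-2
[1] -51.77417
>
> #alternate method
> std.error=(489.0011-403.0597)/qt(0.1/2,104-2);std.error
[1] -51.77411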
>
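> newdata=data.frame(distance=1,squarefee=1) # squarefee is a typo for squarefeet
> predict(rails.lm,newdata,interval="predict",level=0.99) #99% prediction interval of the new response variable; the fit below equals 109.742-16.486*1+150.780*2, so it appears to use squarefeet=2 picked up from the workspace rather than squarefeet=1
       fit      lwr      upr
1 394.8167 258.9515 530.6819
> std.error2=(394.8167-258.9515)/qt(0.01/2,104-2);std.error # this prints std.error from part c), not std.error2
[1] -51.77411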
>