Can anyone give a detailed, step-by-step analysis of the following problem? I need to be able to do a problem like this for a test on Dec. 5, 2019. Thanks!
Example Problem – 40 points. Use the printout to answer the
following questions.
1. Suppose we want to develop a model to predict assessed
value based on heating area. A sample of 15 single-family
houses is selected in a particular community. The assessed
value (in thousands of dollars) and the heating area of the houses
(in thousands of square feet) are recorded with the following
results:
House | Assessed Value ($000) | Heating Area of Dwelling (000 sq. ft.) |
1 | 84.4 | 2.00 |
2 | 77.4 | 1.71 |
3 | 75.7 | 1.45 |
4 | 85.9 | 1.76 |
5 | 79.1 | 1.93 |
6 | 70.4 | 1.20 |
7 | 75.8 | 1.55 |
8 | 85.9 | 1.93 |
9 | 78.5 | 1.59 |
10 | 79.2 | 1.50 |
11 | 86.7 | 1.90 |
12 | 79.3 | 1.39 |
13 | 74.5 | 1.54 |
14 | 83.8 | 1.89 |
15 | 76.8 | 1.59 |
Using the printout provided, evaluate the quality of the
regression equation by answering the questions below.
(3 points)
a. State the regression equation.
(4 points)
b. Interpret the meaning of the b0 and b1 values in this
problem.
(3 points)
c. Predict the average assessed value for a house that has
2,000 square feet.
(4 points)
d. Is there anything useful in this equation? Examine the F
value in answering this question.
(4 points)
e. Provide the coefficient of determination, using the
adjusted value and interpret its meaning in this problem.
(3 points)
f. Provide the correlation coefficient. Does it indicate a
significant relationship between assessed value and square
footage? How do you know?
(4 points)
g. Should the regression model be used to predict assessed
value or not? How do you know?
(3 points)
h. Examine the residuals to see whether they are normally
distributed.
(4 points)
i. At the .05 level of significance, determine whether the
explanatory variable makes a significant contribution to the
regression equation. Use the t statistic to answer this
question. Explain the meaning of the p-value for the
coefficient.
(3 points)
j. Set up a 95% confidence interval estimate of the true
population slope. Interpret the meaning of this interval.
(5 points)
k. Summarize what you have decided regarding the overall
usefulness of the model for prediction based on the analysis
you have done.
The regression analysis is done in R.
R Code:
R Output:
a.
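The fitted linear regression equation is:
Predicted assessed value = 51.9153 + 16.633 × (heating area)
where assessed value is in thousands of dollars and heating area is in thousands of square feet.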
b.
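Intercept (b0 = 51.9153): The estimated mean assessed value is 51.9153 thousand dollars (about $51,915) when the heating area of the dwelling is zero. No house in the sample has a heating area near zero (the smallest is 1.20 thousand square feet), so this value mainly anchors the line and has little practical meaning on its own.
Slope (b1 = 16.633): For each additional 1,000 square feet of heating area, the estimated mean assessed value increases by 16.633 thousand dollars (about $16,633).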
c.
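Using the regression equation with heating area = 2.00 (that is, 2,000 square feet),
Prediction = 51.9153 + 16.633 × 2.00 ≈ 85.18
This agrees with the value from the R output, 85.18208, so the predicted average assessed value is about $85,182.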
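In case it helps to see where the coefficients and the prediction come from, here is a minimal sketch that redoes the least-squares arithmetic by hand. It is plain Python rather than the R session used for the answer, and the names `fit_line`, `area`, and `value` are just illustrative.

```python
# Data copied from the problem statement.
area = [2.00, 1.71, 1.45, 1.76, 1.93, 1.20, 1.55, 1.93,
        1.59, 1.50, 1.90, 1.39, 1.54, 1.89, 1.59]   # thousands of sq ft
value = [84.4, 77.4, 75.7, 85.9, 79.1, 70.4, 75.8, 85.9,
         78.5, 79.2, 86.7, 79.3, 74.5, 83.8, 76.8]  # thousands of dollars


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = fit_line(area, value)
predicted = b0 + b1 * 2.0  # 2,000 sq ft is 2.0 in the units of the data
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, prediction = {predicted:.4f}")
```

The one thing to watch on a test is units: the predictor is recorded in thousands of square feet, so 2,000 square feet goes in as 2.0, not 2000.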
d.
Based on the regression output summary,
Significance level = 0.05
Source | F | P-value |
Regression | 25.16 | 0.0002362 |
The p-value for the F statistic is 0.0002362, which is less than 0.05, so we reject the null hypothesis that the slope is zero. The model with heating area as a predictor fits the data significantly better than a model with no independent variable, so the equation is useful.
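The F value can also be rebuilt from the sums of squares, which is a handy check if the printout on the test only shows part of the ANOVA table. This is a rough sketch in plain Python (not the original R code); `F_CRIT` is an assumed table value for the upper 5% point of F with 1 and 13 degrees of freedom, used in place of an exact p-value.

```python
# Data copied from the problem statement.
area = [2.00, 1.71, 1.45, 1.76, 1.93, 1.20, 1.55, 1.93,
        1.59, 1.50, 1.90, 1.39, 1.54, 1.89, 1.59]
value = [84.4, 77.4, 75.7, 85.9, 79.1, 70.4, 75.8, 85.9,
         78.5, 79.2, 86.7, 79.3, 74.5, 83.8, 76.8]

n = len(area)
x_bar = sum(area) / n
y_bar = sum(value) / n
sxx = sum((x - x_bar) ** 2 for x in area)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(area, value))
syy = sum((y - y_bar) ** 2 for y in value)   # total sum of squares

ssr = sxy ** 2 / sxx   # variation explained by the regression (1 df)
sse = syy - ssr        # residual variation (n - 2 df)
f_stat = (ssr / 1) / (sse / (n - 2))

F_CRIT = 4.67  # assumed table value: upper 5% point of F(1, 13)
print(f"F = {f_stat:.2f}, reject H0 at 5%: {f_stat > F_CRIT}")
```

Comparing F with the table value leads to the same decision as comparing the p-value with 0.05.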
e.
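From the result summary,
R Square | 0.6593 |
Adjusted R Square | 0.6331 |
The adjusted value follows from R Square as 1 - (1 - 0.6593)(15 - 1)/(15 - 2) ≈ 0.6331. R-square measures how well the regression model fits the data: heating area explains approximately 65.93% of the variation in assessed value, or about 63.3% after adjusting for the sample size and the number of predictors.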
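Since part e asks for the adjusted value, here is a small sketch of the usual adjustment formula applied to the R-square reported above. The inputs `r2`, `n`, and `k` are taken from this problem (R-square 0.6593, 15 houses, one predictor); the variable names are only illustrative.

```python
r2 = 0.6593  # R-square from the regression summary
n = 15       # number of houses in the sample
k = 1        # number of independent variables

# Adjusted R-square penalises R-square for the number of predictors.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(f"adjusted R-square = {adj_r2:.4f}")
```

With a single predictor the adjustment is small, but it grows quickly when more variables are added to a sample this size.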
f.
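Correlation coefficient | 0.811996 |
The correlation coefficient is the square root of R Square (√0.6593 ≈ 0.812), taken with the positive sign of the slope. It indicates a strong positive linear relationship between assessed value and square footage. The relationship is significant: in simple regression the test of zero correlation is equivalent to the t test on the slope, which gives t = 5.016 with a p-value of 0.0002, less than 0.05.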
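To back up the "how do you know" part, one common approach is the t test for a correlation coefficient. The sketch below assumes the reported r = 0.811996 and n = 15; `T_CRIT` is an assumed table value for the two-sided 5% critical point of t with 13 degrees of freedom.

```python
import math

r = 0.811996  # correlation coefficient reported above
n = 15        # sample size

# Test statistic for H0: population correlation = 0, with n - 2 df.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

T_CRIT = 2.160  # assumed table value: two-sided 5% point of t(13)
print(f"t = {t_stat:.3f}, significant at 5%: {abs(t_stat) > T_CRIT}")
```

The statistic matches the slope's t statistic, which is expected when there is only one independent variable.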
g.
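The model is significant at the 5% significance level (part d), and the R-square value shows that it explains about 66% of the variation in assessed value. Based on this evidence, the model is a reasonable fit and can be used to predict assessed value for houses within the sampled range of heating areas (1.20 to 2.00 thousand square feet).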
h.
From the normal Q-Q plot of the residuals,
the data points fall along the reference line, so we can conclude that there is no evidence that the normality assumption is violated.
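If no plot is available, a rough numeric stand-in for the Q-Q plot is to correlate the sorted residuals with the matching standard normal quantiles; a value close to 1 corresponds to points lying near the reference line. This is only a sketch in plain Python, not the plot from the answer, and the plotting positions (i + 0.5)/n are one common convention chosen here as an assumption.

```python
import math
from statistics import NormalDist

# Data copied from the problem statement.
area = [2.00, 1.71, 1.45, 1.76, 1.93, 1.20, 1.55, 1.93,
        1.59, 1.50, 1.90, 1.39, 1.54, 1.89, 1.59]
value = [84.4, 77.4, 75.7, 85.9, 79.1, 70.4, 75.8, 85.9,
         78.5, 79.2, 86.7, 79.3, 74.5, 83.8, 76.8]

n = len(area)
x_bar = sum(area) / n
y_bar = sum(value) / n
sxx = sum((x - x_bar) ** 2 for x in area)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(area, value))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual = observed value minus fitted value, sorted for the Q-Q comparison.
residuals = sorted(y - (b0 + b1 * x) for x, y in zip(area, value))
# Standard normal quantiles at plotting positions (i + 0.5) / n.
quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]


def correlation(u, v):
    """Pearson correlation between two equal-length lists."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


qq_corr = correlation(residuals, quantiles)
print(f"Q-Q correlation = {qq_corr:.3f}")
```

With only 15 points this is a coarse check, so it supports the plot rather than replacing it.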
i.
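From the regression output summary,
the t statistic and the p-value for the slope estimate are,
Term | t Stat | P-value |
Heating Area of Dwelling | 5.016 | 0.0002 |
The p-value is the probability of observing a t statistic at least as extreme as 5.016 if the true slope were zero. Since the p-value for the independent variable (0.0002) is less than 0.05, we reject the null hypothesis of a zero slope and conclude that heating area makes a significant contribution to the regression equation.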
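For anyone who wants to see how the t statistic is built, a minimal sketch is below: the slope divided by its standard error, with the standard error coming from the residual mean square. It is plain Python rather than the R output, and `T_CRIT` is an assumed table value (two-sided 5% point of t with 13 degrees of freedom) standing in for the exact p-value.

```python
import math

# Data copied from the problem statement.
area = [2.00, 1.71, 1.45, 1.76, 1.93, 1.20, 1.55, 1.93,
        1.59, 1.50, 1.90, 1.39, 1.54, 1.89, 1.59]
value = [84.4, 77.4, 75.7, 85.9, 79.1, 70.4, 75.8, 85.9,
         78.5, 79.2, 86.7, 79.3, 74.5, 83.8, 76.8]

n = len(area)
x_bar = sum(area) / n
y_bar = sum(value) / n
sxx = sum((x - x_bar) ** 2 for x in area)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(area, value))
syy = sum((y - y_bar) ** 2 for y in value)

b1 = sxy / sxx                      # slope estimate
mse = (syy - sxy ** 2 / sxx) / (n - 2)  # residual mean square
se_b1 = math.sqrt(mse / sxx)        # standard error of the slope
t_stat = b1 / se_b1                 # tests H0: slope = 0

T_CRIT = 2.160  # assumed table value: two-sided 5% point of t(13)
print(f"se(b1) = {se_b1:.3f}, t = {t_stat:.3f}, "
      f"significant at 5%: {abs(t_stat) > T_CRIT}")
```

Squaring this t statistic gives the F value from part d, which is why the two tests always agree in a one-variable model.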
j.
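From the regression output,
The 95% confidence interval for the slope coefficient is,
Term | Lower 95.0% | Upper 95.0% |
Heating Area of Dwelling | 9.469541 | 23.7972 |
We are 95% confident that the true population slope lies between 9.47 and 23.80. That is, each additional 1,000 square feet of heating area is associated with an increase of roughly $9,470 to $23,797 in mean assessed value. The interval does not contain zero, which is consistent with the significant t test in part i.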
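The interval can be reproduced approximately from numbers already on the printout: the slope, its t statistic, and a table value of t. The sketch below assumes b1 = 16.633 and t = 5.016 as reported, and `T_CRIT` = 2.160 as the table value for 13 degrees of freedom, so the last digits differ slightly from the software output because of rounding.

```python
b1 = 16.633     # slope estimate from the output
t_stat = 5.016  # t statistic for the slope from the output
T_CRIT = 2.160  # assumed table value: two-sided 5% point of t(13)

se_b1 = b1 / t_stat       # standard error recovered from t = b1 / se
margin = T_CRIT * se_b1   # half-width of the 95% interval
lower = b1 - margin
upper = b1 + margin
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
```

Because the whole interval lies above zero, it leads to the same conclusion as the t test at the .05 level.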
k.
Since the regression model fits the data significantly (parts d and i) and the residual check shows no evidence against normality (part h), the model is useful for prediction.