1a. Pick either your Redfin or Zillow data, and put it in this table.
Home# | Estimate | Sale price |
1 | 367,846 | 335,000 |
2 | 314,085 | 295,000 |
3 | 327,343 | 350,000 |
4 | 270,548 | 260,000 |
5 | 153,270 | 142,500 |
6 | 133,920 | 106,500 |
7 | 230,040 | 225,000 |
8 | 1,429,459 | 1,350,000 |
9 | 135,240 | 134,000 |
10 | 210,515 | 212,000 |
11 | 344,369 | 342,000 |
12 | 279,008 | 270,000 |
13 | 406,836 | 395,000 |
14 | 301,213 | 277,500 |
15 | 156,575 | 190,000 |
b. We’re going to test whether the slope of the
regression line is zero or non-zero. Write the hypotheses with the
appropriate symbols, and then state those hypotheses in words,
specific to this situation.
c. Perform the calculations using whatever tool you
like best. Excel is convenient, or you can use a calculator, or a
website like this. Please report the F-statistic and the
p-value.
d. Using your p-value, make a conclusion regarding your
hypotheses from part b. Be sure to make your conclusion about this
specific situation.
e. Your p-value is probably very close to 0 so you
rejected the null hypothesis. If you failed to reject H0, why would
that be bad news for the website you picked?
2. Look at this cool scatterplot I made:
a. Describe in 1 – 2 sentences the overall trend of the dots in this plot.
b. The correlation coefficient is -0.407. Adding a trendline to the plot gives us this:
Does it appear that there is a negative trend to these points,
or does it look like they are just randomly-distributed dots?
c. The regression output is:
Variable | Coefficients | Standard Error | t Stat | P-value |
Intercept | 16.69423038 | 0.848559849 | 19.67360393 | 1.28 |
X | -0.156150685 | 0.082709556 | -1.887940057 | 0.07 |
Based on the p-value for x, would you say there is a statistically significant downward trend in these dots? Explain your answer in a complete sentence or two.
d. Would you be surprised if I told you that these numbers are randomly generated independently, and have nothing to do with each other? Why or why not?
3. Based on everything you’ve learned in this course,
separating randomness from significant effects is (select one) Easy
/ Difficult.
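For question 2c and 2d, the reported numbers can be checked with a short simulation. The sketch below is only an illustration under stated assumptions: the number of points is not given in the problem, so it is inferred from the reported t statistic and r (which works out to about 20 points), and the random data are drawn from a standard normal distribution, which is also an assumption. It estimates how often two unrelated variables produce a correlation at least as strong as 0.407 by chance alone.

```python
import math
import random

# Values reported in the regression output for question 2c.
coef, std_err = -0.156150685, 0.082709556
t_stat = coef / std_err  # t = coefficient / standard error

# Reported correlation from question 2b.
r = -0.407

# For simple regression, t^2 = r^2 (n - 2) / (1 - r^2), so the residual
# degrees of freedom can be backed out. ASSUMPTION: n is inferred this way,
# it is not stated in the problem.
implied_df = t_stat ** 2 * (1 - r ** 2) / r ** 2
n_points = round(implied_df) + 2


def pearson_r(x, y):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def share_at_least_as_extreme(n, threshold, trials, seed):
    """Fraction of trials in which independent x and y give |r| >= threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of x
        if abs(pearson_r(x, y)) >= threshold:
            hits += 1
    return hits / trials


share = share_at_least_as_extreme(n_points, abs(r), trials=20000, seed=1)
print(f"t statistic: {t_stat:.3f}")
print(f"inferred number of points: {n_points}")
print(f"share of random datasets with |r| >= {abs(r)}: {share:.3f}")
```

The simulated share is an empirical stand-in for the two-sided p-value of the slope, so it should land near the reported 0.07. Since that is above 0.05, a trend this strong is not unusual for unrelated data, and learning that the numbers were randomly generated should not be surprising.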
(1)
(a) Regression output using Excel:
Regression Analysis
r² | 0.997 | n | 15 |
r | 0.998 | k | 1 |
Std. Error | 17554.724 | Dep. Var. | Sale price |

ANOVA table
Source | SS | df | MS | F | p-value |
Regression | 1,221,456,545,052.0100 | 1 | 1,221,456,545,052.0100 | 3963.60 | 1.52E-17 |
Residual | 4,006,188,281.3206 | 13 | 308,168,329.3324 |
Total | 1,225,462,733,333.3300 | 14 |

Regression output
variables | coefficients | std. error | t (df=13) | p-value | 95% CI lower | 95% CI upper | std. coeff. |
Intercept | 8,793.5376 | 0.000 |
Estimate | 0.9392 | 0.0149 | 62.957 | 1.52E-17 | 0.9070 | 0.9714 | 0.998 |
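(b) H0: β1 = 0 and Ha: β1 ≠ 0, where β1 is the slope of the regression line of Sale price on Estimate.
H0: The slope is 0, so there is no linear relationship between the website's estimate and the actual sale price. Ha: The slope is not 0, so there is a linear relationship between the website's estimate and the actual sale price.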
(c) F-statistic = 3963.60, p-value = 1.52E-17 (essentially 0)
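(d) Since the p-value (1.52E-17) is less than 0.05, we reject H0 and conclude that the slope is different from 0: there is a statistically significant linear relationship between the website's estimates and the actual sale prices of these homes.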
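(e) Failing to reject H0 would mean the data give no evidence that the website's estimates are linearly related to actual sale prices. That would be bad news for the website, because its estimates would then be no more useful for predicting what a home sells for than ignoring the estimate altogether.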
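The figures reported in (a) through (d) can be reproduced without Excel. The sketch below is one possible way to do it in plain Python, using the data from the table in 1a with home 8's sale price written out as 1,350,000. The function and variable names are illustrative, not part of the original solution. Because the standard library has no F distribution, the decision is made by comparing F to the 5% critical value for (1, 13) degrees of freedom, about 4.667 from a standard F table, rather than by computing the p-value.

```python
import math

# Data from the table in 1a (home 8's sale price written out in full).
estimate = [367846, 314085, 327343, 270548, 153270, 133920, 230040, 1429459,
            135240, 210515, 344369, 279008, 406836, 301213, 156575]
sale_price = [335000, 295000, 350000, 260000, 142500, 106500, 225000, 1350000,
              134000, 212000, 342000, 270000, 395000, 277500, 190000]


def simple_regression(x, y):
    """Least-squares fit of y on x with the ANOVA F and slope t statistics."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = slope * sxy                           # regression sum of squares
    sse = sst - ssr                             # residual sum of squares
    mse = sse / (n - 2)                         # residual mean square, df = n - 2

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": ssr / sst,
        "sst": sst,
        "f_stat": ssr / mse,                    # regression df is 1, so MSR = SSR
        "t_stat": slope / math.sqrt(mse / sxx),
    }


result = simple_regression(estimate, sale_price)

# 5% critical value of F with (1, 13) degrees of freedom, from an F table.
F_CRIT = 4.667

for name, value in result.items():
    print(f"{name}: {value:,.4f}")
print("reject H0" if result["f_stat"] > F_CRIT else "fail to reject H0")
```

With one predictor the F-statistic equals the square of the slope's t statistic, which is why the ANOVA table and the Estimate row in (a) report the same p-value.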