| Price (Y) | SQFT (X) |
| $3,198,000 | 2,089 sqft |
| $1,450,000 | 1,961 sqft |
| $5,375,000 | 4,415 sqft |
| $3,648,000 | 2,083 sqft |
| $5,195,000 | 5,352 sqft |
| $5,345,000 | 3,458 sqft |
| $5,495,000 | 2,763 sqft |
| $16,300,000 | 11,438 sqft |
| $7,395,000 | 4,847 sqft |
| $9,995,000 | 5,007 sqft |
| $13,500,000 | 5,639 sqft |
| $11,995,000 | 6,306 sqft |
| $5,245,000 | 2,892 sqft |
| $6,350,000 | 2,626 sqft |
| $7,199,000 | 4,346 sqft |
| $5,498,000 | 1,814 sqft |
| $23,000,000 | 9,000 sqft |
| $6,890,000 | 6,195 sqft |
| $37,500,000 | 16,450 sqft |
| $16,950,000 | 5,269 sqft |
| $13,995,000 | 5,047 sqft |
| $7,400,000 | 4,173 sqft |
| $10,900,000 | 5,134 sqft |
| $33,400,000 | 17,000 sqft |
| $18,500,000 | 11,872 sqft |
| $18,995,000 | 6,424 sqft |
| $10,995,000 | 7,451 sqft |
| $11,995,000 | 6,244 sqft |
| $10,995,000 | 6,495 sqft |
| $23,995,000 | 9,200 sqft |
| $2,648,000 | 2,204 sqft |
| $18,900,000 | 5,880 sqft |
| $39,995,000 | 15,328 sqft |
| $9,500,000 | 1,518 sqft |
| $29,000,000 | 11,491 sqft |
Using the data collected, find the relationship between any two quantitative variables
a) Perform a correlation analysis
b) Perform a regression analysis. Develop a linear regression model that would help make a prediction based on a given independent variable.
c) Discuss the findings of your regression analysis. Interpret the slope and the coefficient of determination in the context of the problem.
d) Provide the scatter plot. (20)
Excel > Data > Data Analysis > Regression
| SUMMARY OUTPUT |
| Regression Statistics |
| Multiple R | 0.913814181 |
| R Square | 0.835056357 |
| Adjusted R Square | 0.830058065 |
| Standard Error | 4085011.67 |
| Observations | 35 |
| ANOVA |
| | df | SS | MS | F | Significance F |
| Regression | 1 | 2.78792E+15 | 2.78792E+15 | 167.0683352 | 1.82875E-14 |
| Residual | 33 | 5.50682E+14 | 1.66873E+13 | | |
| Total | 34 | 3.3386E+15 | | | |
| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
| Intercept | -620067.5902 | 1266733.768 | -0.489501114 | 0.627725157 | -3197256.819 | 1957121.639 | -3197256.819 | 1957121.639 |
| SQFT(X) | 2189.673105 | 169.4073354 | 12.92549168 | 1.82875E-14 | 1845.01129 | 2534.33492 | 1845.01129 | 2534.33492 |
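The intercept, slope, and Multiple R reported by Excel come from the usual least-squares formulas, so they can be cross-checked outside Excel. The following is a minimal sketch, not the Excel implementation; the function name `simple_linear_regression` and the toy numbers are hypothetical and are not the housing data above.

```python
from math import sqrt

def simple_linear_regression(xs, ys):
    """Return (intercept, slope, r) for the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return intercept, slope, r

# Hypothetical toy data (NOT the listings above), just to exercise the function
toy_sqft = [1000, 2000, 3000, 4000]
toy_price = [210000, 390000, 620000, 780000]
intercept, slope, r = simple_linear_regression(toy_sqft, toy_price)
print(intercept, slope, r)
```

Passing the 35 sqft values as `xs` and the 35 prices as `ys` (as plain numbers, without `$`, commas, or the `sqft` suffix) should reproduce the Coefficients column and Multiple R, assuming the rows are paired in the order listed.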
a)
Correlation coefficient (r) = 0.9138
Hypothesis:
H0:ρ=0
HA: ρ ≠ 0
Test:
t stat = r*SQRT((n-2)/(1-r^2)) = 0.9138*SQRT((35-2)/(1-0.9138^2)) = 12.924
P value ≈ 1.83E-14 (effectively 0)
P value < 0.05, reject H0
There is enough evidence to conclude that there is a significant linear relationship between sqft (x) and price (y).
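The t statistic for the correlation test can be reproduced with a few lines of code. This is a small sketch of the formula t = r*SQRT((n-2)/(1-r^2)) used above; the function name `correlation_t_stat` is an illustrative choice.

```python
from math import sqrt

def correlation_t_stat(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))

# r rounded to four decimals, as in the hand calculation
t_rounded = correlation_t_stat(0.9138, 35)
# r at full precision (Multiple R from the Excel output)
t_full = correlation_t_stat(0.913814181, 35)
print(round(t_rounded, 3), round(t_full, 4))
```

With the full-precision r, the result agrees with the t Stat that Excel reports for the SQFT(X) coefficient, which is expected because in simple linear regression the test of ρ = 0 and the test of β1 = 0 are the same test.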
b)
Hypothesis:
H0: β1 = 0
Ha: β1 ≠ 0
Test:
F stat = 167.07
P value = 1.83E-14 (Significance F, effectively 0)
P value < 0.05, reject H0
Therefore, the regression model is statistically significant.
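As a sanity check, the F statistic can be recomputed from the ANOVA sums of squares, and with a single predictor it should equal the square of the slope's t statistic. The sketch below uses the rounded values printed in the Excel output, so small rounding differences are expected.

```python
# Values copied from the ANOVA table (rounded as printed by Excel)
ss_regression = 2.78792e15
ss_residual = 5.50682e14
df_regression = 1
df_residual = 33

# Mean squares are sums of squares divided by their degrees of freedom
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual

# t statistic of the SQFT(X) coefficient from the coefficients table
t_slope = 12.92549168

print(round(f_stat, 2), round(t_slope ** 2, 2))
```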
Regression equation:
Price (Y) = -620067.5902 + 2189.6731 * SQFT (X)
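To make a prediction, plug a square footage into the fitted equation. The sketch below does this for a hypothetical 5,000 sqft home; that input is an illustrative choice, not a listing from the data, and the function name `predict_price` is likewise assumed.

```python
INTERCEPT = -620067.5902  # from the regression equation
SLOPE = 2189.6731         # dollars per additional square foot

def predict_price(sqft):
    """Predicted price in dollars for a home of the given size."""
    return INTERCEPT + SLOPE * sqft

# Hypothetical input inside the observed range of the data
predicted = predict_price(5000)
print(f"${predicted:,.0f}")
```

This gives a predicted price of about $10.33 million. The sample sizes run from 1,518 to 17,000 sqft, so predictions outside that range are extrapolations and should be treated with caution. The negative intercept is an example: it would be the predicted price at 0 sqft, which has no practical meaning.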
c)
Slope interpretation:
For each additional square foot, the predicted price increases by about $2,189.67.
Coefficient of determination (r^2) = 0.8351
83.51% of the variation in price is explained by its linear relationship with sqft (the regression model).
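The coefficient of determination is the share of the total sum of squares accounted for by the regression, and the adjusted value corrects for the number of predictors. A short sketch using the figures from the Excel output (variable names are illustrative):

```python
# Sums of squares from the ANOVA table (rounded as printed by Excel)
ss_regression = 2.78792e15
ss_total = 3.3386e15

# R^2 = SS_regression / SS_total
r_squared = ss_regression / ss_total

# Adjusted R^2 for n observations and k predictors,
# starting from the full-precision R Square reported by Excel
n, k = 35, 1
r_square_excel = 0.835056357
adjusted_r_squared = 1 - (1 - r_square_excel) * (n - 1) / (n - k - 1)

print(round(r_squared, 4), round(adjusted_r_squared, 4))
```

Both values agree with the R Square and Adjusted R Square rows of the summary output, up to rounding of the printed sums of squares.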
d)
