For complete credit show all of your calculations, list your assumptions and formulas, draw relevant tables and plots all on handwritten pages in your notebook.
If there are repetitious calculations you only need to show an illustrative calculation.
Submit pictures of your handwritten pages no later than Wednesday, March 4th by 1800 hours ending.
Use this sample of house prices and lotsizes in the Pelham Bay neighborhood of the Bronx from 2018-2019 to answer the questions below.
price | lotsize |
490000 | 2503 |
512000 | 2483 |
345000 | 2500 |
508670 | 2900 |
550000 | 2513 |
300000 | 2513 |
995000 | 4950 |
920000 | 3135 |
470000 | 2375 |
450000 | 2375 |
1)
Both variables are continuous.
2)
I am using lotsize
Steps for Histogram
Lower Limit | Upper Limit | Mid point | Interval | Freq | Relative Freq | Cumulative Freq |
2000 | 2999 | 2499.5 | 2000-2999 | 8 | 0.8 | 8 |
3000 | 3999 | 3499.5 | 3000-3999 | 1 | 0.1 | 9 |
4000 | 4999 | 4499.5 | 4000-4999 | 1 | 0.1 | 10 |
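The frequency table can be cross-checked with a short script. This is a minimal sketch, assuming classes of width 1000 starting at 2000 as in the table; the function name `frequency_table` and its parameters are illustrative and not part of the assignment.

```python
# Lot sizes copied from the sample table in the question.
lotsize = [2503, 2483, 2500, 2900, 2513, 2513, 4950, 3135, 2375, 2375]

def frequency_table(values, start, width, n_classes):
    """Return one row per class:
    (lower, upper, midpoint, freq, relative_freq, cumulative_freq)."""
    rows = []
    cumulative = 0
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width - 1  # classes are 2000-2999, 3000-3999, ...
        freq = sum(lower <= v <= upper for v in values)
        cumulative += freq
        rows.append((lower, upper, (lower + upper) / 2,
                     freq, freq / len(values), cumulative))
    return rows

table = frequency_table(lotsize, start=2000, width=1000, n_classes=3)
for row in table:
    print(row)
```

The histogram plots the Freq column against the intervals, and the ogive plots the Cumulative Freq column against the upper class limits.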
Steps to create an Ogive (Cumulative Frequency Graph)
3)
From the data, we calculate the mean and standard deviation of the two variables.
Mean (X bar) = Sum of Values / n
and
Std Dev (s) = sqrt( Sum of (Value - X bar)² / (n - 1) )
Statistic | price | lotsize |
Maximum | 995000 | 4950 |
Minimum | 300000 | 2375 |
First Quartile (Q1) | 455000 | 2487.25 |
Median | 499335 | 2508 |
Third Quartile (Q3) | 540500 | 2803.25 |
IQR (Q3-Q1) | 85500 | 316 |
Range (Max-Min) | 695000 | 2575 |
Mean | 554067 | 2824.7 |
Std Dev | 226675.7 | 784.8078 |
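The summary statistics can be reproduced with Python's standard `statistics` module. This is a sketch under two assumptions: the quartiles use the inclusive interpolation method, and the standard deviation is the sample version with n - 1 in the denominator. Both choices reproduce the values in the table. The helper name `summarize` is illustrative.

```python
import statistics

# Sample data copied from the question.
price = [490000, 512000, 345000, 508670, 550000,
         300000, 995000, 920000, 470000, 450000]
lotsize = [2503, 2483, 2500, 2900, 2513, 2513, 4950, 3135, 2375, 2375]

def summarize(values):
    """Five-number summary plus IQR, range, mean and sample std dev."""
    # method="inclusive" interpolates between the sample min and max.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "max": max(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample std dev (n - 1)
    }

for name, values in (("price", price), ("lotsize", lotsize)):
    print(name, summarize(values))
```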
4)
As evident from the above boxplots, both price and lotsize have outliers.
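The outliers can also be identified numerically. The sketch below assumes the usual 1.5 × IQR boxplot rule and the same inclusive quartile method as in part 3; the solution does not state which rule the boxplots used, so treat both as assumptions. The function name `iqr_outliers` is illustrative.

```python
import statistics

# Sample data copied from the question.
price = [490000, 512000, 345000, 508670, 550000,
         300000, 995000, 920000, 470000, 450000]
lotsize = [2503, 2483, 2500, 2900, 2513, 2513, 4950, 3135, 2375, 2375]

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    # Any value outside the fences is flagged as an outlier.
    return low, high, sorted(v for v in values if v < low or v > high)

print("price:", iqr_outliers(price))
print("lotsize:", iqr_outliers(lotsize))
```

Under these assumptions the price fences are 326750 and 668750, which flags 300000, 920000 and 995000, and the lotsize fences are 2013.25 and 3277.25, which flags 4950.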
5)
The weighted average is the same as the mean calculated in part 3.
6)
| lotsize (x) | price (y) | xy | x² | y² |
| 2503 | 490000 | 1226470000 | 6265009 | 240100000000 |
| 2483 | 512000 | 1271296000 | 6165289 | 262144000000 |
| 2500 | 345000 | 862500000 | 6250000 | 119025000000 |
| 2900 | 508670 | 1475143000 | 8410000 | 258745168900 |
| 2513 | 550000 | 1382150000 | 6315169 | 302500000000 |
| 2513 | 300000 | 753900000 | 6315169 | 90000000000 |
| 4950 | 995000 | 4925250000 | 24502500 | 990025000000 |
| 3135 | 920000 | 2884200000 | 9828225 | 846400000000 |
| 2375 | 470000 | 1116250000 | 5640625 | 220900000000 |
| 2375 | 450000 | 1068750000 | 5640625 | 202500000000 |
Total | 28247 | 5540670 | 16965909000 | 85332611 | 3532339168900 |
Using the formula r = (n∑xy - ∑x∑y) / sqrt[ (n∑x² - (∑x)²) * (n∑y² - (∑y)²) ] with n = 10 and the totals above, we get
Correlation coefficient (r) = 0.821
The magnitude tells us that there is a strong correlation between the two variables, and the positive sign of r tells us that the relationship is direct, i.e., as lotsize increases, price also tends to increase, and vice versa.
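The value of r can be checked with the computational formula built from the column totals. This is a minimal sketch; the function name `pearson_r` is illustrative.

```python
import math

# Sample data copied from the question.
lotsize = [2503, 2483, 2500, 2900, 2513, 2513, 4950, 3135, 2375, 2375]
price = [490000, 512000, 345000, 508670, 550000,
         300000, 995000, 920000, 470000, 450000]

def pearson_r(x, y):
    """Pearson correlation from the sums of x, y, xy, x^2 and y^2."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

r = pearson_r(lotsize, price)
print(round(r, 3))
```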
7)
Let the regression equation be: Y = a + bX
Where
Slope (b) = (n∑XY - ∑X∑Y) / (n∑X² - (∑X)²) = 237.26
and Intercept (a) = ∑Y/n - b∑X/n = -116107.41
Hence, Price = 237.26 * Lotsize - 116107.41
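The slope and intercept can be verified with the same least-squares formulas. This is a minimal sketch; the function name `fit_line` is illustrative.

```python
# Sample data copied from the question.
lotsize = [2503, 2483, 2500, 2900, 2513, 2513, 4950, 3135, 2375, 2375]
price = [490000, 512000, 345000, 508670, 550000,
         300000, 995000, 920000, 470000, 450000]

def fit_line(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(u * v for u, v in zip(x, y))
    sxx = sum(u * u for u in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n  # the fitted line passes through (mean X, mean Y)
    return a, b

a, b = fit_line(lotsize, price)
print(f"Price = {b:.2f} * Lotsize + ({a:.2f})")
```

Because the intercept is computed from the unrounded slope, carrying b at full precision until the final step avoids a rounding error of a few dollars in a.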