13a) Compute z-scores for the Sale Price variable. Do you note any outliers?
13b) Is there a relationship between Lot Size and the home's Age in years? What test do you perform and why? Now check for whether there is a difference in Lot Size for older versus younger homes (using a cutoff that makes sense). What test do you perform and why?
Home ID | Sale Price | Lot Size | Age | Central Air | Living Area | Full Baths | Half Baths | Bedrooms | Fireplaces |
1 | 320000 | 0.61 | 6 | Yes | 2492 | 2 | 1 | 4 | 2 |
2 | 215000 | 0.63 | 21 | Yes | 1792 | 1 | 1 | 3 | 0 |
3 | 125000 | 0.35 | 22 | No | 1040 | 1 | 0 | 3 | 0 |
4 | 158900 | 0.21 | 3 | No | 1292 | 2 | 0 | 3 | 1 |
5 | 82000 | 0.17 | 21 | No | 1412 | 2 | 1 | 2 | 0 |
6 | 219200 | 1.08 | 30 | No | 1735 | 1 | 1 | 4 | 1 |
7 | 125000 | 0.16 | 133 | No | 852 | 1 | 1 | 2 | 0 |
8 | 110000 | 0.15 | 18 | No | 988 | 1 | 1 | 2 | 0 |
9 | 179000 | 0.6 | 29 | No | 2128 | 1 | 1 | 3 | 1 |
10 | 264900 | 0.55 | 16 | Yes | 1897 | 2 | 1 | 4 | 1 |
11 | 208000 | 0.12 | 3 | No | 1242 | 2 | 0 | 2 | 0 |
12 | 126000 | 0.74 | 47 | No | 1200 | 1 | 0 | 3 | 1 |
13 | 164700 | 0.16 | 25 | Yes | 1602 | 1 | 1 | 2 | 1 |
14 | 339000 | 0.68 | 38 | No | 2132 | 1 | 0 | 3 | 0 |
15 | 150000 | 0.08 | 68 | No | 1392 | 1 | 0 | 3 | 0 |
13a)
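In this problem we are given the Sale Price of each home and need to compute its z-score.
The formula for the z-score is
$z = \frac{x - \bar{x}}{\sigma}$
where $x$ is a home's sale price, $\bar{x}$ is the mean sale price, and $\sigma$ is the standard deviation.
First we compute the mean and the standard deviation of the Sale Price variable.
Mean = $\bar{x} = \frac{1}{n}\sum x_i$ = 185766.7
Standard deviation = $\sigma = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}$ = 73107.45 (population form, dividing by n = 15)
With these values we compute the z-score of each sale price: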
Home ID | Sale Price | Zscore |
1 | 320000 | 1.8361 |
2 | 215000 | 0.3999 |
3 | 125000 | -0.8312 |
4 | 158900 | -0.3675 |
5 | 82000 | -1.4194 |
6 | 219000 | 0.4546 |
7 | 125000 | -0.8312 |
8 | 110000 | -1.0364 |
9 | 179000 | -0.0926 |
10 | 264900 | 1.0824 |
11 | 208000 | 0.3041 |
12 | 126000 | -0.8175 |
13 | 164700 | -0.2882 |
14 | 339000 | 2.0960 |
15 | 150000 | -0.4892 |
We do not observe any outliers in the data set. Every z-score lies within ±3; the largest in absolute value is 2.0960 (Home 14).
An observation is usually treated as an outlier when its z-score exceeds 3 in absolute value, that is, when it lies more than 3 standard deviations from the mean. No sale price here meets that criterion.
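The z-score calculation above can be reproduced with a short Python sketch. It uses only the standard library and takes the sale prices as they are entered in the worked z-score table (home 6 appears there as 219000, while the data table lists 219200). The names `sale_prices`, `z_scores`, and `outliers` are illustrative, and the |z| > 3 rule is the one stated above.

```python
import statistics

# Sale prices as entered in the worked z-score table above
# (home 6 is 219000 there; the data table lists 219200).
sale_prices = [320000, 215000, 125000, 158900, 82000, 219000, 125000, 110000,
               179000, 264900, 208000, 126000, 164700, 339000, 150000]

mean = statistics.fmean(sale_prices)
# pstdev divides by n (population form); statistics.stdev would divide by n - 1.
sd = statistics.pstdev(sale_prices)

# z-score of each observation
z_scores = [(x - mean) / sd for x in sale_prices]

# Flag homes whose |z| exceeds 3 (the outlier rule used above)
outliers = [(home_id, x)
            for home_id, (x, z) in enumerate(zip(sale_prices, z_scores), start=1)
            if abs(z) > 3]

for home_id, (x, z) in enumerate(zip(sale_prices, z_scores), start=1):
    print(f"{home_id:2d} {x:7d} {z:8.4f}")
print("outliers:", outliers)
```

Swapping `statistics.pstdev` for `statistics.stdev` would give the sample (n - 1) standard deviation, which makes every z-score slightly smaller in absolute value and does not change the conclusion here.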
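13b) To find the relationship between Lot Size and the home's Age, we compute the correlation between the two variables.
x = Lot Size, y = home's Age
The correlation coefficient is computed with the formula
$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
r = -0.1418
The value is close to 0, so the linear relationship between the two variables is very weak (and slightly negative).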
We can also check this with a scatter plot of Lot Size against Age.
In the scatter plot the points show no clear pattern, which again indicates a very weak relationship between Lot Size and the home's Age.
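As a check on the correlation reported above, here is a minimal Python sketch that computes the Pearson correlation directly from the Lot Size and Age columns of the data table. The helper name `pearson_r` is illustrative and not part of the original solution.

```python
import math

# Lot Size and Age columns from the data table, in Home ID order
lot_size = [0.61, 0.63, 0.35, 0.21, 0.17, 1.08, 0.16, 0.15,
            0.6, 0.55, 0.12, 0.74, 0.16, 0.68, 0.08]
age = [6, 21, 22, 3, 21, 30, 133, 18, 29, 16, 3, 47, 25, 38, 68]


def pearson_r(x, y):
    """Pearson correlation: sum of cross-deviations over the root of the
    product of the two sums of squared deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(lot_size, age)
print(f"r = {r:.4f}")
```

The result agrees with the reported r to about three decimal places; any difference in the fourth decimal comes from rounding.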
To test the relationship formally, we conduct a hypothesis test for the correlation coefficient. We use this test because it assesses whether there is a linear relationship between two quantitative variables: either the population correlation is zero (no linear relationship) or it is nonzero (a linear relationship exists).
The hypotheses are
Null Hypothesis : Correlation = 0 vs
Alternative Hypothesis : Correlation ≠ 0
The test statistic is
$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
With n = 15 and the unrounded value of r (about -0.1418), this gives
t = -0.5168
Here degrees of freedom = n-2 = 15-2 = 13
The two-tailed critical value of t at the 0.05 level of significance with df = 13 is -2.160 or +2.160
The decision rule is: reject the null hypothesis if |test statistic| > critical value.
Here |-0.5168| = 0.5168 < 2.160, so we fail to reject the null hypothesis.
The data do not provide evidence that the correlation differs from 0.
So we conclude that there is no statistically significant linear relationship between Lot Size and the home's Age.
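The t-test for the correlation coefficient can be sketched in a few lines of Python. The value of `r` is the rounded correlation reported above, and `t_crit = 2.160` is the standard two-tailed t-table entry for df = 13 at the 5% level, hardcoded here as an assumption of this sketch rather than computed.

```python
import math

n = 15          # number of homes
r = -0.1418     # correlation reported above (rounded to 4 decimals)

# t statistic for H0: correlation = 0, with n - 2 degrees of freedom
df = n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Two-tailed critical value for df = 13 at alpha = 0.05, read from a t-table
# (assumed constant, not computed here)
t_crit = 2.160

# Two-tailed decision rule: reject H0 only if |t| exceeds the critical value
reject_null = abs(t_stat) > t_crit

print(f"t = {t_stat:.4f}, df = {df}, reject H0: {reject_null}")
```

Because `r` is entered already rounded, the printed t may differ from the reported value in the last decimal place; the decision is unaffected.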
Next we check whether Lot Size differs between younger and older homes.
We use a cutoff of 25 years.
Homes younger than 25 years are treated as younger homes.
Homes aged 25 years or more are treated as older homes.
This gives two groups of lot sizes:
Lot size for younger homes | Lot size for older homes |
0.12 | 0.16 |
0.21 | 0.6 |
0.61 | 1.08 |
0.55 | 0.68 |
0.15 | 0.74 |
0.17 | 0.08 |
0.63 | 0.16 |
0.35 | |
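The grouping above can be reproduced with a short Python sketch. It assumes the home aged exactly 25 years belongs to the older group, which is what the group sizes in the table (8 younger, 7 older) imply; the name `CUTOFF` is illustrative.

```python
# Lot Size and Age columns from the data table, in Home ID order
lot_size = [0.61, 0.63, 0.35, 0.21, 0.17, 1.08, 0.16, 0.15,
            0.6, 0.55, 0.12, 0.74, 0.16, 0.68, 0.08]
age = [6, 21, 22, 3, 21, 30, 133, 18, 29, 16, 3, 47, 25, 38, 68]

CUTOFF = 25  # years; homes aged exactly 25 are counted as older (assumption)

# Split the lot sizes by the age of the corresponding home
younger = [lot for lot, a in zip(lot_size, age) if a < CUTOFF]
older = [lot for lot, a in zip(lot_size, age) if a >= CUTOFF]

print("younger:", younger)
print("older:  ", older)
```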
Looking at the raw values, the two groups overlap heavily, so no clear difference in lot size between younger and older homes is apparent.
To check this formally, we perform a two-sample independent t-test for the difference in means. The two groups contain different homes, so the samples are independent, and we want to know whether the difference between the two mean lot sizes is significant.
Null hypothesis : There is no difference between the mean lot size of younger homes and the mean lot size of older homes.
Alternative hypothesis : There is a difference between the mean lot size of younger homes and the mean lot size of older homes.
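The test statistic (pooled-variance form) is
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $S_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$
From the given data: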
Statistic | Younger homes | Older homes |
Mean | 0.3487 | 0.5 |
Variance | 0.0472 | 0.1406 |
Observations | 8 | 7 |
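Hence the pooled variance is $S_p^2 = \frac{(8-1)(0.0472) + (7-1)(0.1406)}{8+7-2} = 0.0903$
Substituting the two means, the pooled variance, and the sample sizes (before rounding) into the test statistic gives
Test statistic = t = -0.9721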
Here degrees of freedom = n1 + n2 - 2 = 8+7-2 = 13
At the 0.05 level of significance, the t-table with df = 13 gives a two-tailed critical value of -2.16 or +2.16
Decision rule : reject the null hypothesis if |t statistic| > critical value.
|-0.9721| = 0.9721 < 2.16, hence we fail to reject the null hypothesis.
Failing to reject does not prove the null hypothesis; it means the data show no significant evidence against it.
We conclude that there is no statistically significant difference between the mean lot size of younger homes and that of older homes.
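The pooled two-sample t-test above can be checked with a minimal Python sketch using only the standard library. The two lists are the lot-size groups from the table, and `t_crit = 2.16` is the t-table value quoted above, hardcoded as an assumption rather than computed.

```python
import math
import statistics

# Lot sizes of the two groups, as listed in the table above
younger = [0.12, 0.21, 0.61, 0.55, 0.15, 0.17, 0.63, 0.35]
older = [0.16, 0.6, 1.08, 0.68, 0.74, 0.08, 0.16]

n1, n2 = len(younger), len(older)
mean1, mean2 = statistics.fmean(younger), statistics.fmean(older)
# statistics.variance is the sample variance (divides by n - 1)
var1, var2 = statistics.variance(younger), statistics.variance(older)

# Pooled variance and pooled-variance t statistic
df = n1 + n2 - 2
sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
t_stat = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Two-tailed critical value for df = 13 at alpha = 0.05, from a t-table
t_crit = 2.16
reject_null = abs(t_stat) > t_crit

print(f"means: {mean1:.4f}, {mean2:.4f}")
print(f"variances: {var1:.4f}, {var2:.4f}, pooled: {sp2:.4f}")
print(f"t = {t_stat:.4f}, df = {df}, reject H0: {reject_null}")
```

The pooled form assumes the two groups have equal population variances. The sample variances here differ by roughly a factor of three, so a Welch (unequal-variance) t-test would be a reasonable alternative check.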