age | educ | male | sleep | totwrk | yngkid |
32 | 12 | 1 | 3113 | 3438 | 0 |
31 | 14 | 1 | 2920 | 5020 | 0 |
44 | 17 | 1 | 2670 | 2815 | 0 |
30 | 12 | 0 | 3083 | 3786 | 0 |
64 | 14 | 1 | 3448 | 2580 | 0 |
41 | 12 | 1 | 4063 | 1205 | 0 |
35 | 12 | 1 | 3180 | 2113 | 1 |
47 | 13 | 1 | 2928 | 3608 | 0 |
32 | 17 | 1 | 3368 | 2353 | 0 |
30 | 15 | 1 | 3018 | 2851 | 0 |
43 | 8 | 1 | 1575 | 6415 | 0 |
23 | 16 | 0 | 3295 | 370 | 0 |
24 | 16 | 1 | 3798 | 2438 | 0 |
48 | 5 | 1 | 3008 | 2693 | 0 |
33 | 12 | 1 | 3248 | 2526 | 0 |
23 | 12 | 1 | 3683 | 2950 | 0 |
46 | 17 | 1 | 3201 | 3003 | 0 |
37 | 14 | 1 | 2580 | 4011 | 1 |
53 | 12 | 1 | 3420 | 2300 | 0 |
45 | 17 | 1 | 3090 | 1543 | 0 |
46 | 17 | 1 | 2760 | 3473 | 0 |
40 | 13 | 1 | 2880 | 3276 | 0 |
53 | 12 | 1 | 3470 | 2506 | 0 |
29 | 13 | 1 | 2673 | 2651 | 0 |
29 | 12 | 1 | 2820 | 4580 | 0 |
53 | 12 | 1 | 2873 | 3588 | 0 |
28 | 13 | 1 | 1905 | 3418 | 0 |
35 | 12 | 0 | 2926 | 2250 | 0 |
36 | 12 | 1 | 2603 | 2638 | 1 |
Variable definitions:
age: age in years
educ: years of schooling
male: =1 if male
sleep: minutes of sleep at night, per week
totwrk: minutes worked per week
yngkid: =1 if children <3 present
Consider the following model:
sleep = β0 + β1 totwrk + β2 educ + β3 age + β4 age² + β5 yngkid + β6 male + u
a. Write down a model that allows the variance of u to differ between men and women. The variance should not depend on other factors.
b. Is the variance of u higher for men or for women?
c. Is the variance of u statistically different for men and for women?
a). We are given the data set and need a model in which the variance of the error u depends only on gender.
The only variable that may enter the variance model is the dummy male, which equals 1 for men and 0 for women.
As per the question, the variance of the error term depends on gender only:
Var(u | totwrk, educ, age, yngkid, male) = Var(u | male)
Mathematically,
Var(u | male) = δ0 + δ1 male
where δ0 and δ1 are unknown parameters. This can be interpreted as the variance being equal to δ0 for women and δ0 + δ1 for men.
This is a suitable model because it depends only on gender.
b). Now we need to check whether the variance of u is higher for men or for women.
To do so, we first run the regression on all 29 data points, with sleep as the dependent variable, to estimate the equation
sleep = β0 + β1 totwrk + β2 educ + β3 age + β4 age² + β5 yngkid + β6 male + u
The regression summary is obtained from this run, which uses all of the independent variables.
From the same run, we obtain the residuals as follows:
RESIDUAL OUTPUT
Observation | Predicted sleep | Residuals |
1 | 2925.93 | 187.07 |
2 | 2484.01 | 435.99 |
3 | 3014.46 | -344.46 |
4 | 2602.65 | 480.35 |
5 | 3512.24 | -64.24 |
6 | 3487.49 | 575.51 |
7 | 3029.86 | 150.14 |
8 | 2824.17 | 103.83 |
9 | 3215.79 | 152.21 |
10 | 3116.64 | -98.64 |
11 | 2019.94 | -444.94 |
12 | 3735.99 | -440.99 |
13 | 3375.70 | 422.30 |
14 | 3126.00 | -118.00 |
15 | 3170.84 | 77.16 |
16 | 3275.34 | 407.66 |
17 | 2972.69 | 228.31 |
18 | 2462.90 | 117.10 |
19 | 3286.43 | 133.57 |
20 | 3382.04 | -292.04 |
21 | 2838.75 | -78.75 |
22 | 2895.02 | -15.02 |
23 | 3227.72 | 242.28 |
24 | 3201.40 | -528.40 |
25 | 2655.53 | 164.47 |
26 | 2919.35 | -46.35 |
27 | 3004.51 | -1099.51 |
28 | 2965.37 | -39.37 |
29 | 2870.25 | -267.25 |
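The residual step can be sketched in code. This is a minimal illustration, not the original calculation: it assumes the regression was a plain OLS fit with an intercept and the regressors totwrk, educ, age, age², yngkid and male, and it uses NumPy's least-squares solver. The array `data` is typed in from the question's data table. If the original run used the same specification, the printed values should line up with the table above, but that has not been checked here.

```python
import numpy as np

# The 29 observations from the question: age, educ, male, sleep, totwrk, yngkid.
data = np.array([
    [32, 12, 1, 3113, 3438, 0], [31, 14, 1, 2920, 5020, 0],
    [44, 17, 1, 2670, 2815, 0], [30, 12, 0, 3083, 3786, 0],
    [64, 14, 1, 3448, 2580, 0], [41, 12, 1, 4063, 1205, 0],
    [35, 12, 1, 3180, 2113, 1], [47, 13, 1, 2928, 3608, 0],
    [32, 17, 1, 3368, 2353, 0], [30, 15, 1, 3018, 2851, 0],
    [43, 8, 1, 1575, 6415, 0], [23, 16, 0, 3295, 370, 0],
    [24, 16, 1, 3798, 2438, 0], [48, 5, 1, 3008, 2693, 0],
    [33, 12, 1, 3248, 2526, 0], [23, 12, 1, 3683, 2950, 0],
    [46, 17, 1, 3201, 3003, 0], [37, 14, 1, 2580, 4011, 1],
    [53, 12, 1, 3420, 2300, 0], [45, 17, 1, 3090, 1543, 0],
    [46, 17, 1, 2760, 3473, 0], [40, 13, 1, 2880, 3276, 0],
    [53, 12, 1, 3470, 2506, 0], [29, 13, 1, 2673, 2651, 0],
    [29, 12, 1, 2820, 4580, 0], [53, 12, 1, 2873, 3588, 0],
    [28, 13, 1, 1905, 3418, 0], [35, 12, 0, 2926, 2250, 0],
    [36, 12, 1, 2603, 2638, 1],
], dtype=float)
age, educ, male, sleep, totwrk, yngkid = data.T

# Design matrix: intercept, totwrk, educ, age, age^2, yngkid, male.
X = np.column_stack([np.ones(len(sleep)), totwrk, educ, age, age**2, yngkid, male])

# OLS coefficients from a least-squares solve.
beta, *_ = np.linalg.lstsq(X, sleep, rcond=None)

# Fitted values and residuals (actual minus predicted).
predicted = X @ beta
residuals = sleep - predicted

for i, (p, r) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i} | {p:.2f} | {r:.2f} |")
```

Because the model includes an intercept, the OLS residuals sum to zero and are uncorrelated with every regressor, which is a quick sanity check on the fit.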
Now that we have the residuals from the regression, we can check how their spread depends on gender.
The next step is to build a new table that pairs the squared residual with male. The squared residual is used because, with E(u | male) = 0, Var(u | male) = E(u² | male), so the squared residual stands in for the unobserved error variance.
The table looks like this:
RESIDUAL OUTPUT
Observation | Predicted sleep | Residuals | Residual^2 | Male |
1 | 2925.93 | 187.07 | 34994.21 | 1 |
2 | 2484.01 | 435.99 | 190087.48 | 1 |
3 | 3014.46 | -344.46 | 118654.14 | 1 |
4 | 2602.65 | 480.35 | 230740.81 | 0 |
5 | 3512.24 | -64.24 | 4127.38 | 1 |
6 | 3487.49 | 575.51 | 331210.75 | 1 |
7 | 3029.86 | 150.14 | 22543.15 | 1 |
8 | 2824.17 | 103.83 | 10780.16 | 1 |
9 | 3215.79 | 152.21 | 23167.60 | 1 |
10 | 3116.64 | -98.64 | 9729.41 | 1 |
11 | 2019.94 | -444.94 | 197971.12 | 1 |
12 | 3735.99 | -440.99 | 194471.88 | 0 |
13 | 3375.70 | 422.30 | 178333.59 | 1 |
14 | 3126.00 | -118.00 | 13924.08 | 1 |
15 | 3170.84 | 77.16 | 5953.67 | 1 |
16 | 3275.34 | 407.66 | 166190.02 | 1 |
17 | 2972.69 | 228.31 | 52124.34 | 1 |
18 | 2462.90 | 117.10 | 13713.40 | 1 |
19 | 3286.43 | 133.57 | 17842.27 | 1 |
20 | 3382.04 | -292.04 | 85287.61 | 1 |
21 | 2838.75 | -78.75 | 6200.94 | 1 |
22 | 2895.02 | -15.02 | 225.70 | 1 |
23 | 3227.72 | 242.28 | 58701.23 | 1 |
24 | 3201.40 | -528.40 | 279210.42 | 1 |
25 | 2655.53 | 164.47 | 27052.00 | 1 |
26 | 2919.35 | -46.35 | 2148.78 | 1 |
27 | 3004.51 | -1099.51 | 1208925.26 | 1 |
28 | 2965.37 | -39.37 | 1549.62 | 0 |
29 | 2870.25 | -267.25 | 71421.51 | 1 |
Now we regress the squared residual on male to see how the error variance relates to gender, and read the coefficient on male from the regression summary.
It is negative and large in magnitude, indicating that the estimated error variance is higher for women than for men.
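The sign of the male coefficient can be checked with a short sketch. It assumes the auxiliary regression is a simple OLS of Residual^2 on an intercept and male, with the values copied from the table above. The names `delta0` and `delta1` are labels chosen for this sketch for the intercept and the male coefficient. With a single dummy regressor, the intercept is the mean squared residual for women and the slope is the men's mean minus the women's mean.

```python
import numpy as np

# Residual^2 column copied from the table above (observations 1 to 29).
resid_sq = np.array([
    34994.21, 190087.48, 118654.14, 230740.81, 4127.38, 331210.75,
    22543.15, 10780.16, 23167.60, 9729.41, 197971.12, 194471.88,
    178333.59, 13924.08, 5953.67, 166190.02, 52124.34, 13713.40,
    17842.27, 85287.61, 6200.94, 225.70, 58701.23, 279210.42,
    27052.00, 2148.78, 1208925.26, 1549.62, 71421.51,
])

# male = 1 for every observation except 4, 12 and 28 (0-based: 3, 11, 27).
male = np.ones(len(resid_sq))
male[[3, 11, 27]] = 0

# Regress the squared residual on an intercept and male.
X = np.column_stack([np.ones_like(male), male])
(delta0, delta1), *_ = np.linalg.lstsq(X, resid_sq, rcond=None)

# Group means of the squared residual, for comparison with the coefficients.
mean_women = resid_sq[male == 0].mean()
mean_men = resid_sq[male == 1].mean()

print(f"intercept (women): {delta0:.2f}")
print(f"male coefficient:  {delta1:.2f}")
print(f"mean men - mean women: {mean_men - mean_women:.2f}")
```

Only 3 of the 29 observations are women, so the women's mean rests on very little data.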
c). To check whether male has a significant effect on the error variance, look at the p-value of the t statistic on male.
It is 0.88, for a t statistic of -0.15.
Hence the p-value is greater than 0.05,
indicating that male is NOT statistically significant at the 5% level.
In other words, we cannot reject the hypothesis that the error variance is the same for men and women.
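The significance check can be sketched the same way. This is an illustration under assumptions, not the original output: it uses the usual homoskedastic OLS standard error for the male coefficient in the regression of Residual^2 on male, and compares the t statistic with 2.052, the two-sided 5% critical value of the t distribution with 27 degrees of freedom (29 observations minus 2 estimated parameters). The constant name `T_CRIT_5PCT_27DF` is a label chosen for this sketch.

```python
import numpy as np

# Residual^2 column copied from the squared-residual table (observations 1 to 29).
resid_sq = np.array([
    34994.21, 190087.48, 118654.14, 230740.81, 4127.38, 331210.75,
    22543.15, 10780.16, 23167.60, 9729.41, 197971.12, 194471.88,
    178333.59, 13924.08, 5953.67, 166190.02, 52124.34, 13713.40,
    17842.27, 85287.61, 6200.94, 225.70, 58701.23, 279210.42,
    27052.00, 2148.78, 1208925.26, 1549.62, 71421.51,
])

# male = 1 for every observation except 4, 12 and 28 (0-based: 3, 11, 27).
male = np.ones(len(resid_sq))
male[[3, 11, 27]] = 0

n = len(resid_sq)
X = np.column_stack([np.ones(n), male])
coef, *_ = np.linalg.lstsq(X, resid_sq, rcond=None)

# Usual OLS variance estimate: sigma^2 * (X'X)^-1, with n - 2 degrees of freedom.
errors = resid_sq - X @ coef
sigma2 = errors @ errors / (n - 2)
cov = sigma2 * np.linalg.inv(X.T @ X)

se_male = np.sqrt(cov[1, 1])
t_male = coef[1] / se_male

# Two-sided 5% critical value of the t distribution with 27 degrees of freedom.
T_CRIT_5PCT_27DF = 2.052
reject = abs(t_male) > T_CRIT_5PCT_27DF

print(f"t statistic on male: {t_male:.2f}")
print(f"reject equal variances at the 5% level: {reject}")
```

The t statistic from this sketch should come out close to the -0.15 reported above, well inside the critical value, so the conclusion of no significant difference is unchanged.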