The data set data_ksubs.csv contains information on net financial wealth (nettfa), age of the survey respondent (age), annual family income (inc), family size (fsize), and participation in certain pension plans for people in the United States. The wealth and income variables are both recorded in thousands of dollars. In particular, the variable e401k is equal to 1 if the person is eligible for a 401k, a retirement savings plan sponsored by the employer, and 0 otherwise.
a. Create a scatter plot of nettfa against inc. Can you observe any visible correlation between nettfa and inc? Do you think that a regression of nettfa on inc may feature heteroskedasticity? Explain.
b. Suppose that the Least-Squares assumptions are satisfied and estimate the following regression model:
nettfa_i = β0 + β1 male_i + β2 e401k_i + β3 inc_i + β4 age_i + u_i, i = 1, ..., n.
Report the estimated values of the regression coefficients and discuss their signs (whether each is or is not as you expected), as well as their standard errors and significance levels. Also, report the R² and the value of the F-statistic for the null hypothesis that all the slope coefficients are equal to 0. Do you reject the F-test's null hypothesis?
c. We now introduce some additional variables and some nonlinearities in the model. We add the square of age (agesq), the square of income (incsq), a dummy for the individual being married (marr), and the household size (fsize). We thus estimate the following model:
nettfa_i = β0 + β1 male_i + β2 e401k_i + β3 inc_i + β4 age_i + β5 incsq_i + β6 agesq_i + β7 marr_i + β8 fsize_i + u_i, i = 1, ..., n.
Obtain the OLS estimators of the regression coefficients and their standard errors. Compare the new estimators with those obtained in (b). How have they changed? Compare the R² in this model and in the previous model. How informative is this comparison regarding the added value of the four new regressors? Obtain the F-statistic to test the null hypothesis that β5, β6, β7 and β8 are jointly equal to 0, and the null that β5 and β6 are jointly equal to 0. Would you reject these null hypotheses? Based on these F-tests, would you prefer the first or the second model?
d. Consider your answer to part (a). How can you modify the estimation to address any heteroskedasticity present in this model? Modify the model you estimate and report the new (heteroskedasticity-robust) standard errors.
e. Derive the marginal effect of age on nettfa, and explain its meaning. What is the effect of increasing age by one year on net financial assets for a person who is 30 years old? How about the same effect for a person who is 65 years old? Comment on the difference.
Question 1 a)
Scatter plot
The scatter plot above shows a positive correlation between net financial wealth and family income. The data points also vary a lot around this relationship; if that spread changes with the level of income, the error variance is not constant, so a regression of nettfa on inc is likely to feature heteroskedasticity.
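Part (d) asks how to address this heteroskedasticity. One common remedy is to keep the OLS coefficients and replace the usual standard errors with heteroskedasticity-robust ones. The sketch below illustrates the idea for a regression on a single regressor, using the HC1 variant of the robust estimator. The function name and the small data set are made up for illustration and are not taken from data_ksubs.csv.

```python
import math

def ols_simple_robust(x, y):
    """OLS of y on a constant and x.

    Returns the slope, its conventional standard error, and its
    heteroskedasticity-robust (HC1) standard error.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Conventional SE: assumes the same error variance for every observation.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_conventional = math.sqrt(s2 / sxx)

    # HC1 SE: each squared residual is weighted by its own (x - x_bar)^2,
    # with the n / (n - 2) degrees-of-freedom correction.
    meat = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid))
    se_robust = math.sqrt(n / (n - 2) * meat / sxx ** 2)
    return slope, se_conventional, se_robust

# Illustrative data (hypothetical): the errors grow with x.
x_demo = list(range(1, 11))
y_demo = [2 * xi + (-1) ** xi * 0.5 * xi for xi in x_demo]
slope_demo, se_conv_demo, se_rob_demo = ols_simple_robust(x_demo, y_demo)
print(slope_demo, se_conv_demo, se_rob_demo)
```

The coefficient estimate is the same under both approaches; only the standard error, and therefore the t-statistic and significance level, changes.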
b)
Coefficients (model 1). B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.

| Variable | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | -63.616 | 2.681 | | -23.728 | .000 |
| inc | .925 | .026 | .348 | 35.271 | .000 |
| e401k | 6.023 | 1.285 | .046 | 4.685 | .000 |
| male | 4.411 | 1.513 | .028 | 2.916 | .004 |
| age | 1.050 | .059 | .169 | 17.664 | .000 |
The estimated model is shown below:
nettfa_i = -63.616 + 4.411 male_i + 6.023 e401k_i + 0.925 inc_i + 1.050 age_i + u_i
Summary of the model:

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .414 | .172 | .171 | 58.2246839 |
The F-statistic is 480.591, as shown in the ANOVA table below:
ANOVA (model 1)

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 6517034.466 | 4 | 1629258.616 | 480.591 | .000 |
| Residual | 31426355.028 | 9270 | 3390.114 | | |
| Total | 37943389.494 | 9274 | | | |
We reject the null hypothesis that all the slope coefficients are equal to 0, since the p-value (0.000) is less than the 0.05 significance level.
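As a check on the reported figures, the F-statistic and the R² can be recomputed from the sums of squares and degrees of freedom in the ANOVA table for model 1. A minimal sketch, with variable names chosen here for illustration:

```python
# Sums of squares and degrees of freedom from the ANOVA table for model 1.
ss_regression = 6517034.466
ss_residual = 31426355.028
df_regression = 4      # number of slope coefficients
df_residual = 9270     # n - k - 1, with n = 9275 observations and k = 4 slopes

# F = (explained sum of squares per slope) / (residual mean square)
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

# R^2 = explained share of the total sum of squares
r_squared = ss_regression / (ss_regression + ss_residual)

print(round(f_stat, 3), round(r_squared, 3))
```

This reproduces the F-statistic of 480.591 and the R² of .172 reported in the tables.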
c)
Coefficients (model 2). B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.

| Variable | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 20.825 | 10.087 | | 2.065 | .039 |
| inc | -.203 | .079 | -.076 | -2.578 | .010 |
| e401k | 9.409 | 1.278 | .072 | 7.364 | .000 |
| male | .639 | 1.612 | .004 | .396 | .692 |
| age | -1.692 | .499 | -.272 | -3.391 | .001 |
| incsq | .010 | .001 | .463 | 16.480 | .000 |
| agesq | .031 | .006 | .441 | 5.485 | .000 |
| marr | -2.909 | 1.699 | -.022 | -1.713 | .087 |
| fsize | -1.290 | .498 | -.031 | -2.589 | .010 |
nettfa_i = 20.825 + 0.639 male_i + 9.409 e401k_i - 0.203 inc_i - 1.692 age_i + 0.010 incsq_i + 0.031 agesq_i - 2.909 marr_i - 1.290 fsize_i + u_i
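Compared with model 1, the coefficient on inc changes from 0.925 to -0.203 and the coefficient on age from 1.050 to -1.692, while the coefficients on the squared terms incsq (0.010) and agesq (0.031) are positive and significant, so the effects of income and age are now nonlinear. The coefficient on e401k rises from 6.023 to 9.409, and the coefficient on male falls from 4.411 to 0.639 and is no longer significant. The standard errors on inc and age are much larger in model 2 (0.079 against 0.026, and 0.499 against 0.059), most likely because inc and age are highly correlated with their squares. Larger standard errors are not in themselves evidence of heteroskedasticity.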
Comparing R squared, the summary of model 2 is:

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .452 | .204 | .203 | 57.0941024 |
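The R squared of model 1 (0.172) is smaller than that of model 2 (0.204), although both indicate that the model explains only a small share of the variation in nettfa. Because R squared never decreases when regressors are added, this comparison alone says little about the added value of the four new regressors; the adjusted R squared (0.171 against 0.203) and an F-test on the new coefficients are more informative.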
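The F-statistic in model 2 is 296.752, as shown in the table below. This is the F-statistic for the null hypothesis that all eight slope coefficients are equal to 0.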
ANOVA (model 2)

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 7738670.864 | 8 | 967333.858 | 296.752 | .000 |
| Residual | 30204718.630 | 9266 | 3259.737 | | |
| Total | 37943389.494 | 9274 | | | |
We reject the null hypothesis, since the p-value (0.000) is less than 0.05.
I would prefer model 2, since the added regressors incsq, agesq and fsize are each significant at the 5% level (marr is significant only at the 10% level).
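The test of β5 = β6 = β7 = β8 = 0 asked for in part (c) compares the two models directly. Since both ANOVA tables are based on the same 9,275 observations, the statistic can be sketched from their residual sums of squares. The function name below is illustrative.

```python
def restricted_f(ssr_restricted, ssr_unrestricted, q, df_unrestricted):
    """F-statistic for q linear restrictions, from residual sums of squares.

    df_unrestricted is the residual degrees of freedom of the larger model.
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator

# Residual sums of squares from the ANOVA tables of model 1 (restricted)
# and model 2 (unrestricted); q = 4 added regressors.
f_joint = restricted_f(31426355.028, 30204718.630, q=4, df_unrestricted=9266)
print(round(f_joint, 2))
```

The result is about 93.69, far above the 5% critical value of roughly 2.37 for an F distribution with 4 and 9,266 degrees of freedom, so the null that the four new coefficients are jointly 0 is rejected, which favours the second model. The test of β5 = β6 = 0 needs the residual sum of squares from a regression that drops only incsq and agesq, which is not reported here.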
e)
Coefficients (regression of nettfa on age only). B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.

| Variable | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | -32.950 | 2.674 | | -12.322 | .000 |
| age | 1.266 | .063 | .204 | 20.057 | .000 |
In this simple regression of nettfa on age, age has a marginal effect of 1.266: each additional year of age is associated with an increase of 1.266 units of net financial wealth (about $1,266, since wealth is recorded in thousands of dollars). Because this specification is linear in age, the effect is the same at any given age, so it is 1.266 for a 30-year-old and for a 65-year-old alike.
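In the quadratic model of part (c), age enters through both age and agesq, so the marginal effect of age is ∂nettfa/∂age = β4 + 2 β6 age, which varies with age. A minimal sketch using the estimates reported in the coefficient table for model 2 (the constant and function names are illustrative):

```python
# Estimates from the coefficient table of model 2.
B_AGE = -1.692    # coefficient on age
B_AGESQ = 0.031   # coefficient on agesq

def marginal_effect_of_age(age):
    """Derivative of predicted nettfa with respect to age: b_age + 2 * b_agesq * age."""
    return B_AGE + 2 * B_AGESQ * age

for years in (30, 65):
    print(years, round(marginal_effect_of_age(years), 3))
```

This gives about 0.168 at age 30 and 2.338 at age 65. One more year of age is associated with roughly $168 more net financial wealth for a 30-year-old and roughly $2,338 more for a 65-year-old, holding the other regressors fixed. The effect grows with age because the coefficient on agesq is positive.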