Question


The data set data_ksubs.csv contains information on net financial wealth (nettfa), age of the survey respondent (age), annual family income (inc), family size (fsize), and participation in certain pension plans for people in the United States. The wealth and income variables are both recorded in thousands of dollars. In particular, the variable e401k is equal to 1 if the person is eligible for a 401k, a retirement savings plan sponsored by the employer, and 0 otherwise.

a. Create a scatter plot of nettfa against inc. Can you observe any visible correlation between nettfa and inc? Do you think that a regression of nettfa on inc may feature heteroskedasticity? Explain.

b. Suppose that the Least-Squares assumptions are satisfied and estimate the following regression model:

nettfa_i = β0 + β1 male_i + β2 e401k_i + β3 inc_i + β4 age_i + µ_i, i = 1, ..., n.

Report the estimated values of the regression coefficients and discuss their signs (whether each is or is not as you expected), as well as their standard errors and significance levels. Also, report the R² and the value of the F-statistic for the null hypothesis that all the slope coefficients are equal to 0. Do you reject the F-test's null hypothesis?

c. We now introduce some additional variables and some nonlinearities in the model. We add the square of age (agesq), the square of income (incsq), a dummy for the individual being married (marr), and the household size (fsize). We thus estimate the following model:

nettfa_i = β0 + β1 male_i + β2 e401k_i + β3 inc_i + β4 age_i + β5 incsq_i + β6 agesq_i + β7 marr_i + β8 fsize_i + µ_i, i = 1, ..., n.

Obtain the OLS estimators of the regression coefficients and their standard errors. Compare the new estimators with those obtained in (b). How have they changed? Compare the R² in this model and in the previous model. How informative is this comparison regarding the added value of the four new regressors?

Obtain the F-statistic to test the null hypothesis that β5, β6, β7 and β8 are jointly equal to 0, and the null that β5 and β6 are jointly equal to zero. Would you reject these null hypotheses? Based on these F-tests, would you prefer the first or the second model?

d. Consider your answer to part (a). How can you modify the estimation to address any heteroskedasticity present in this model? Modify the model you estimate and report the new (heteroskedasticity-robust) standard errors.

e. Derive the marginal effect of age on nettfa, and explain its meaning. What is the effect of increasing age by one year on net financial assets for a person who is 30 years old? What is the same effect for a person who is 65 years old? Comment on the difference.

Solutions

Expert Solution

Question 1 a)

Scatter plot

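The scatter plot above shows a positive correlation between net financial wealth and family income. A regression of nettfa on inc is likely to feature heteroskedasticity. Heteroskedasticity means that the variance of the error term depends on the regressor, not merely that the data are highly variable: the points are widely dispersed, and if that dispersion grows with income, as is typical for wealth data, the error variance is not constant.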

b)

Coefficients

Variable      Unstandardized B   Std. Error   Standardized Beta   t         Sig.
(Constant)    -63.616            2.681                            -23.728   .000
inc           .925               .026         .348                35.271    .000
e401k         6.023              1.285        .046                4.685     .000
male          4.411              1.513        .028                2.916     .004
age           1.050              .059         .169                17.664    .000

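The estimated model is:

nettfa_i = -63.616 + 4.411 male_i + 6.023 e401k_i + 0.925 inc_i + 1.050 age_i + u_i

All four slope coefficients are positive and statistically significant at the 1% level (p ≤ .004): estimated net financial wealth is higher for men and for people eligible for a 401k, and it increases with income and with age.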

Summary of the model

R      R Square   Adjusted R Square   Std. Error of the Estimate
.414   .172       .171                58.2246839

The F-statistic is 480.591, as shown in the ANOVA table below:

ANOVA

Source       Sum of Squares   df     Mean Square   F         Sig.
Regression   6517034.466      4      1629258.616   480.591   .000
Residual     31426355.028     9270   3390.114
Total        37943389.494     9274

We reject the null hypothesis that all the slope coefficients are equal to 0, since the p-value (reported as .000) is less than 0.05.
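The F-statistic and R² reported above can be cross-checked directly from the sums of squares in the ANOVA table. The short sketch below hard-codes those reported values; the variable names are chosen here for illustration only.

```python
# Values copied from the ANOVA table for the part (b) model
ss_regression = 6517034.466
ss_residual = 31426355.028
df_regression = 4      # number of slope coefficients
df_residual = 9270     # n - k - 1

# Overall F-statistic: mean square regression / mean square residual
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

# R-squared: explained share of the total sum of squares
r_squared = ss_regression / (ss_regression + ss_residual)

print(f"F = {f_stat:.3f}, R^2 = {r_squared:.3f}")
```

Both values agree with the reported F of 480.591 and R² of .172 up to rounding.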

c)

Coefficients

Variable      Unstandardized B   Std. Error   Standardized Beta   t         Sig.
(Constant)    20.825             10.087                           2.065     .039
inc           -.203              .079         -.076               -2.578    .010
e401k         9.409              1.278        .072                7.364     .000
male          .639               1.612        .004                .396      .692
age           -1.692             .499         -.272               -3.391    .001
incsq         .010               .001         .463                16.480    .000
agesq         .031               .006         .441                5.485     .000
marr          -2.909             1.699        -.022               -1.713    .087
fsize         -1.290             .498         -.031               -2.589    .010

nettfa_i = 20.825 + 0.639 male_i + 9.409 e401k_i - 0.203 inc_i - 1.692 age_i + 0.010 incsq_i + 0.031 agesq_i - 2.909 marr_i - 1.290 fsize_i + u_i

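Compared with part (b), the coefficient on inc changes from 0.925 to -0.203 and the coefficient on age from 1.050 to -1.692. With the squared terms included, these coefficients no longer measure the full effect of income or age, which now also depends on the positive coefficients on incsq (0.010) and agesq (0.031). The coefficient on e401k rises from 6.023 to 9.409, while the coefficient on male falls from 4.411 to 0.639 and is no longer significant (p = .692). The standard errors on inc and age are much larger in model 2 (0.079 versus 0.026, and 0.499 versus 0.059), while those on male and e401k barely change. This reflects the strong correlation between each variable and its own square, not greater heteroskedasticity: comparing standard errors across models says nothing about heteroskedasticity.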

Comparing R squared, we have:

Summary of the model

R      R Square   Adjusted R Square   Std. Error of the Estimate
.452   .204       .203                57.0941024

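R squared rises from .172 in model 1 to .204 in model 2, although both values indicate a modest fit. On its own this comparison says little about the added value of the four new regressors, because R squared never decreases when regressors are added. The adjusted R squared, which penalizes additional regressors, also rises (from .171 to .203), but a formal assessment requires a joint F-test.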
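To make the R squared comparison concrete, the sketch below recomputes R squared and adjusted R squared for both models from the sums of squares reported in the two ANOVA tables of this solution. The function names are illustrative, and n = 9275 is inferred from the total degrees of freedom (9274).

```python
def r_squared(ss_regression, ss_total):
    """Share of the total sum of squares explained by the regression."""
    return ss_regression / ss_total

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k slope coefficients."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 9275                      # total df (9274) + 1
ss_total = 37943389.494       # same in both ANOVA tables

r2_model1 = r_squared(6517034.466, ss_total)   # 4 regressors
r2_model2 = r_squared(7738670.864, ss_total)   # 8 regressors

adj_model1 = adjusted_r_squared(r2_model1, n, 4)
adj_model2 = adjusted_r_squared(r2_model2, n, 8)

print(f"Model 1: R^2 = {r2_model1:.3f}, adjusted = {adj_model1:.3f}")
print(f"Model 2: R^2 = {r2_model2:.3f}, adjusted = {adj_model2:.3f}")
```

With more than 9,000 observations the adjustment for four extra regressors is tiny, which is why R squared and adjusted R squared differ only in the third decimal place.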

The overall F-statistic for model 2 (for the null hypothesis that all eight slope coefficients are equal to 0) is 296.752, as shown in the ANOVA table:

ANOVA

Source       Sum of Squares   df     Mean Square   F         Sig.
Regression   7738670.864      8      967333.858    296.752   .000
Residual     30204718.630     9266   3259.737
Total        37943389.494     9274

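We reject the null hypothesis that all the slope coefficients are equal to 0, since the p-value is less than 0.05. Note that this is the overall F-test, not the test of the four added regressors.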
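The test of β5 = β6 = β7 = β8 = 0 compares the restricted model from part (b) with the unrestricted model from part (c). Under homoskedasticity, its F-statistic can be computed from the two residual sums of squares reported in the ANOVA tables. The sketch below does this with the reported values hard-coded; the variable names are illustrative.

```python
# Residual sums of squares copied from the two ANOVA tables
ssr_restricted = 31426355.028    # part (b) model, 4 regressors
ssr_unrestricted = 30204718.630  # part (c) model, 8 regressors
q = 4                            # number of restrictions tested
df_unrestricted = 9266           # n - k - 1 in the part (c) model

# F = [(SSR_r - SSR_u) / q] / [SSR_u / (n - k - 1)]
f_partial = ((ssr_restricted - ssr_unrestricted) / q) / (
    ssr_unrestricted / df_unrestricted
)

print(f"Partial F-statistic = {f_partial:.2f}")
```

The result, about 93.7, is far above the 5% critical value of roughly 2.37 for an F distribution with 4 numerator degrees of freedom and a very large denominator, so the null is rejected. The same calculation for β5 = β6 = 0 would need the residual sum of squares from a model that omits only incsq and agesq, which is not reported in this solution.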

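I would prefer model 2, because the four added regressors are jointly significant. Using the residual sums of squares of the two models, F = [(31426355.028 - 30204718.630) / 4] / (30204718.630 / 9266) ≈ 93.7, which is far above the 5% critical value of about 2.37, so the null that β5, β6, β7 and β8 are jointly zero is rejected. The null that β5 and β6 are jointly zero requires estimating a model without incsq and agesq, which is not reported here, but their large individual t-statistics (16.480 and 5.485) indicate that it would be rejected as well.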
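Part (d) of the question asks how to address heteroskedasticity. The usual approach is to keep the OLS coefficient estimates and replace the conventional standard errors with heteroskedasticity-robust (White) standard errors. The robust standard errors for data_ksubs.csv are not reported in this solution, so the following is only a minimal sketch on synthetic data, assuming NumPy is available. The function name `ols_with_robust_se` and the simulated variables are hypothetical and are not taken from the data set.

```python
import numpy as np

def ols_with_robust_se(X, y):
    """OLS coefficients with conventional and HC1 robust standard errors.

    X must already include a column of ones for the intercept.
    """
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Conventional variance: s^2 (X'X)^-1, valid under homoskedasticity
    sigma2 = resid @ resid / (n - k)
    se_conventional = np.sqrt(np.diag(sigma2 * xtx_inv))

    # HC1 sandwich: (X'X)^-1 [X' diag(u_i^2) X] (X'X)^-1 * n / (n - k)
    meat = (X * resid[:, None] ** 2).T @ X
    cov_hc1 = xtx_inv @ meat @ xtx_inv * n / (n - k)
    se_robust = np.sqrt(np.diag(cov_hc1))
    return beta, se_conventional, se_robust

# Synthetic illustration only (NOT the data_ksubs.csv data):
# the error standard deviation grows with income.
rng = np.random.default_rng(0)
inc = rng.uniform(10, 100, size=2000)
u = rng.normal(0, 1, size=2000) * inc
y = -20 + 0.8 * inc + u
X = np.column_stack([np.ones_like(inc), inc])

beta, se_ols, se_hc1 = ols_with_robust_se(X, y)
print("coefficients:    ", beta)
print("conventional SEs:", se_ols)
print("robust (HC1) SEs:", se_hc1)
```

The coefficient estimates are identical under both approaches; only the standard errors, and therefore the t-statistics and p-values, change. Applied to the actual data, X would hold the intercept and the eight regressors of the part (c) model.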

e)

Coefficients

Variable      Unstandardized B   Std. Error   Standardized Beta   t         Sig.
(Constant)    -32.950            2.674                            -12.322   .000
age           1.266              .063         .204                20.057    .000

In this simple regression of nettfa on age, the estimated marginal effect of age is 1.266: each additional year of age is associated with about 1.266 thousand dollars (roughly $1,266) more net financial wealth. Because this specification is linear in age, the marginal effect is the same at every age, so it is 1.266 both for a person who is 30 years old and for a person who is 65 years old.
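In the model of part (c), which includes agesq, the marginal effect of age is not constant. Differentiating with respect to age gives ∂nettfa/∂age = β4 + 2 β6 age. The sketch below evaluates this expression at ages 30 and 65, assuming the rounded estimates from the part (c) coefficient table (β4 = -1.692, β6 = 0.031), so the results are approximate.

```python
# Rounded OLS estimates from the part (c) coefficient table
b_age = -1.692
b_agesq = 0.031

def marginal_effect_of_age(age):
    """d(nettfa)/d(age) = b_age + 2 * b_agesq * age, in thousands of dollars."""
    return b_age + 2 * b_agesq * age

me_30 = marginal_effect_of_age(30)
me_65 = marginal_effect_of_age(65)

print(f"Marginal effect at age 30: {me_30:.3f}")
print(f"Marginal effect at age 65: {me_65:.3f}")
```

With these rounded coefficients, one more year of age is associated with about 0.168 thousand dollars (roughly $170) more net financial wealth at age 30, and about 2.338 thousand dollars (roughly $2,340) more at age 65. The effect grows with age because the coefficient on agesq is positive.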
