Question

The data set data_ksubs.csv contains information on net financial wealth (nettfa), age of the survey respondent (age), annual family income (inc), family size (fsize), and participation in certain pension plans for people in the United States. The wealth and income variables are both recorded in thousands of dollars. In particular, the variable e401k is equal to 1 if the person is eligible for a 401k, a retirement savings plan sponsored by the employer, and 0 otherwise.

a. Create a scatter plot of nettfa against inc. Can you observe any visible correlation between nettfa and inc? Do you think that a regression of nettfa on inc may feature heteroskedasticity? Explain.

b. Suppose that the Least-Squares assumptions are satisfied and estimate the following regression model:

$$nettfa_i = \beta_0 + \beta_1 male_i + \beta_2 e401k_i + \beta_3 inc_i + \beta_4 age_i + \mu_i, \quad i = 1, \dots, n.$$

Report the estimated values of the regression coefficients and discuss their signs (whether each is or is not as you expected), as well as their standard errors and significance levels. Also, report the R² and the value of the F-statistic for the null hypothesis that all the slope coefficients are equal to 0. Do you reject the F-test's null hypothesis?

c. We now introduce some additional variables and some nonlinearities in the model. We add the square of age (agesq), the square of income (incsq), a dummy for the individual being married (marr), and the household size (fsize). We thus estimate the following model:

$$nettfa_i = \beta_0 + \beta_1 male_i + \beta_2 e401k_i + \beta_3 inc_i + \beta_4 age_i + \beta_5 incsq_i + \beta_6 agesq_i + \beta_7 marr_i + \beta_8 fsize_i + \mu_i, \quad i = 1, \dots, n.$$

Obtain the OLS estimators of the regression coefficients and their standard errors. Compare the new estimators with those obtained in (b). How have they changed? Compare the R² in this model and in the previous model. How informative is this comparison regarding the added value of the four new regressors?

Obtain the F-statistic to test the null hypothesis that β5, β6, β7 and β8 are jointly equal to 0, and the null that β5 and β6 are jointly equal to 0. Would you reject these null hypotheses? Based on these F-tests, would you prefer the first or the second model?

d. Consider your answer to part (a). How can you modify the estimation to address any heteroskedasticity present in this model? Modify the model you estimate and report the new (heteroskedasticity-robust) standard errors.

e. Derive the marginal effect of age on nettfa, and explain its meaning. What is the effect of increasing age by one year on net financial assets for a person who is 30 years old? How about the same effect for a person who is 65 years old? Comment on the difference.

Expert Solution

Question 1 a)

[Figure: scatter plot of nettfa against inc]

The scatter plot above shows a positive correlation between net financial wealth (nettfa) and family income (inc). A regression of nettfa on inc is likely to be heteroskedastic: the points are not spread evenly around a line, but fan out as income rises, so the variance of the error term appears to depend on inc.
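Part (d) asks how to deal with this. A common remedy, sketched here as an assumption rather than as the answerer's method, is to keep the OLS coefficients and report heteroskedasticity-robust (White) standard errors. The NumPy function `ols_with_robust_se` is a hypothetical helper that returns both the classical and the HC1 robust standard errors; loading data_ksubs.csv and building the regressor matrix are left out, because the file's column layout is assumed rather than documented here.

```python
import numpy as np


def ols_with_robust_se(X, y):
    """OLS coefficients with classical and HC1 (White) standard errors.

    X must already contain a column of ones for the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Classical (homoskedastic) covariance: s^2 (X'X)^{-1}
    s2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(s2 * xtx_inv))

    # White sandwich (X'X)^{-1} [sum e_i^2 x_i x_i'] (X'X)^{-1},
    # scaled by n / (n - k) to give the HC1 variant
    meat = X.T @ (X * resid[:, None] ** 2)
    cov_hc1 = xtx_inv @ meat @ xtx_inv * n / (n - k)
    se_hc1 = np.sqrt(np.diag(cov_hc1))
    return beta, se_classical, se_hc1
```

Passing the part (c) regressors (with a leading column of ones) and nettfa would give the robust standard errors part (d) asks for; the coefficient estimates are the same as plain OLS, only the standard errors change. With statsmodels, `sm.OLS(y, X).fit(cov_type="HC1")` reports the same HC1 standard errors.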

b)


Coefficients (model 1, dependent variable: nettfa; B = unstandardized coefficient, Beta = standardized coefficient)
Variable	B	Std. Error	Beta	t	Sig.
(Constant)	-63.616	2.681		-23.728	.000
inc	.925	.026	.348	35.271	.000
e401k	6.023	1.285	.046	4.685	.000
male	4.411	1.513	.028	2.916	.004
age	1.050	.059	.169	17.664	.000

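The estimated equation is:

$$\widehat{nettfa}_i = -63.616 + 4.411\,male_i + 6.023\,e401k_i + 0.925\,inc_i + 1.050\,age_i$$

All four slope coefficients are positive and significant at the 1% level (the largest p-value, .004, is for male), and their standard errors are small relative to the estimates. The signs for inc, age and e401k are as expected: holding the other regressors fixed, an extra $1,000 of family income is associated with about $925 more net financial wealth, each additional year of age with about $1,050 more, and eligibility for a 401k plan with about $6,023 more. Male respondents have about $4,411 more net financial wealth than female respondents with the same income, age and eligibility.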
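To make the fitted equation concrete, the sketch below evaluates it for one respondent. The inputs (male = 1, e401k = 1, inc = 50, age = 40) are hypothetical values chosen for illustration, not observations from the data. It also shows that switching e401k from 0 to 1, with everything else held fixed, moves the prediction by exactly the e401k coefficient.

```python
# Coefficients from the part (b) output (nettfa and inc in thousands of dollars)
COEF_B = {"const": -63.616, "male": 4.411, "e401k": 6.023, "inc": 0.925, "age": 1.050}


def predict_b(male, e401k, inc, age):
    """Predicted net financial wealth (thousands of dollars) from model (b)."""
    return (COEF_B["const"] + COEF_B["male"] * male + COEF_B["e401k"] * e401k
            + COEF_B["inc"] * inc + COEF_B["age"] * age)


# Hypothetical respondent: male, $50k family income, age 40
eligible = predict_b(male=1, e401k=1, inc=50, age=40)
not_eligible = predict_b(male=1, e401k=0, inc=50, age=40)
print(round(eligible, 3))                 # 35.068
print(round(eligible - not_eligible, 3))  # 6.023
```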

Model summary (model 1)
R	R Square	Adjusted R Square	Std. Error of the Estimate
.414	.172	.171	58.2246839

The F-statistic for the null hypothesis that all four slope coefficients are zero is 480.591, as shown in the ANOVA table below:

ANOVA (model 1)
Source	Sum of Squares	df	Mean Square	F	Sig.
Regression	6517034.466	4	1629258.616	480.591	.000
Residual	31426355.028	9270	3390.114
Total	37943389.494	9274

We reject the null hypothesis that all four slope coefficients are zero, since the p-value (reported as .000) is far below 0.05.
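The R² and the F-statistic follow directly from the sums of squares in the ANOVA table: R² = SS_regression / SS_total and F = (SS_regression / df_regression) / (SS_residual / df_residual). The short check below recomputes both from the reported values; it uses only numbers from the table.

```python
# Sums of squares and degrees of freedom from the model 1 ANOVA table
ss_reg, df_reg = 6517034.466, 4
ss_res, df_res = 31426355.028, 9270
ss_tot = ss_reg + ss_res  # equals the reported total, 37943389.494

r_squared = ss_reg / ss_tot
f_stat = (ss_reg / df_reg) / (ss_res / df_res)
n = df_res + df_reg + 1  # observations: residual df + slopes + intercept

print(round(r_squared, 3))  # 0.172
print(round(f_stat, 2))     # 480.59
print(n)                    # 9275
```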

c)

Coefficients (model 2, dependent variable: nettfa)
Variable	B	Std. Error	Beta	t	Sig.
(Constant)	20.825	10.087		2.065	.039
inc	-.203	.079	-.076	-2.578	.010
e401k	9.409	1.278	.072	7.364	.000
male	.639	1.612	.004	.396	.692
age	-1.692	.499	-.272	-3.391	.001
incsq	.010	.001	.463	16.480	.000
agesq	.031	.006	.441	5.485	.000
marr	-2.909	1.699	-.022	-1.713	.087
fsize	-1.290	.498	-.031	-2.589	.010

The estimated equation is:

$$\widehat{nettfa}_i = 20.825 + 0.639\,male_i + 9.409\,e401k_i - 0.203\,inc_i - 1.692\,age_i + 0.010\,incsq_i + 0.031\,agesq_i - 2.909\,marr_i - 1.290\,fsize_i$$

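Adding the four regressors changes several estimates substantially. The coefficient on e401k rises from 6.023 to 9.409, with an almost unchanged standard error (1.285 vs 1.278). The coefficient on male falls from 4.411 to 0.639 and is no longer significant (p = .692). The coefficients on inc and age turn negative (-0.203 and -1.692), but once incsq and agesq are in the model they no longer measure the full effect of income or age on their own: the positive, highly significant squared terms make both profiles convex, so the effects of income and age grow at higher levels. The standard errors of inc (.026 to .079) and age (.059 to .499) increase considerably because each now enters together with its own square, which is highly correlated with it. This reflects multicollinearity; larger standard errors are not evidence that model 2 is more heteroskedastic than model 1.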
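Why the standard errors of inc and age inflate can be illustrated with a variable and its square. The sketch below uses a hypothetical income grid from 10 to 200 (thousand dollars), not the survey data, and computes the correlation between inc and incsq and the implied variance inflation factor 1 / (1 - r²) for a regression containing just these two regressors. In the real model the other regressors also matter, so this is only indicative.

```python
import math


def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Hypothetical income grid (thousands of dollars), not the survey data
inc = list(range(10, 201))
incsq = [x ** 2 for x in inc]

r = pearson_r(inc, incsq)
vif = 1 / (1 - r ** 2)  # variance inflation factor with two regressors
print(f"corr(inc, incsq) = {r:.3f}, VIF = {vif:.1f}")
```

A correlation this close to 1 means the data carry little separate information about inc and incsq, which is exactly what widens their standard errors.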

Comparing the R² values:

Model summary (model 2)
R	R Square	Adjusted R Square	Std. Error of the Estimate
.452	.204	.203	57.0941024

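R² rises from 0.172 in model 1 to 0.204 in model 2, although both models explain only about a fifth of the variation in nettfa or less. On its own this comparison says little about the value of the four new regressors, because R² can never fall when regressors are added. The adjusted R² (0.171 versus 0.203), which penalizes extra coefficients, and a joint F-test of the new coefficients are more informative.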
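Adjusted R² can be recomputed from the residual and total sums of squares in the two ANOVA tables using 1 - [SSR/(n - k - 1)] / [SST/(n - 1)], where k is the number of slope coefficients. The helper name `adjusted_r2` below is illustrative; the inputs are the reported sums of squares, and the results agree with the SPSS model summaries (0.171 and 0.203).

```python
def adjusted_r2(ss_res, ss_tot, n, k):
    """Adjusted R^2 for a model with k slope coefficients and an intercept."""
    return 1 - (ss_res / (n - k - 1)) / (ss_tot / (n - 1))


SS_TOT = 37943389.494  # total sum of squares (same in both ANOVA tables)
N = 9275               # total df 9274 + 1

models = {
    "model 1 (b)": (31426355.028, 4),
    "model 2 (c)": (30204718.630, 8),
}
for name, (ss_res, k) in models.items():
    r2 = 1 - ss_res / SS_TOT
    print(f"{name}: R2 = {r2:.3f}, adjusted R2 = {adjusted_r2(ss_res, SS_TOT, N, k):.3f}")
```

Adjusted R² rises by nearly as much as R², so the penalty for the four extra coefficients is small compared with the gain in fit.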

The F-statistic for the null hypothesis that all eight slope coefficients in model 2 are zero is 296.752, as shown in the table:


Source	Sum of Squares	df	Mean Square	F	Sig.
Regression	7738670.864	8	967333.858	296.752	.000
Residual	30204718.630	9266	3259.737
Total	37943389.494	9274

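We reject this null hypothesis, since the p-value (.000) is below 0.05. Note that this overall F-test is not the test the question asks for: the nulls β5 = β6 = β7 = β8 = 0 and β5 = β6 = 0 each involve only a subset of the coefficients.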
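The joint test of β5 = β6 = β7 = β8 = 0 can be computed from the two ANOVA tables already reported, because model 1 is exactly model 2 with those four coefficients set to zero. The standard restricted/unrestricted formula is F = [(SSR_r - SSR_ur)/q] / [SSR_ur/df_ur]; the function name `subset_f` is a hypothetical helper.

```python
def subset_f(ssr_restricted, ssr_unrestricted, q, df_unrestricted):
    """F-statistic for q linear restrictions, from restricted/unrestricted SSRs."""
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_unrestricted)


# Model 1 is model 2 with beta5 = beta6 = beta7 = beta8 = 0
F = subset_f(
    ssr_restricted=31426355.028,    # residual SS, model 1
    ssr_unrestricted=30204718.630,  # residual SS, model 2
    q=4,
    df_unrestricted=9266,
)
print(round(F, 2))  # 93.69
```

An F of about 93.7 is far above the 5% critical value of F(4, ∞), roughly 2.37, so that null would be rejected. The test of β5 = β6 = 0 needs the residual sum of squares from a restricted regression of nettfa on male, e401k, inc, age, marr and fsize, which is not shown above; the same function applies once that regression is run.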

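I would prefer model 2: its adjusted R² is higher (0.203 versus 0.171), and incsq and agesq are individually highly significant (p < .001), so the added terms capture variation in nettfa that model 1 leaves out.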

d)

Coefficients (simple regression of nettfa on age)
Variable	B	Std. Error	Beta	t	Sig.
(Constant)	-32.950	2.674		-12.322	.000
age	1.266	.063	.204	20.057	.000

In this simple regression of nettfa on age, the coefficient on age is 1.266: each additional year of age is associated with an increase of about 1.266 thousand dollars (about $1,266) in net financial wealth, on average. Because the model is linear in age, this marginal effect is the same at every age, so it is 1.266 for a 30-year-old and for a 65-year-old alike.
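In the part (c) model, which includes agesq, the marginal effect of age is not constant. Differentiating the model with respect to age gives ∂nettfa/∂age = β4 + 2β6·age. The sketch below evaluates this with the reported estimates β4 = -1.692 and β6 = 0.031; since β6 is rounded to three decimals in the output, the results are approximate.

```python
B_AGE = -1.692   # coefficient on age, model 2
B_AGESQ = 0.031  # coefficient on agesq, model 2 (rounded to 3 decimals)


def marginal_effect_age(age):
    """d(nettfa)/d(age) in thousands of dollars per year, from model 2."""
    return B_AGE + 2 * B_AGESQ * age


turning_point = -B_AGE / (2 * B_AGESQ)  # age at which the effect changes sign
for a in (30, 65):
    print(a, round(marginal_effect_age(a), 3))
print(round(turning_point, 1))
```

This gives roughly 0.168 thousand dollars (about $168) per extra year at age 30 and 2.338 thousand (about $2,338) at age 65. Because β6 > 0, the age profile of wealth is convex: the effect is negative below about age 27 and grows steadily afterwards, so one more year adds far more net financial wealth for a 65-year-old than for a 30-year-old.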

orchestra answered 2 years ago
