Question

Suppose that you have data on many (say 1,000) randomly selected employed residents of a country. Further details are given at the end of the questions.

a) Explain how you would test whether, holding everything else constant, females earn less than males.

b) Explain how you would measure the payoff to someone becoming bilingual if her mother tongue is i) French, ii) English.

c) Does including both X3 and X4 in this regression model have the potential to cause any "problems" when estimating the model? Explain. Would eliminating one of them potentially cause other problems? Explain.

d) Can you use this model to test whether the influence of on-the-job experience is greater for males than for females? Why or why not? If not, how would you need to change the model to test this?

FURTHER DETAILS:

Consider the following linear regression model "explaining" salaries in the Country:

Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5D1 + β6D2 + β7D3 + µ

where: Y = salary,

X1 = years of education,

X2 = innate ability (proxied by IQ test results)

X3 = years of on the job experience

X4 = age

D1 = a dummy variable for gender (= 1 for males, 0 for females)

D2 = 1 for uni-lingual French speakers

D3 = 1 for uni-lingual English speakers

Solutions

Expert Solution

a)

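To test whether, holding everything else constant, females earn less than males, test the null hypothesis H0: β5 = 0 (β5 is the coefficient on the gender dummy D1) against the one-sided alternative H1: β5 > 0.

Because D1 = 1 for males and 0 for females, β5 measures how much more a male earns than a female with the same education, ability, experience, age and language status. The alternative hypothesis that males earn more than females therefore corresponds to β5 > 0.

Compute the t-statistic t = b5 / se(b5), where b5 is the OLS estimate of β5, and compare it with the one-sided critical value (about 1.645 at the 5% level with roughly 1,000 observations). If the null hypothesis is rejected, the data support the claim that females earn less than males, other things being equal.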
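The one-sided test in part a) can be sketched in Python as follows. This is only an illustration under stated assumptions: the data are simulated (the question supplies no data set), all coefficient values and variable names are invented, and the p-value uses the normal approximation to the t distribution, which is reasonable with about 1,000 observations.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Simulated data: every number below is made up for illustration.
educ = rng.integers(8, 21, size=n).astype(float)            # X1
iq = rng.normal(100.0, 15.0, size=n)                        # X2
age = rng.integers(25, 65, size=n).astype(float)            # X4
gap = rng.uniform(0.0, 10.0, size=n)                        # years not working
exper = np.clip(age - educ - 6.0 - gap, 0.0, None)          # X3
male = rng.integers(0, 2, size=n).astype(float)             # D1
lang = rng.integers(0, 3, size=n)   # 0 = bilingual, 1 = French only, 2 = English only
fr_only = (lang == 1).astype(float)                         # D2
en_only = (lang == 2).astype(float)                         # D3

salary = (10000 + 2000 * educ + 100 * iq + 500 * exper + 100 * age
          + 5000 * male - 3000 * fr_only - 2000 * en_only
          + rng.normal(0.0, 5000.0, size=n))

# OLS: columns follow the order beta0 ... beta7 of the model.
X = np.column_stack([np.ones(n), educ, iq, exper, age, male, fr_only, en_only])
beta, *_ = np.linalg.lstsq(X, salary, rcond=None)

# Conventional (homoskedastic) standard errors.
resid = salary - X @ beta
dof = n - X.shape[1]
sigma2 = (resid @ resid) / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# One-sided test of H0: beta5 = 0 against H1: beta5 > 0.
t_stat = beta[5] / se[5]
p_one_sided = 0.5 * math.erfc(t_stat / math.sqrt(2.0))  # P(Z > t), normal approx.

print(f"b5 = {beta[5]:.1f}, t = {t_stat:.2f}, one-sided p = {p_one_sided:.4g}")
```

Reject H0 at the 5% level when the one-sided p-value is below 0.05, which is equivalent to t exceeding roughly 1.645.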

b)

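The omitted (base) language category is bilingual, for which D2 = D3 = 0 (assuming every worker is unilingual French, unilingual English or bilingual). β6 and β7 therefore measure the salaries of unilingual French and unilingual English speakers relative to otherwise identical bilingual workers.

i) If someone's mother tongue is French, becoming bilingual means learning English, which changes D2 from 1 to 0. Her expected salary changes by -β6, so the payoff is measured by the negative of the estimated coefficient on D2. It is positive if unilingual French speakers earn less than bilinguals, that is, if β6 < 0.

ii) If someone's mother tongue is English, becoming bilingual means learning French, which changes D3 from 1 to 0. Her expected salary changes by -β7, the negative of the estimated coefficient on D3.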
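A small numerical sketch may make the dummy-variable arithmetic in part b) concrete. It assumes, as the definitions of D2 and D3 imply, that bilingual workers are the omitted category (D2 = D3 = 0). The coefficient values, the `predicted_salary` helper and the example person are all hypothetical.

```python
# Hypothetical coefficient estimates, for illustration only.
coef = {"b0": 20000.0, "b1": 2000.0, "b2": 100.0, "b3": 500.0,
        "b4": 100.0, "b5": 5000.0, "b6": -3000.0, "b7": -2000.0}


def predicted_salary(x1, x2, x3, x4, d1, d2, d3, c=coef):
    """Fitted salary from the model Y = b0 + b1*X1 + ... + b7*D3."""
    return (c["b0"] + c["b1"] * x1 + c["b2"] * x2 + c["b3"] * x3
            + c["b4"] * x4 + c["b5"] * d1 + c["b6"] * d2 + c["b7"] * d3)


# A female worker with fixed education, IQ, experience and age.
person = dict(x1=16, x2=110, x3=10, x4=35, d1=0)

french_only = predicted_salary(**person, d2=1, d3=0)
english_only = predicted_salary(**person, d2=0, d3=1)
bilingual = predicted_salary(**person, d2=0, d3=0)

# Payoff to becoming bilingual: everything else cancels out.
payoff_french = bilingual - french_only    # equals -b6
payoff_english = bilingual - english_only  # equals -b7

print(payoff_french, payoff_english)
```

Whatever values are chosen for the other regressors, they cancel in the difference, so the payoff depends only on the coefficient of the dummy that switches from 1 to 0.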

c)

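Including age (X4) and years of on-the-job experience (X3) together might lead to a multicollinearity problem, because the two are strongly related: older workers have usually accumulated more experience.

Multicollinearity does not bias the OLS estimates, but it inflates the variances of the estimates of β3 and β4. The separate effects of experience and age are then estimated imprecisely and may appear individually insignificant even when the two variables matter jointly.

Removing one of the variables would reduce the collinearity, but if the dropped variable belongs in the model, the coefficient on the remaining one (and possibly others) would suffer from omitted-variable bias, because it would pick up part of the effect of the excluded variable. The fit of the model may also fall, and the significance of other coefficients may change.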
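Both sides of this trade-off can be illustrated with simulated data. The sketch below is not part of the original solution: the data-generating values (500 per year of experience, 300 per year of age), the helper names `ols` and `vif`, and the reduced set of regressors are assumptions made for illustration. It computes variance inflation factors for experience and age, then shows how the experience coefficient shifts once age is dropped.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and residuals for y = X b + e."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b


def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R^2) from
    regressing column j on the remaining columns (constant included)."""
    others = np.delete(X, j, axis=1)
    _, resid = ols(others, X[:, j])
    centered = X[:, j] - X[:, j].mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return 1.0 / (1.0 - r2)


rng = np.random.default_rng(1)
n = 1000

# Simulated data in which experience tracks age very closely.
educ = rng.integers(8, 21, size=n).astype(float)
age = rng.integers(22, 65, size=n).astype(float)
exper = age - 17.0 - rng.uniform(0.0, 5.0, size=n)
salary = (10000 + 2000 * educ + 500 * exper + 300 * age
          + rng.normal(0.0, 5000.0, size=n))

X_full = np.column_stack([np.ones(n), educ, exper, age])
vif_exper = vif(X_full, 2)
vif_age = vif(X_full, 3)

# Full model versus a model that drops age.
b_full, _ = ols(X_full, salary)
X_short = np.column_stack([np.ones(n), educ, exper])
b_short, _ = ols(X_short, salary)

print(f"VIF experience = {vif_exper:.1f}, VIF age = {vif_age:.1f}")
print(f"experience coefficient, full model:  {b_full[2]:.1f}")
print(f"experience coefficient, age dropped: {b_short[2]:.1f}")
```

A common rule of thumb treats a VIF above 10 as a sign of serious collinearity. In the short regression the experience coefficient moves toward the sum of the two true effects, which is the omitted-variable bias described above.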

d)

No. The model as written cannot be used to test whether the influence of on-the-job experience is greater for males than for females.

Gender enters only through the intercept shift β5D1, so the slope on experience, β3, is restricted to be the same for both genders. There is no interaction term between on-the-job experience and gender.

To test the hypothesis, add the interaction term X3·D1 with a new coefficient, say β8. The effect of an extra year of experience is then β3 for females and β3 + β8 for males. Test H0: β8 = 0 against H1: β8 > 0 with a one-sided t-test; rejecting H0 indicates that experience has a greater influence on salary for males than for females.
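The interaction approach can be sketched as follows, again on simulated data. The label β8 for the coefficient on X3·D1, the true slopes (500 for females, 800 for males) and the reduced set of regressors are assumptions introduced for this example; the p-value uses the normal approximation to the t distribution.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Simulated data; only a subset of the model's regressors is used for brevity.
educ = rng.integers(8, 21, size=n).astype(float)   # X1
exper = rng.uniform(0.0, 40.0, size=n)             # X3
male = rng.integers(0, 2, size=n).astype(float)    # D1

# True experience slope: 500 for females, 500 + 300 for males.
salary = (10000 + 2000 * educ + 500 * exper + 4000 * male
          + 300 * exper * male + rng.normal(0.0, 5000.0, size=n))

# Augmented model: the last column is the interaction X3 * D1.
X = np.column_stack([np.ones(n), educ, exper, male, exper * male])
beta, *_ = np.linalg.lstsq(X, salary, rcond=None)

# Conventional (homoskedastic) standard errors.
resid = salary - X @ beta
dof = n - X.shape[1]
sigma2 = (resid @ resid) / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# One-sided test of H0: beta8 = 0 against H1: beta8 > 0.
t_stat = beta[4] / se[4]
p_one_sided = 0.5 * math.erfc(t_stat / math.sqrt(2.0))

slope_female = beta[2]            # estimate of beta3
slope_male = beta[2] + beta[4]    # estimate of beta3 + beta8

print(f"slope (female) = {slope_female:.1f}, slope (male) = {slope_male:.1f}")
print(f"t = {t_stat:.2f}, one-sided p = {p_one_sided:.4g}")
```

A significantly positive interaction coefficient means the salary-experience profile is steeper for males, which is exactly the hypothesis in part d).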

