I have a question that involves the use of Stata. What regression would I have to run to answer it?
What is the average size of the gender pay gap after the implementation (2018) of the regulation? Run a regression to estimate this. Think carefully about how the variables in your dataset might need to be transformed in order to interpret the estimated coefficients in a reasonable way. Justify your choice of model.
GRSSWK | Gender | Mon | Year |
430 | male | January | 2018 |
420 | male | January | 2017 |
390 | female | April | 2017 |
390 | male | June | 2018 |
450 | female | August | 2017 |
400 | female | December | 2017 |
550 | male | March | 2017 |
500 | male | March | 2018 |
420 | male | June | 2018 |
Here we have to fit a suitable regression model to the given data to estimate the average size of the gender pay gap after the implementation (2018) of the regulation. Only the first, second and last columns are used. The first column, GRSSWK, is the "gross weekly pay in the respondent's main job"; the second column, Gender, is the "gender of respondent"; and the last column, Year, is "the year the respondent started his/her current job". GRSSWK is the response variable, and Year and Gender are the covariates.

Year takes only the values 2017 and 2018, and Gender is either male or female, so both covariates are categorical. This is therefore a two-way ANOVA (analysis of variance) problem without interaction, which is equivalent to a multiple linear regression on two indicator (dummy) variables x1 and x2, representing year and gender respectively.
For year 2018 x1 is '1' and '0' for 2017.
For male x2 is '1' and '0' for female.
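In practice the indicators do not have to be typed by hand; they can be derived from the raw columns. A minimal sketch in R, assuming the Gender and Year columns are stored as in the data table above (the data frame name `pay` is hypothetical):

```r
# Hypothetical data frame holding the Gender and Year columns from the table
pay <- data.frame(
  Gender = c("male", "male", "female", "male", "female",
             "female", "male", "male", "male"),
  Year   = c(2018, 2017, 2017, 2018, 2017, 2017, 2017, 2018, 2018)
)

# x1 = 1 if the respondent started the job in 2018, 0 if in 2017
pay$x1 <- as.integer(pay$Year == 2018)
# x2 = 1 if the respondent is male, 0 if female
pay$x2 <- as.integer(pay$Gender == "male")

print(pay)
```

The resulting x1 and x2 columns match the indicator vectors used in the regression code of this answer.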
Our model is
y=a + bx1 + cx2 + e
where y represents GRSSWK, a is the intercept, b and c are the regression coefficients on x1 and x2, and e is an error term with mean 0 and constant variance.
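Taking the expectation of the model in each year-by-gender combination, using only the assumption E[e] = 0, shows what each coefficient measures (a short derivation added for illustration):

```latex
\begin{aligned}
E[y \mid \text{female}, 2017] &= a \\
E[y \mid \text{male}, 2017]   &= a + c \\
E[y \mid \text{female}, 2018] &= a + b \\
E[y \mid \text{male}, 2018]   &= a + b + c \\[4pt]
E[y \mid \text{male}, t] - E[y \mid \text{female}, t] &= c \quad \text{for } t \in \{2017, 2018\}
\end{aligned}
```

So a is the mean weekly pay of the baseline group (women, 2017), b is the change associated with a 2018 start, and c is the male minus female difference within either year.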
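The estimate of 'a' is the fitted average weekly pay of the baseline group (women who started in 2017); it is not the pay gap. The gender pay gap is measured by 'c', the difference in average weekly pay between men and women who started in the same year. Because the model has no year-by-gender interaction, it assumes the gap is the same in 2017 and 2018, so the estimate of 'c' is also the estimated gap after the implementation (2018) of the regulation.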
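To let the gap differ after 2018, one would normally add a year-by-gender interaction. A minimal sketch, assuming the same nine observations as the R code in this answer (`fit_int` is a hypothetical name), shows why that is not possible with this excerpt: no woman in the sample started in 2018.

```r
weekpay <- c(430, 420, 390, 390, 450, 440, 550, 500, 420)
year    <- factor(c(1, 0, 0, 1, 0, 0, 0, 1, 1))   # 1 = started in 2018
gender  <- factor(c(1, 1, 0, 1, 0, 0, 1, 1, 1))   # 1 = male

# Observations per year-by-gender cell; the (year 1, gender 0) cell is empty
print(table(year, gender))

# year * gender expands to year + gender + year:gender
fit_int <- lm(weekpay ~ year * gender)

# Every 2018 starter is male, so the year1:gender1 column equals year1
# in every row; lm() treats it as aliased and reports NA for it
print(coef(fit_int))
```

With the full dataset, if it contains women who started in 2018, the gap after the regulation would be c plus the interaction coefficient.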
We can use R to fit the regression. The R code is given below:
weekpay <- c(430,420,390,390,450,440,550,500,420) ##...response y (the data table lists 400, not 440, for the 6th observation)
year <- c(1,0,0,1,0,0,0,1,1) ##...indicator vector of year (x1): 1 = 2018, 0 = 2017
gender <- c(1,1,0,1,0,0,1,1,1) ##...indicator vector of gender (x2): 1 = male, 0 = female
year <- as.factor(year) ##...factorising year
gender <- as.factor(gender) ##...factorising gender
lm(weekpay ~ year + gender) ##...fitting the regression model
Output of R:

Call:
lm(formula = weekpay ~ year + gender)

Coefficients:
(Intercept)        year1      gender1
     426.67       -50.00        58.33
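These numbers can be checked by hand. Because every 2018 starter in this sample is male, only three year-by-gender cells contain data, and a three-parameter model reproduces their means exactly. A minimal sketch, assuming the same nine observations as the code above (`fit` and `by_hand` are hypothetical names):

```r
weekpay <- c(430, 420, 390, 390, 450, 440, 550, 500, 420)
year    <- factor(c(1, 0, 0, 1, 0, 0, 0, 1, 1))   # 1 = started in 2018
gender  <- factor(c(1, 1, 0, 1, 0, 0, 1, 1, 1))   # 1 = male

fit <- lm(weekpay ~ year + gender)

# Mean weekly pay in each occupied cell
female_2017 <- mean(weekpay[year == "0" & gender == "0"])
male_2017   <- mean(weekpay[year == "0" & gender == "1"])
male_2018   <- mean(weekpay[year == "1" & gender == "1"])

# Coefficients rebuilt from the cell means
by_hand <- c(female_2017,               # intercept a
             male_2018 - male_2017,     # year effect b
             male_2017 - female_2017)   # gender gap c

print(cbind(lm = coef(fit), by_hand))
```

Both columns agree: the intercept is the mean pay of the women (426.67), and the gender1 coefficient is the male mean (485) minus that value.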
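Hence the estimated average gender pay gap after the implementation (2018) of the regulation is the gender1 coefficient, 58.33: under this model, men earn about 58 more per week than women who started their jobs in the same year. The intercept, 426.67 (approximately 427), is the fitted average weekly pay of women who started in 2017, not the pay gap.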
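The question also asks how the variables might be transformed so that the coefficients are easy to interpret. One common option, sketched here as an assumption rather than as part of the answer above, is to use the logarithm of weekly pay as the response, so that the gender coefficient describes a proportional gap. A minimal sketch with the same nine observations (`fit_log` and `gap_pct` are hypothetical names):

```r
weekpay <- c(430, 420, 390, 390, 450, 440, 550, 500, 420)
year    <- factor(c(1, 0, 0, 1, 0, 0, 0, 1, 1))   # 1 = started in 2018
gender  <- factor(c(1, 1, 0, 1, 0, 0, 1, 1, 1))   # 1 = male

# Same dummy-variable model, with log weekly pay as the response
fit_log <- lm(log(weekpay) ~ year + gender)

# exp(c) is the ratio of fitted (geometric-mean) pay, men over women,
# in the same year; 100 * (exp(c) - 1) expresses it as a percentage
gap_pct <- 100 * (exp(coef(fit_log)[["gender1"]]) - 1)

print(coef(fit_log))
print(gap_pct)
```

A percentage gap does not depend on the currency or pay scale, which is one reason log pay is often preferred when comparing pay gaps.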