This question requires you to interpret and communicate the findings of two linear regression models. The data is from an article that studies the relationship between salaries of legislators and representation of the working-classes in state legislatures in the US.
Background
If politicians in the United States were paid better, would more working-class people become politicians? It is often argued that if politicians are paid too little, then it is economically too difficult for lower-income citizens to hold positions of office. This could mean that low-paying political jobs lead to the under-representation of working-class people in politics. On the other hand, if politicians are paid more, then holding political office might become more attractive to wealthy people, and this might also lead to the under-representation of working-class people. To investigate these two contrasting hypotheses, we will examine data on the salaries paid in different state legislatures in the US and the percentage of legislators who come from working-class backgrounds.
Dataset
The dataset includes salaries of state legislators from all 50 states in the US. It also includes variables measuring information unique to each state, such as the length of the legislative session and the number of staffers in each legislature. The occupational backgrounds of legislators are also included, as well as demographic data on the makeup of the population in each state. A detailed description of the dataset is provided in the table below.
Variable            Description
pct_worker          Percentage of legislators from working-class backgrounds
salary              Average salary of legislators in $100,000s
session_len         Length of legislative session (in days)
staff_size          Average number of full-time permanent staffers in the legislature
term_limits         Binary indicator (0 or 1) of term limits for state legislators
income              Average per-capita income (in $1000s)
income_inequality   Percentage of income to top 1% of earners
pct_union           Percentage of workers belonging to a labour union
pct_black           Percentage of state residents who are Black
pct_urban           Percentage of state residents living in urban areas
poverty_rate        Percent of state residents living below the poverty line
3a. Multiple Linear Regression
This question requires you to interpret and communicate the findings of two linear regression models from Table 1.
Model 1 presents results from a simple linear regression, where the independent variable is salary. Model 2 presents results from a multiple linear regression which includes a number of explanatory variables. The dependent variable for both models is the percentage of legislators from working-class backgrounds.
Your task is to interpret the models and write up the results as if you were writing the discussion for publication in a major journal/book. Interpret the two models statistically and substantively, and in comparison to one another. You should focus on determining which variables have coefficients that are significantly different from zero, and what the effect sizes mean in substantive terms. Simply listing the significant effects will be insufficient to receive full marks. You should also comment on how the estimates differ between the two models, and on the fit statistics of the two models.
Table 1: Legislative salaries and working-class representation
                    Model 1           Model 2
(Intercept)         155.49 (41.00)    −199.48 (103.85)
Salary              −0.56 (0.10)      −0.61 (0.13)
term_limits                           0.26 (0.84)
income                                −0.03 (0.05)
income_inequality                     −0.26 (0.11)
poverty_rate                          −0.05 (0.07)
pct_union                             0.12 (0.04)
pct_black                             −0.06 (0.02)
pct_urban                             −0.03 (0.02)
R-squared           0.19              0.35
Adj. R-squared      0.18              0.31
Num. obs.           200               200
Note: Figures in parentheses are the standard errors of the regression coefficients.
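Model 1
Predicted pct_worker = 155.49 - 0.56 * salary
Model 2
Predicted pct_worker = -199.48 - 0.61 * salary + 0.26 * term_limits - 0.03 * income - 0.26 * income_inequality ...
The standard errors of the term_limits, income and income_inequality coefficients are 0.84, 0.05 and 0.11 respectively. They describe the uncertainty in each estimate and are not part of the fitted equation itself.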
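To read the salary coefficients in substantive terms, the following minimal sketch computes the change in predicted pct_worker implied by a change in salary. The coefficients are copied from Table 1; the function name `implied_change` and the one-unit scenario are illustrative assumptions, and salary is taken to be in $100,000s as listed in the variable table.

```python
# Salary coefficients copied from Table 1 (salary is listed in $100,000s).
SALARY_COEF = {"Model 1": -0.56, "Model 2": -0.61}


def implied_change(coef, salary_change):
    """Change in predicted pct_worker (percentage points) for a given
    change in salary, holding any other predictors in the model fixed."""
    return coef * salary_change


# Hypothetical scenario: salary rises by one unit of the salary variable.
for name, coef in SALARY_COEF.items():
    print(name, round(implied_change(coef, 1.0), 2))
```

Under these assumptions, both models associate a one-unit rise in salary with a drop of a little over half a percentage point in working-class representation. This is an association in the fitted model, not necessarily a causal effect.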
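Both models are estimated on 200 observations.
We first compare the fit statistics, focusing on the adjusted R-squared values.
The adjusted R-squared for Model 1 is 0.18.
The adjusted R-squared for Model 2 is 0.31, nearly double that of Model 1.
Because adjusted R-squared penalises additional predictors, this indicates that Model 2 fits the data better. Its R-squared of 0.35 means it explains about 35% of the variation in working-class representation, against 19% for Model 1.
The salary coefficient changes little between the models (-0.56 in Model 1, -0.61 in Model 2), while its standard error rises slightly from 0.10 to 0.13.
A somewhat larger standard error is not unusual once several explanatory variables are added, as can happen when they are correlated with salary, and the estimate remains precise relative to its size.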
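The adjusted R-squared figures can be roughly checked with the standard formula 1 - (1 - R2)(n - 1)/(n - k - 1), where k is the number of predictors excluding the intercept. The sketch below is a minimal check, assuming k = 1 for Model 1 and k = 8 for Model 2 (counted from the rows of Table 1). The helper name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors (excluding the
    intercept) fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


N_OBS = 200  # number of observations reported for both models

# R-squared values as printed (rounded to two decimals) in Table 1.
print(round(adjusted_r2(0.19, N_OBS, 1), 3))  # Model 1
print(round(adjusted_r2(0.35, N_OBS, 8), 3))  # Model 2
```

The results are close to, but not exactly, the reported 0.18 and 0.31, because the R-squared values in the table are themselves rounded. The larger penalty for Model 2 reflects its eight predictors.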
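Now we turn to the variables that are significant.
We can judge significance from the approximate 95% confidence interval for each coefficient, which is the estimate plus or minus about 1.96 standard errors.
If the confidence interval contains zero, the coefficient is not significantly different from zero at the 5% level.
For example, poverty_rate has the interval -0.05 +- 1.96 * 0.07, roughly -0.19 to 0.09. This contains zero, so poverty_rate is not significant.
On this basis salary (in both models), income_inequality, pct_union and pct_black are significant. term_limits, income and poverty_rate are not. pct_urban (-0.03 with standard error 0.02, a ratio of about -1.5) also falls short of the 5% threshold on the rounded values in Table 1.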
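As a rough check on the significance claims, the sketch below computes the t-ratio (estimate divided by standard error) and an approximate 95% interval (estimate plus or minus 1.96 standard errors) for every Model 2 coefficient in Table 1. The 1.96 multiplier is a normal approximation and the inputs are the rounded table values, so borderline cases should be read with caution. The names `MODEL2` and `summarize` are illustrative. An interval of one standard error on either side is narrower than a 95% interval, so a coefficient can appear significant under the former and not under the latter.

```python
# (estimate, standard error) pairs copied from the Model 2 column of Table 1.
MODEL2 = {
    "salary": (-0.61, 0.13),
    "term_limits": (0.26, 0.84),
    "income": (-0.03, 0.05),
    "income_inequality": (-0.26, 0.11),
    "poverty_rate": (-0.05, 0.07),
    "pct_union": (0.12, 0.04),
    "pct_black": (-0.06, 0.02),
    "pct_urban": (-0.03, 0.02),
}

Z_95 = 1.96  # normal approximation to the two-sided 5% critical value


def summarize(estimate, std_err, z=Z_95):
    """Return the t-ratio, the approximate 95% interval, and whether
    that interval excludes zero."""
    t_ratio = estimate / std_err
    lower, upper = estimate - z * std_err, estimate + z * std_err
    significant = lower > 0 or upper < 0
    return t_ratio, (lower, upper), significant


for name, (est, se) in MODEL2.items():
    t, (lo, hi), sig = summarize(est, se)
    print(f"{name:18s} t={t:6.2f}  CI=[{lo:6.2f}, {hi:6.2f}]  significant={sig}")
```

With these inputs, salary, income_inequality, pct_union and pct_black have intervals that exclude zero, while term_limits, income, poverty_rate and pct_urban do not.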