Question

This question requires you to interpret and communicate the findings of two linear regression models. The data are from an article that studies the relationship between legislators' salaries and the representation of the working class in US state legislatures.

Background

If politicians in the United States were paid better, would more working-class people become politicians? It is often argued that if politicians are paid too little, then it is economically too difficult for lower-income citizens to hold positions of office. This could mean that low-paying political jobs lead to the under-representation of working-class people in politics. On the other hand, if politicians are paid more, then holding political office might become more attractive to wealthy people, and this might also lead to the under-representation of working-class people. To investigate these two contrasting hypotheses, we will examine data on the salaries paid in different state legislatures in the US and the percentage of legislators who come from working-class backgrounds.

Dataset

The dataset includes salaries of state legislators from all 50 states in the US. It also includes variables measuring information unique to each state, such as the length of the legislative session and the number of staffers in each legislature. The occupational backgrounds of legislators are also included, as well as demographic data on the makeup of the population in each state. A detailed description of the dataset is provided in the table below.

Variable              Description

pct_worker            Percentage of legislators from working-class backgrounds
salary                Average salary of legislators (in $100,000s)
session_len           Length of legislative session (in days)
staff_size            Average number of full-time permanent staffers in the legislature
term_limits           Binary indicator (0 or 1) of term limits for state legislators
income                Average per-capita income (in $1000s)
income_inequality     Percentage of income going to the top 1% of earners
pct_union             Percentage of workers belonging to a labour union
pct_black             Percentage of state residents who are Black
pct_urban             Percentage of state residents living in urban areas
poverty_rate          Percentage of state residents living below the poverty line

3a. Multiple Linear Regression

This question requires you to interpret and communicate the findings of two linear regression models from Table 1.

Model 1 presents results from a simple linear regression, where the independent variable is salary. Model 2 presents results from a multiple linear regression which includes a number of explanatory variables. The dependent variable for both models is the percentage of legislators from working-class backgrounds.

Your task is to interpret the models and write up the results as if you were writing the discussion for publication in a major journal/book. Interpret the two models statistically and substantively, and in comparison to one another. You should focus on determining which variables have coefficients that are significantly different from zero, and what the effect sizes mean in substantive terms. Simply listing the significant effects will be insufficient to receive full marks. You should also comment on how the estimates differ between the two models, and on the fit statistics of the two models.

Table 1: Legislative salaries and working-class representation

                      Model 1         Model 2

(Intercept)           155.49          −199.48
                      (41.00)         (103.85)
Salary                −0.56           −0.61
                      (0.10)          (0.13)
term_limits                           0.26
                                      (0.84)
income                                −0.03
                                      (0.05)
income_inequality                     −0.26
                                      (0.11)
poverty_rate                          −0.05
                                      (0.07)
pct_union                             0.12
                                      (0.04)
pct_black                             −0.06
                                      (0.02)
pct_urban                             −0.03
                                      (0.02)

R-squared             0.19            0.35
Adj. R-squared        0.18            0.31
Num. obs.             200             200

Note: Figures in parentheses are the standard errors of the regression coefficients.

Solutions

Expert Solution

Model 1

y = 155.49 -0.56 Salary

Model 2

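y = -199.48 - 0.61 Salary + (0.26 ± 0.84) * term_limits + (-0.03 ± 0.05) * income + (-0.26 ± 0.11) * income_inequality ...

Here each coefficient after Salary is written as estimate ± standard error, taken from Table 1.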
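To read the salary coefficient in substantive terms, it helps to convert it to a realistic pay change. The sketch below assumes the units given in the dataset description (salary in $100,000s, pct_worker in percentage points); the function name and the $10,000 raise are illustrative choices, not part of the original solution.

```python
# Salary coefficients from Table 1: change in pct_worker (percentage points)
# per one-unit change in salary, where one unit is $100,000.
SALARY_COEF = {"Model 1": -0.56, "Model 2": -0.61}


def predicted_change(coef, raise_dollars):
    """Predicted change in pct_worker for a salary raise given in dollars."""
    return coef * (raise_dollars / 100_000)


for name, coef in SALARY_COEF.items():
    change = predicted_change(coef, 10_000)  # hypothetical $10,000 raise
    print(f"{name}: a $10,000 raise -> {change:+.3f} percentage points")
```

Under these assumptions, a $10,000 raise is associated with a drop of roughly 0.06 percentage points in working-class representation in either model, holding the other Model 2 variables fixed in the second case.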

Both models are estimated on 200 observations.

Now we focus on the adjusted R-squared value.

The adjusted R-squared for Model 1 is 0.18.

The adjusted R-squared for Model 2 is 0.31, nearly double that of Model 1.

Because adjusted R-squared penalizes additional predictors, this means Model 2 fits the data better and we should rely on it for our conclusions, although it still leaves about two thirds of the variation in pct_worker unexplained.

The salary coefficient barely changes between the models (-0.56 in Model 1, -0.61 in Model 2), while its standard error rises slightly from 0.10 to 0.13.

Given that Model 2 includes considerably more variables than Model 1, this small increase in the standard error is acceptable.
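One way to back up the comparison of fit is an F-test for nested models computed from the two R-squared values. The sketch below is an illustration rather than part of the original solution: it assumes both models were fit to the same 200 observations, that the usual OLS assumptions hold, and it uses the R-squared values as rounded in Table 1, so the result is approximate.

```python
def nested_f_stat(r2_small, r2_big, n_obs, k_big, q_added):
    """F statistic for H0: the q_added extra coefficients are all zero.

    r2_small, r2_big: R-squared of the restricted and the full model
    n_obs: number of observations
    k_big: number of predictors in the full model (excluding the intercept)
    q_added: number of predictors added to the restricted model
    """
    numerator = (r2_big - r2_small) / q_added
    denominator = (1.0 - r2_big) / (n_obs - k_big - 1)
    return numerator / denominator


# Model 1 has 1 predictor, Model 2 has 8, so 7 predictors were added.
f_stat = nested_f_stat(r2_small=0.19, r2_big=0.35, n_obs=200, k_big=8, q_added=7)
print(f"F(7, 191) = {f_stat:.2f}")
```

The statistic comes out at roughly 6.7, well above the 5% critical value for an F distribution with 7 and 191 degrees of freedom (about 2.1), which suggests the added variables jointly improve the fit.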

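Now we focus on the variables that are significant.

We can find which variables are significant by looking at the 95% confidence interval for each coefficient, which is approximately the estimate plus or minus 1.96 standard errors.

If the confidence interval contains zero, we do not consider the coefficient significantly different from zero.

For example, poverty_rate has the confidence interval -0.05 ± 1.96 × 0.07, or roughly (-0.19, 0.09); this contains zero, so poverty_rate is not significant.

By the same rule, term_limits, income and pct_urban are not significant; the interval for pct_urban, -0.03 ± 1.96 × 0.02 or roughly (-0.07, 0.01), narrowly includes zero.

Hence salary, income_inequality, pct_union and pct_black are the significant variables in Model 2, and salary is also significant in Model 1.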
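As a cross-check, a 95% confidence interval is conventionally the estimate plus or minus about 1.96 standard errors, not one standard error. The sketch below applies that rule to the Model 2 estimates from Table 1. The 1.96 multiplier is a normal approximation (an assumption that is reasonable with about 191 residual degrees of freedom), and the helper name is illustrative.

```python
# Model 2 estimates and standard errors copied from Table 1.
MODEL2 = {
    "salary": (-0.61, 0.13),
    "term_limits": (0.26, 0.84),
    "income": (-0.03, 0.05),
    "income_inequality": (-0.26, 0.11),
    "poverty_rate": (-0.05, 0.07),
    "pct_union": (0.12, 0.04),
    "pct_black": (-0.06, 0.02),
    "pct_urban": (-0.03, 0.02),
}
Z_95 = 1.96  # assumed normal critical value for a two-sided 5% test


def summarize(coef, se, z=Z_95):
    """Return the t statistic, CI bounds, and whether the CI excludes zero."""
    t_stat = coef / se
    lower, upper = coef - z * se, coef + z * se
    significant = lower > 0 or upper < 0
    return t_stat, lower, upper, significant


significant_vars = []
for name, (coef, se) in MODEL2.items():
    t_stat, lower, upper, significant = summarize(coef, se)
    if significant:
        significant_vars.append(name)
    print(f"{name:18s} t = {t_stat:6.2f}  95% CI = ({lower:6.2f}, {upper:6.2f})")

print("Significant at the 5% level:", significant_vars)
```

Under this rule the intervals for salary, income_inequality, pct_union and pct_black exclude zero, while the interval for pct_urban (t of about -1.5) contains it.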

