Question

In: Statistics and Probability

Why is the use of OLS regression inappropriate when the dependent variable is dichotomous? explain one...

Why is the use of OLS regression inappropriate when the dependent variable is dichotomous? explain one technique you can use instead, clearly stating how to interpret the estimated regression coefficients.   

Solutions

Expert Solution

  • The linear regression model is based on an assumption that the outcome Y is continuous, with errors which are normally distributed. If the outcome variable is binary this assumption is clearly violated.
  • Here we can use Logistic Regression. A logistic regression predicts the probability that an observation falls into one of two categories of a dichotomous dependent variable based on one or more independent variables that can be either continuous or categorical.
    The "logit" model is:
    where:
    ln is the natural logarithm, logexp, where exp=2.71828…
  • p is the probability that the event Y occurs, p(Y=1)
  • p/(1-p) is the "odds ratio"
  • ln[p/(1-p)] is the log odds ratio, or "logit"
  • all other components of the model are the same.
    The logistic regression model is simply a non-linear transformation of the linear regression. The "logistic" distribution is an S-shaped distribution function which is similar to the standard-normal distribution (which results in a probit regression model) but easier to work with in most applications (the probabilities are easier to calculate). The logit distribution constrains the estimated probabilities to lie between 0 and 1.

The estimated probability is:

p = 1/[1 + exp(-a - BX)]

  • Interpreting logit coefficients:
    Instead of the slope coefficients (B) being the rate of change in Y (the dependent variables) as X changes (as in the LP model or OLS regression), now the slope coefficient is interpreted as the rate of change in the "log odds" as X changes.
    An interpretation of the logit coefficient which is usually more intuitive (especially for dummy independent variables) is the "odds ratio"-- expB is the effect of the independent variable on the "odds ratio" [the odds ratio is the probability of the event divided by the probability of the nonevent]. For example, if expB =5, then a one unit change in X would make the event 5 times as likely to occur. Odds ratios equal to 1 mean that there is a 50/50 chance that the event will occur with a small change in the independent variable. Negative coefficients lead to odds ratios less than one: if expB2 =.67, then a one unit change in X2 leads to the event being less likely to occur.

Related Solutions

a) What is the primary difference between the dependent variable of an OLS regression model and...
a) What is the primary difference between the dependent variable of an OLS regression model and a logit model? b) What is the primary difference between the model results of an OLS regression model and a logit model?
Regression and Correlation Analysis Use the dependent variable (labeled Y) and one of the independent variables...
Regression and Correlation Analysis Use the dependent variable (labeled Y) and one of the independent variables (labeled X1, X2, and X3) in the data file. Select and use one independent variable throughout this analysis. Use Excel to perform the regression and correlation analysis to answer the following. Generate a scatterplot for the specified dependent variable (Y) and the selected independent variable (X), including the graph of the "best fit" line. Interpret. Determine the equation of the "best fit" line, which...
When the response variable of interest is dichotomous rather than continuous, why is it not advisable...
When the response variable of interest is dichotomous rather than continuous, why is it not advisable to fit a standard linear regression model using the probability of success as the outcome?
When is it inappropriate to use linear regression for measuring the association between two variables?
When is it inappropriate to use linear regression for measuring the association between two variables?
Is CART the right technique to use when the dependent variable is skewed towards one of...
Is CART the right technique to use when the dependent variable is skewed towards one of the class? If yes, can we use classification accuracy as the right performance measure? If not why? In that case, which model performance measures should be used?
Why is it considered risky and unreliable to use the regression formula to predict the dependent...
Why is it considered risky and unreliable to use the regression formula to predict the dependent variable outside the range of the actual values of the independent variable is...
Why and where do we use dependent and independent variable?
Why and where do we use dependent and independent variable?
For the following, identify the independent variable and the dependent variable and explain why we should...
For the following, identify the independent variable and the dependent variable and explain why we should use a parametric or nonparametric procedure. (a) When ranking the intelligence of a group of people given a smart pill. (b) When compar- ing the median income for a group of college professors to that of the national population of all incomes. (c) When comparing the mean reading speed for a sam- ple of hearing-impaired children to the average reading speed in the population...
1.) Use Excel to plot the dependent vs the independent variable. Show the regression equation from...
1.) Use Excel to plot the dependent vs the independent variable. Show the regression equation from the computer output. Lannie Karner- GPA 3.6 Income 75k Courtney Sheperd Gpa 3.3 Income 74K Zenobia Roussel- GPA 2.9 Income 66K Elaine Doody- GPA 3.8 Income 80k Maudie Hocker-GPA 3.1 Income 65k Rick Hoover-GPA 3.2 Income 53k Franinca Ortez-GPA 2.7 Income 65k Li Kinder-GPA 3.3 Income 71k Brad Clem-GPA 3.8 Income 80k Soon Nettleton-GPA 4.0 Income 95k Vertie Yousesef-GPA 3.9 Income 110k Love Au-GPA...
a. If they are going to run a linear regression, identify which variable should be the independent variable and which should be the dependent variable in a regression equation.
In seeking to determine how influential advertising is, the management of a recently established retail chain collected data on sales revenue and advertising expenditure from its' stores over the last ten (10) weeks. The table below shows the data collected: Advertising Expenditure ($ 000) Sales ($ 000) 3 5 76 50 250 700 450 3.5 75 4 150 4.5 7 200 750 7.5 800 8.5 1,100 a. If they are going to run a linear regression, identify which variable should...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT