When the response variable of interest is dichotomous rather than continuous, why is it not advisable to fit a standard linear regression model using the probability of success as the outcome?
Answer:
Command: Statistics > Regression > Logistic regression
Description
Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. The outcome is measured with a dichotomous variable (in which there are only two possible outcomes).
In logistic regression, the dependent variable is binary or dichotomous, i.e. it contains only data coded as 1 (TRUE, success, pregnant, etc.) or 0 (FALSE, failure, non-pregnant, etc.).
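As a rough illustration of why a standard linear regression is a poor fit for a 0/1 outcome, the sketch below fits a least-squares line to a small made-up dataset. The values of `x` and `y` are invented for this example and `ols_fit` is a hypothetical helper, not part of any package. The fitted "probabilities" at the ends of the range fall below 0 and above 1, which a real probability cannot do.

```python
def ols_fit(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented example data: one predictor and a 0/1 outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

intercept, slope = ols_fit(x, y)
fitted = [intercept + slope * xi for xi in x]

# Fitted values interpreted as probabilities leave the [0, 1] range.
out_of_range = [round(f, 3) for f in fitted if f < 0 or f > 1]
print(out_of_range)
```

A straight line is unbounded, so this happens whenever the predictor ranges widely enough. The errors of a 0/1 outcome are also neither normally distributed nor of constant variance, which the usual linear-regression inference assumes.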
The goal of logistic regression is to find the best fitting (yet biologically reasonable) model to describe the relationship between the dichotomous characteristic of interest (dependent variable = response or outcome variable) and a set of independent (predictor or explanatory) variables. Logistic regression generates the coefficients (and their standard errors and significance levels) of a formula to predict a logit transformation of the probability of presence of the characteristic of interest:
$$\operatorname{logit}(p) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$$
where p is the probability of presence of the characteristic of interest. The logit transformation is defined as the logged odds:
$$\text{odds} = \frac{p}{1-p} = \frac{\text{probability of presence of the characteristic}}{\text{probability of absence of the characteristic}}$$
and
$$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right)$$
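The odds and logit transformations described above can be sketched in a few lines of Python. The function names `odds`, `logit` and `inv_logit` are chosen for this illustration only. The inverse function shows how any value of the linear predictor maps back to a probability strictly between 0 and 1.

```python
import math


def odds(p):
    """Odds of presence: p / (1 - p), for 0 < p < 1."""
    return p / (1 - p)


def logit(p):
    """Logged odds of a probability p."""
    return math.log(odds(p))


def inv_logit(z):
    """Map a logit value z back to a probability."""
    return 1 / (1 + math.exp(-z))


for p in (0.1, 0.5, 0.8):
    z = logit(p)
    print(p, round(odds(p), 3), round(z, 3), round(inv_logit(z), 3))
```

Because `inv_logit` never returns a value outside the interval (0, 1), a model that is linear on the logit scale cannot predict an impossible probability.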
Rather than choosing parameters that minimize the sum of squared errors (as in ordinary regression), estimation in logistic regression chooses parameters that maximize the likelihood of observing the sample values.
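To make the idea of maximum likelihood estimation concrete, here is a minimal sketch for a model with one predictor, using Newton-Raphson steps on the log-likelihood. This is an assumption about how such a fit can be carried out, not a description of MedCalc's internal algorithm. The data and the names `fit_logistic` and `log_likelihood` are invented for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, x, y):
    """Log-likelihood of 0/1 outcomes y under logit(p) = b0 + b1 * x."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Maximize the log-likelihood with Newton-Raphson updates."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # gradient (score) components
        h00 = h01 = h11 = 0.0  # information matrix entries
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system: step = information^-1 * gradient.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Invented example data (not perfectly separable, so the maximum exists).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(x, y)
print(round(b0, 3), round(b1, 3))
print(log_likelihood(0.0, 0.0, x, y), log_likelihood(b0, b1, x, y))
```

The fitted coefficients give a higher log-likelihood than the starting values of zero, which is the sense in which they are the "best" parameters for the observed sample.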
Required input
The MedCalc dialog box for logistic regression is similar to the one for multiple regression. In the dialog box you first identify the dependent variable. Remember that the dependent variable must be binary or dichotomous, and it should contain only data coded as 0 or 1. Cases with values other than 0 or 1 for the dependent variable will be excluded from the analysis!
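If the data are prepared outside MedCalc, the exclusion rule for the dependent variable can be mimicked with a small filter. This is only a sketch of the rule as stated above. The helper `keep_binary_cases`, the column name `outcome` and the sample rows are all hypothetical.

```python
def keep_binary_cases(rows, outcome_key):
    """Keep only the cases whose outcome is coded exactly 0 or 1."""
    return [row for row in rows if row.get(outcome_key) in (0, 1)]


# Hypothetical cases: two valid, three that would be excluded.
cases = [
    {"id": 1, "outcome": 0},
    {"id": 2, "outcome": 1},
    {"id": 3, "outcome": 2},
    {"id": 4, "outcome": None},
    {"id": 5, "outcome": "yes"},
]

valid = keep_binary_cases(cases, "outcome")
print([row["id"] for row in valid])
```

Checking how many cases are dropped before fitting helps to catch coding mistakes, such as an outcome stored as 1/2 instead of 0/1.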
For the independent variables you enter the names of variables that you expect to influence the dependent variable.
You can click the button to obtain a list of variables. In this list you can select a variable by clicking the variable's name.
Options
Method: select the way independent variables are entered into the model.
Enter: enter all variables in the model in one single step, without checking.
Forward: enter significant variables sequentially.
Backward: first enter all variables into the model, then remove the non-significant variables sequentially.
Stepwise: enter significant variables sequentially; after entering a variable in the model, check and possibly remove variables that became non-significant.
Enter variable if P<: a variable is entered into the model if its associated significance level is less than this P-value.
Remove variable if P>: a variable is removed from the model if its associated significance level is greater than this P-value.
Classification table cutoff value: a value between 0 and 1 which will be used as a cutoff value for a classification table. The classification table is a method to evaluate the logistic regression model. In this table the observed values for the dependent outcome and the predicted values (at the selected cutoff value) are cross-classified.
Categorical: click this button to identify nominal categorical variables.
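The classification table described under Options can be sketched as a simple cross-tabulation of observed outcomes against predictions at a chosen cutoff. Treating a predicted probability equal to the cutoff as a predicted 1 is an assumption made here, since the text does not say how ties are handled. The function name and the sample values are invented.

```python
def classification_table(observed, predicted_prob, cutoff=0.5):
    """Count cases by (observed, predicted) class at the given cutoff."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for obs, prob in zip(observed, predicted_prob):
        # Assumption: probabilities at or above the cutoff are classed as 1.
        pred = 1 if prob >= cutoff else 0
        table[(obs, pred)] += 1
    return table


def percent_correct(table):
    """Percentage of cases on the diagonal of the table."""
    total = sum(table.values())
    return 100.0 * (table[(0, 0)] + table[(1, 1)]) / total


# Invented observed outcomes and model probabilities.
observed = [0, 0, 1, 1, 1]
probs = [0.1, 0.6, 0.4, 0.7, 0.9]

table = classification_table(observed, probs, cutoff=0.5)
print(table)
print(percent_correct(table))
```

Changing the cutoff trades one kind of misclassification for the other, so the table is best read together with the cutoff that produced it.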
Results
After you click the OK button, the following results are displayed: