In: Civil Engineering
a) What is the primary difference between the dependent variable of an OLS regression model and a logit model?
b) What is the primary difference between the model results of an OLS regression model and a logit model?
Answer:
a)
Definition of Linear Regression
The linear regression technique involves a continuous dependent variable, while the independent variables can be continuous or discrete. Linear regression establishes a relationship between the dependent variable (Y) and one or more independent variables (X) by fitting a best-fit straight line. In other words, a linear relationship is assumed between the independent and dependent variables.
The difference between simple and multiple linear regression is that simple linear regression contains only one independent variable, while multiple regression contains more than one. The best-fit line in linear regression is obtained through the least-squares method.
The following equation is used to represent a linear regression model:

Y = b0 + b1X + e

where b0 is the intercept, b1 is the slope of the line, and e is the error term. Here Y is the dependent variable and X is an independent variable.
[Figure: best-fit straight line through the data, illustrating the linear regression model]
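As a sketch of the idea above, the slope and intercept of the best-fit line can be computed directly with the least-squares formulas. The data points below are invented purely for illustration:

```python
# Minimal sketch: fitting Y = b0 + b1*X + e by least squares (one predictor).
# The data values are made up for illustration, not taken from a real study.

def ols_fit(x, y):
    """Return (b0, b1) minimizing the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariance(X, Y) divided by variance(X)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
         / sum((xi - mean_x) ** 2 for xi in x)
    # Intercept: the fitted line passes through the point of means
    b0 = mean_y - b1 * mean_x
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly y = 2x, with noise
b0, b1 = ols_fit(x, y)
print(b0, b1)
```

For these points the fitted slope comes out close to 2, as the noisy data suggest.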
Definition of Logistic Regression
The logistic regression technique involves a dependent variable that takes binary values (0 or 1, true or false, yes or no), meaning the outcome can take only one of two forms. For example, it can be used when we need to find the probability of an event succeeding or failing. The same linear formula is used, but passed through an additional sigmoid function, so the predicted value of Y ranges from 0 to 1.
Logistic regression equation:

P(Y=1) = 1 / (1 + e^-(b0 + b1X))
[Figure: S-shaped sigmoid curve, illustrating the logistic regression model]
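The sigmoid mapping described above can be sketched as follows. The coefficient values b0 and b1 here are assumptions chosen only to show how the output stays between 0 and 1:

```python
import math

# Minimal sketch: the logistic (sigmoid) function squashes the linear
# predictor b0 + b1*x into a probability between 0 and 1.
# The coefficients below are assumed for illustration, not fitted to data.

def predict_prob(x, b0=-4.0, b1=1.0):
    """P(Y=1 | X=x): the linear predictor passed through the sigmoid."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Predicted probabilities rise smoothly from near 0 toward 1
for x in (0, 4, 8):
    print(x, round(predict_prob(x), 3))
```

At x = 4 the linear predictor is exactly zero, so the predicted probability is 0.5, the midpoint of the S-curve.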
Comparison Chart
BASIS FOR COMPARISON | LINEAR REGRESSION | LOGISTIC REGRESSION
---|---|---
Basic | The data is modelled using a straight line. | The log-odds (logit) of the event is modelled as a linear combination of the predictor variables.
Linear relationship between dependent and independent variables | Required | Not required
Independent variables | Should not be strongly correlated with each other (a particular concern in multiple linear regression). | Should not be strongly correlated with each other (no multicollinearity).
Difference Between Linear and Logistic Regression
Linear and logistic regression are the most basic and most commonly used forms of regression. The essential difference between the two is that logistic regression is used when the dependent variable is binary in nature, whereas linear regression is used when the dependent variable is continuous and the regression line is linear. Regression is a technique used to predict the value of a response (dependent) variable from one or more predictor (independent) variables, where the variables are numeric. There are various forms of regression, such as linear, multiple, logistic, polynomial, non-parametric, etc.
b) Difference between linear regression and logistic regression
Linear regression uses the general linear equation Y = b0 + Σ(biXi) + ϵ, where Y is a continuous dependent variable and the independent variables Xi are usually continuous (but can also be binary, e.g. when the linear model is used in a t-test) or from other discrete domains. ϵ is a term for the variance that is not explained by the model and is usually just called "error". Individual dependent values, denoted Yj, can be solved for by modifying the equation a little: Yj = b0 + Σ(biXij) + ϵj
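A fitted value under this equation is just the intercept plus a weighted sum of the predictors. A minimal sketch, where the coefficient values are assumed rather than estimated from data:

```python
# Sketch of computing one fitted value Y_j = b0 + sum(b_i * X_ij).
# The intercept and coefficients below are assumed, not estimated.

def predict_linear(b0, coeffs, xs):
    """Compute the fitted value for one observation's predictor values xs."""
    return b0 + sum(b * x for b, x in zip(coeffs, xs))

# Two predictors with assumed coefficients b = [2.0, -0.5]
print(predict_linear(1.5, [2.0, -0.5], [3.0, 4.0]))  # 1.5 + 6.0 - 2.0 = 5.5
```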
Logistic regression is another generalized linear model (GLM) procedure using the same basic formula, but instead of the continuous Y, it regresses for the probability of a categorical outcome. In its simplest form, this means that we're considering just one outcome variable with two states of that variable, either 0 or 1.
The equation for the probability of Y=1 looks like this:

P(Y=1) = 1 / (1 + e^-(b0 + Σ(biXi)))
Your independent variables Xi can be continuous or binary. The regression coefficients bi can be exponentiated to give you the change in the odds of Y per unit change in Xi, i.e., Odds = P(Y=1)/P(Y=0) = P(Y=1)/(1 − P(Y=1)) and ΔOdds = e^bi. ΔOdds is called the odds ratio, Odds(Xi + 1)/Odds(Xi). In English, you can say that the odds of Y=1 increase by a factor of e^bi per unit change in Xi.
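The odds-ratio identity ΔOdds = e^bi can be checked numerically. The coefficient b_i = 0.7 and the baseline linear predictor below are assumed values chosen only for illustration:

```python
import math

# Sketch: turning a logistic coefficient b_i into an odds ratio, and
# verifying that the ratio of odds at X_i + 1 vs. X_i equals exp(b_i).
# b_i = 0.7 and the baseline linear predictor are assumed values.

def prob(linear):
    """Sigmoid: P(Y=1) given the linear predictor b0 + sum(b_i * X_i)."""
    return 1.0 / (1.0 + math.exp(-linear))

def odds(p):
    """Odds = P(Y=1) / (1 - P(Y=1))."""
    return p / (1.0 - p)

b_i = 0.7
odds_ratio = math.exp(b_i)  # ΔOdds per one-unit increase in X_i

lin = 0.2  # assumed linear predictor at the baseline value of X_i
ratio = odds(prob(lin + b_i)) / odds(prob(lin))

print(round(odds_ratio, 4))
print(round(ratio, 4))  # matches exp(b_i), as the identity predicts
```

Whatever baseline you pick for the linear predictor, the ratio of odds comes out the same, which is why the odds ratio is a convenient single-number summary of a logistic coefficient.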
Example: If you wanted to see how body mass index predicts blood cholesterol (a continuous measure), you'd use linear regression as described at the top of my answer. If you wanted to see how BMI predicts the odds of being diabetic (a binary diagnosis), you'd use logistic regression.