How does logistic regression map every outcome to either 0 or 1? The equation for the log-likelihood function (LLF) is:
LLF = Σi [ yi log(p(xi)) + (1 − yi) log(1 − p(xi)) ]
How does logistic regression use this in maximum likelihood estimation?
The logistic regression model is given by:
f(x) = sigmoid(w·x + b) = 1 / (1 + e^−(w·x + b))
where x is the feature vector, w the weight vector, and b the bias.
Our goal is to find the optimal set of values for w and b. Here w·x + b is the output of the linear part of the model; it is passed as input to the sigmoid function, which squashes values ranging from −∞ to +∞ into the interval (0, 1).
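As a rough illustration of this mapping, here is a minimal sketch, assuming NumPy and made-up values for w, b, and x:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and a single feature vector.
w = np.array([0.8, -1.2])
b = 0.3
x = np.array([2.0, 1.0])

z = np.dot(w, x) + b   # linear output; can be any real number
p = sigmoid(z)         # always lands in (0, 1)
print(p)               # ≈ 0.668 for these made-up values
```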
The output of logistic regression is interpreted as a probability. Clearly, different values of w and b correspond to different logistic regression models. What we want is the optimal set of values, w* and b*, that minimizes the error between the predictions made by the model f(x) and the true outcomes y for the training set.
The cost function for logistic regression is based on the log-likelihood function.
As given in the question:
The equation for the log-likelihood function (LLF) is:
LLF = Σi [ yi log(f(xi)) + (1 − yi) log(1 − f(xi)) ]
The objective is to find values of w and b for the model f(x) that optimize this function: maximizing the LLF is equivalent to minimizing its negative, the negative log-likelihood, which serves as the cost. Looking at the function, there are two terms inside the summation: the first term is zero for every example where y = 0, and the second term is zero for every example where y = 1. So for any given example exactly one of the summation terms is non-zero. Also, the range of f(x) is (0, 1), which implies that −log(f(x)) ranges over (0, +∞).
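To see how the two terms behave, here is a minimal sketch, assuming NumPy and an invented set of labels and model outputs:

```python
import numpy as np

def log_likelihood(y, p):
    # LLF = sum_i [ y_i log(p_i) + (1 - y_i) log(1 - p_i) ]
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 0])          # true labels
p = np.array([0.9, 0.2, 0.7, 0.4])  # model outputs f(x_i)

# For y_i = 1 only the first term survives; for y_i = 0 only the second.
print(log_likelihood(y, p))    # closer to 0 is better
print(-log_likelihood(y, p))   # negative log-likelihood, the cost to minimize
```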
The parameters of a logistic regression model can be estimated using the probabilistic framework called maximum likelihood estimation. The maximum likelihood approach to fitting a logistic model helps in understanding the form of the logistic regression model and provides a template that can be used for fitting classification models more generally.
In maximum likelihood estimation, we wish to maximize the probability of observing the data under the joint probability distribution, given a specific probability distribution and its parameters, stated formally as:
P(X | theta)
This conditional probability is often written using the semicolon (;) notation rather than the bar notation (|), because theta is not a random variable but an unknown parameter. For example:
P(X ; theta)
or
P(x1, x2, x3, … , xn ; theta)
This resulting conditional probability is referred to as the likelihood of observing the data given the model parameters, and is written using the notation L() to denote the likelihood function. For example:
L(X ; theta)
The goal of maximum likelihood estimation is to find the set of parameters (theta) that maximizes the likelihood function, i.e. results in the largest likelihood value:
maximize L(X ; theta)
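To make the objective concrete, here is an illustrative toy example (not from the original answer): estimating the bias theta of a Bernoulli (coin-flip) model by searching a grid of candidate values for the one that maximizes the log-likelihood. All data here is invented.

```python
import numpy as np

# Observed data: 1 = heads, 0 = tails (made-up sample with 7 heads out of 10).
X = np.array([1, 1, 0, 1, 1, 0, 1, 0, 1, 1])

# Candidate parameter values for theta = P(heads).
thetas = np.linspace(0.01, 0.99, 99)

# Log-likelihood of the whole sample for each candidate theta.
log_L = np.array([np.sum(X * np.log(t) + (1 - X) * np.log(1 - t)) for t in thetas])

theta_hat = thetas[np.argmax(log_L)]
print(theta_hat)   # ≈ 0.7, the sample proportion of heads
```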
We can unpack the conditional probability computed by the likelihood function. Given that the sample consists of n examples, we can frame this as the joint probability of the observed data samples x1, x2, x3, … , xn in X given the probability distribution parameters (theta).
L(x1, x2, x3, … , xn ; theta)
The joint probability distribution can be restated as the product of the conditional probabilities of observing each example given the distribution parameters:
product i = 1 to n of P(xi ; theta)
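For instance, a minimal sketch of evaluating this product for a small Bernoulli sample, assuming NumPy and invented data:

```python
import numpy as np

# Invented Bernoulli sample and a candidate parameter theta.
X = np.array([1, 0, 1, 1, 0])
theta = 0.6

# P(xi ; theta) for a Bernoulli variable: theta if xi = 1, else 1 - theta.
per_example = np.where(X == 1, theta, 1 - theta)

# Joint likelihood as the product of the per-example probabilities.
print(np.prod(per_example))   # 0.6 * 0.4 * 0.6 * 0.6 * 0.4 = 0.03456
```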
Multiplying many small probabilities together can be numerically unstable in practice, so it is common to restate this problem as the sum of the log conditional probabilities of observing each example given the model parameters:
sum i = 1 to n of log(P(xi ; theta))
where log to base e, known as the natural logarithm, is commonly used.
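A brief sketch of why the log form is preferred (assuming NumPy; the sample size and probabilities are made up): the raw product of many probabilities underflows to zero in floating point, while the sum of natural logs stays finite and usable.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 invented per-example probabilities P(xi ; theta), each in (0, 1).
probs = rng.uniform(0.1, 0.9, size=10_000)

print(np.prod(probs))          # underflows to 0.0 in float64
print(np.sum(np.log(probs)))   # a large negative but finite number
```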