In: Math
Compare the kk-NN classifier, linear discriminant analysis (LDA) and the logistic model when it comes to classification. Which is generally better?
Answer:
In the LDA framework, the log odds of $p_1(x) = \Pr(Y = 1 \mid X = x)$ is given by
$$\log\left(\frac{p_1(x)}{1 - p_1(x)}\right) = c_0 + c_1 x,$$
where $c_0$ and $c_1$ are functions of $\mu_1$, $\mu_2$ and $\sigma^2$.
By the formulation of logistic regression,
$$\log\left(\frac{p_1(x)}{1 - p_1(x)}\right) = \beta_0 + \beta_1 x.$$
Both are linear functions of $x$, so logistic regression and LDA both produce linear decision boundaries. The only difference between the two approaches is that $\beta_0$ and $\beta_1$ are estimated by maximum likelihood, whereas $c_0$ and $c_1$ are computed from the estimated means and variance of a normal distribution.
Since logistic regression and LDA differ only in their fitting procedures, one might expect the two approaches to give similar results. This is often, but not always, the case. LDA assumes that the observations are drawn from a Gaussian distribution with a common covariance matrix in each class, and so it can provide some improvement over logistic regression when this assumption approximately holds. Conversely, logistic regression can outperform LDA when these Gaussian assumptions are not met.
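As a quick illustration, here is a minimal sketch (assuming scikit-learn and NumPy are available; the simulated data and parameter values are hypothetical choices, not part of the original answer) that fits both classifiers to Gaussian data with a common covariance and compares the two linear boundaries:

```python
# Minimal sketch: LDA vs. logistic regression on Gaussian data with common covariance.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Two classes drawn from Gaussians with different means but a shared covariance matrix
n = 500
cov = [[1.0, 0.3], [0.3, 1.0]]
X0 = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
X1 = rng.multivariate_normal(mean=[1.5, 1.5], cov=cov, size=n)
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

logit = LogisticRegression().fit(X, y)
lda = LinearDiscriminantAnalysis().fit(X, y)

# Both boundaries are linear: intercept + coef . x = 0
print("logistic:", logit.intercept_, logit.coef_)  # beta_0, beta_1 via maximum likelihood
print("LDA:     ", lda.intercept_, lda.coef_)      # c_0, c_1 via estimated means/covariance
print("agreement:", np.mean(logit.predict(X) == lda.predict(X)))
```

With the Gaussian assumption satisfied, the two fitted boundaries and their predictions should agree almost everywhere.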
KNN
KNN takes a completely different approach from these two classifiers. To make a prediction for an observation X = x, the K training observations closest to x are identified, and X is assigned to the class to which the plurality of these observations belong. Hence KNN is a completely non-parametric approach: no assumptions are made about the shape of the decision boundary. We can therefore expect it to dominate LDA and logistic regression when the decision boundary is highly non-linear, provided there is enough training data. On the other hand, KNN does not tell us which predictors are important; we don't get a table of coefficients.
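A minimal sketch of this trade-off (again assuming scikit-learn; the make_moons data and K = 15 are illustrative choices) on a strongly non-linear boundary:

```python
# Minimal sketch: KNN vs. the two linear classifiers on a non-linear boundary.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_moons(n_samples=1000, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "KNN (K=15)": KNeighborsClassifier(n_neighbors=15),
    "LDA": LinearDiscriminantAnalysis(),
    "Logistic": LogisticRegression(),
}
for name, model in models.items():
    acc = model.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: test accuracy = {acc:.3f}")  # KNN should beat the linear models here
```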
In short, which classifier is better depends on the situation.
When the classes are approximately Gaussian with a common covariance, LDA tends to perform best. When the decision boundary is roughly linear but the Gaussian assumption does not hold, logistic regression is usually preferable. And when the decision boundary is highly non-linear and there is enough training data, KNN tends to do best, at the cost of interpretability.
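In practice the choice is usually made empirically. A minimal sketch (assuming scikit-learn; the breast-cancer dataset and 5-fold setup are illustrative, not from the original answer) comparing the three classifiers by cross-validated accuracy:

```python
# Minimal sketch: pick among KNN, LDA and logistic regression by cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_breast_cancer(return_X_y=True)
candidates = [
    ("KNN", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("LDA", LinearDiscriminantAnalysis()),
    ("Logistic", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
]
for name, model in candidates:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```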