Matlab Code
Write a procedure to calculate the log discriminant function for a given multivariate Gaussian distribution and prior probability.
solution:
Normal distribution and discriminant functions
Linear discriminant functions have a variety of pleasant analytical properties. They can be optimal if the underlying distributions are cooperative, such as Gaussians with equal covariance, as might be obtained through an intelligent choice of feature detectors. Even when they are not optimal, we might be willing to sacrifice some performance in order to gain the advantage of their simplicity. Linear discriminant functions are relatively easy to compute, and in the absence of information suggesting otherwise, they are attractive candidates for initial, trial classifiers.
Discriminant analysis is a classification method. It assumes that different classes generate data based on different Gaussian distributions.
To train (create) a classifier, the fitting function estimates the parameters of a Gaussian distribution for each class.
To predict the classes of new data, the trained classifier finds the class with the smallest misclassification cost.
In a problem with feature vector x and state of nature ω_i, we can write the discriminant function for class i as:
g_i(x) = -(1/2)(x - μ_i)^T Σ_i^{-1} (x - μ_i) - (d/2) ln(2π) - (1/2) ln|Σ_i| + ln P(ω_i)
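As a direct answer to the question, here is a minimal MATLAB sketch of this expression. The function name logDiscriminant and the convention that x and mu are d-by-1 column vectors are my own assumptions, not part of the original problem.

function g = logDiscriminant(x, mu, Sigma, prior)
% logDiscriminant  Log discriminant g_i(x) for one Gaussian class.
%   x     - d-by-1 feature vector
%   mu    - d-by-1 class mean vector
%   Sigma - d-by-d class covariance matrix (assumed positive definite)
%   prior - prior probability P(w_i)
    d = numel(mu);
    v = x - mu;
    g = -0.5 * (v' / Sigma) * v ...     % -1/2 (x-mu)' * inv(Sigma) * (x-mu)
        - (d/2) * log(2*pi) ...         % -(d/2) * ln(2*pi)
        - 0.5 * log(det(Sigma)) ...     % -1/2 * ln|Sigma|
        + log(prior);                   % + ln P(w_i)
end

To classify a point, evaluate g for every class and pick the largest value. For high-dimensional data one would normally replace det and the matrix division with a Cholesky factorization for numerical stability, but the direct form above mirrors the equation.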
Case 1: Σ_i = σ^2 I
This is the simplest case; it occurs when the features are statistically independent and each feature has the same variance σ^2. Here the covariance matrix is diagonal, since it is simply σ^2 times the identity matrix I. Geometrically, the samples of each class fall in equal-size clusters centered on their respective mean vectors. The determinant and inverse are easy to compute: |Σ_i| = σ^(2d) and Σ_i^{-1} = (1/σ^2) I. Because both |Σ_i| and the (d/2) ln(2π) term in the equation above are independent of i, we can ignore them and obtain this simplified discriminant function:
g_i(x) = -||x - μ_i||^2 / (2σ^2) + ln P(ω_i)
where ||·|| denotes the Euclidean norm, that is,
||x - μ_i||^2 = (x - μ_i)^T (x - μ_i)
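Under this case 1 assumption the per-class computation collapses to a scaled Euclidean distance. A minimal MATLAB sketch, assuming x, mu, the common variance sigma2, and prior are already defined:

% Case 1: Sigma_i = sigma^2 * I for every class.
v = x - mu;                               % difference from the class mean
g = -(v' * v) / (2*sigma2) + log(prior);  % -||x - mu||^2 / (2*sigma^2) + ln P(w_i)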
If the prior probabilities are not equal, the discriminant function shows that the squared distance ||x - μ_i||^2 must be normalized by the variance σ^2 and offset by adding ln P(ω_i); therefore, if x is equally near two different mean vectors, the optimal decision favors the a priori more likely category. Expanding the quadratic form (x - μ_i)^T (x - μ_i) yields:
g_i(x) = -(1/(2σ^2)) [x^T x - 2 μ_i^T x + μ_i^T μ_i] + ln P(ω_i)
which is a quadratic function of x. However, the quadratic term x^T x is the same for all i, so it can be ignored as an additive constant, and we obtain the equivalent linear discriminant function:
g_i(x) = w_i^T x + w_i0
where
w_i = (1/σ^2) μ_i
and
w_i0 = -(1/(2σ^2)) μ_i^T μ_i + ln P(ω_i)
w_i0 is the threshold or bias for the ith category.
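In this linear form the weights can be precomputed once per class and reused for every new sample. A MATLAB sketch under the same case 1 assumption (the variable names mu, sigma2, and prior are illustrative):

% Precompute the linear weights for class i (case 1: Sigma_i = sigma^2 * I).
w_i  = mu / sigma2;                            % w_i = mu_i / sigma^2
w_i0 = -(mu' * mu) / (2*sigma2) + log(prior);  % threshold / bias term
g    = w_i' * x + w_i0;                        % g_i(x) = w_i' * x + w_i0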
A classifier that uses linear discriminant functions is called a linear machine. For a linear machine, the decision surfaces are pieces of hyperplanes defined by the linear equations g_i(x) = g_j(x) for the two categories with the highest posterior probabilities. In this situation the equation of the decision boundary can be written as
w^T (x - x_0) = 0
where
w = μ_i - μ_j
and
x_0 = (1/2)(μ_i + μ_j) - [σ^2 / ||μ_i - μ_j||^2] ln[P(ω_i)/P(ω_j)] (μ_i - μ_j)
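For a pair of classes the hyperplane parameters can be computed directly from the two means, the common variance, and the two priors; a MATLAB sketch assuming mu_i, mu_j, sigma2, p_i, and p_j are already defined:

% Decision boundary between classes i and j (case 1).
w  = mu_i - mu_j;
x0 = 0.5 * (mu_i + mu_j) ...
     - (sigma2 / norm(mu_i - mu_j)^2) * log(p_i / p_j) * (mu_i - mu_j);
% A point x lies on the decision surface when w' * (x - x0) == 0.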
The Multivariate Gaussian Distribution
A vector-valued random variable X = [X_1 ··· X_n]^T is said to have a multivariate normal (or Gaussian) distribution with mean μ ∈ R^n and covariance matrix Σ ∈ S^n_++ (the set of symmetric positive definite n×n matrices) if its probability density function is given by
p(x; μ, Σ) = 1 / ((2π)^(n/2) |Σ|^(1/2)) exp( -(1/2)(x - μ)^T Σ^{-1} (x - μ) )
We write this as X ∼ N(μ, Σ).
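In MATLAB this density can be evaluated with mvnpdf (Statistics and Machine Learning Toolbox) or directly from the formula; the mean, covariance, and test point below are made up for illustration:

mu    = [0; 0];
Sigma = [2 0.5; 0.5 1];
x     = [1; -1];

p1 = mvnpdf(x', mu', Sigma);       % toolbox function; expects row vectors

% The same value computed directly from the density formula.
n  = numel(mu);
v  = x - mu;
p2 = exp(-0.5 * (v' / Sigma) * v) / ((2*pi)^(n/2) * sqrt(det(Sigma)));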
The model for discriminant analysis is as follows:
Each class (Y) generates data (X) using a multivariate normal distribution. In other words, the model assumes X has a Gaussian mixture distribution (gmdistribution).
For linear discriminant analysis, the model has the same covariance matrix for each class; only the means vary.
For quadratic discriminant analysis, both means and covariances of each class vary.
For linear discriminant analysis, the fitting function computes the sample mean of each class. It then computes the pooled sample covariance by first subtracting the sample mean of each class from the observations of that class, and then taking the empirical covariance matrix of the result.
For quadratic discriminant analysis, the fitting function likewise computes the sample mean of each class. It then computes the sample covariances by first subtracting the sample mean of each class from the observations of that class, and then taking the empirical covariance matrix of each class separately.
The fit method does not use prior probabilities or costs for fitting.
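This workflow corresponds to fitcdiscr in the Statistics and Machine Learning Toolbox; a short sketch on the built-in Fisher iris data (the data set and the test point are illustrative choices):

load fisheriris                       % example data: meas (150x4), species (150x1)

% Linear discriminant analysis: pooled covariance, per-class means.
ldaModel = fitcdiscr(meas, species, 'DiscrimType', 'linear');

% Quadratic discriminant analysis: per-class means and covariances.
qdaModel = fitcdiscr(meas, species, 'DiscrimType', 'quadratic');

% Classify a new observation with each model.
labelLda = predict(ldaModel, [5.8 2.8 4.5 1.3]);
labelQda = predict(qdaModel, [5.8 2.8 4.5 1.3]);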