Are there any limitations to using PCA?

Principal Components Analysis (PCA)
PCA looks for orthogonal projections of the dataset that contain the highest possible variance; in other words, it finds hidden *linear* correlations between the variables of the dataset. This means that if some of the variables in your dataset are linearly correlated, PCA can find directions that represent your data well. Imagine two variables that give the size of something in cm and in inches respectively (their values are related by the formula 2.54 cm = 1 inch); if you add noise and plot the data, you will get points scattered around a straight line, and PCA recovers that line as the first principal component.
But if the data is not linearly correlated (e.g. a spiral, where
x = t*cos(t) and y = t*sin(t)), PCA is not enough.
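A minimal sketch of both cases, assuming scikit-learn is available (the sample sizes and noise level are illustrative choices, not from the original): on the cm/inch data the first component captures almost all of the variance, while on the spiral the variance is spread across both components even though y is fully determined by x.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Linearly correlated case: the same lengths in inches and in cm, plus noise.
inches = rng.uniform(0, 100, 500)
cm = 2.54 * inches + rng.normal(0, 1, 500)
linear = np.column_stack([cm, inches])

# Nonlinear case: a spiral, x = t*cos(t), y = t*sin(t).
t = np.linspace(0, 4 * np.pi, 500)
spiral = np.column_stack([t * np.cos(t), t * np.sin(t)])

# Fraction of total variance captured by each principal component.
print(PCA(n_components=2).fit(linear).explained_variance_ratio_)
print(PCA(n_components=2).fit(spiral).explained_variance_ratio_)
```

For the linear data the first ratio is essentially 1, so one component suffices; for the spiral neither component dominates, which is PCA's way of saying it found no linear structure to exploit.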
Also consider that requiring each principal component to be orthogonal to the others is a restriction on the search for projections with the highest variance:
Whether this assumption helps depends on the problem you want to solve:
- If you want to compress your dataset or remove noise from it, this
assumption is an advantage.
- For most other problems (like Blind Source Separation) it is
not useful. As Independent Component Analysis theory shows,
being uncorrelated is only a necessary condition for independence, not a sufficient one.
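A quick illustration of that last point (a standard textbook construction, not from the original answer): take x standard normal and y = x². The two variables are completely dependent, yet their correlation is essentially zero, so PCA sees nothing to separate.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 100_000)
y = x ** 2  # y is fully determined by x, yet uncorrelated with it

# E[x * x^2] = E[x^3] = 0 for a symmetric distribution, so corr ~ 0.
corr = np.corrcoef(x, y)[0, 1]
print(corr)
```

Decorrelating (x, y), which is all PCA can do, leaves this dependence completely intact; ICA-style methods exist precisely because of cases like this.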
PCA, as you may have seen, is a rotation of your dataset, which means it does not change the scale of your data. It is also worth noting that PCA does not normalize your data. That means that if you change the scale of just some of the variables in your
dataset, you will get different results from PCA.
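A minimal sketch of this scale sensitivity, assuming scikit-learn (the data here is synthetic and chosen only to make the effect obvious): two independent unit-variance features split the variance roughly evenly, but multiplying just one of them by 100 (say, converting metres to centimetres) makes that feature dominate the first component.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(0, 1, (500, 2))  # two independent unit-variance features

pca = PCA(n_components=2)
ratio_raw = pca.fit(data).explained_variance_ratio_

scaled = data.copy()
scaled[:, 0] *= 100  # rescale only the first feature, e.g. m -> cm
ratio_scaled = pca.fit(scaled).explained_variance_ratio_

print(ratio_raw)     # roughly even split
print(ratio_scaled)  # first component dominated by the rescaled feature
```

This is why, in practice, people often standardize each feature (subtract the mean, divide by the standard deviation) before applying PCA when the variables are measured in different units.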
Finally, there are many statistical distributions whose mean and covariance do not capture the relevant information about them. Mean and covariance fully characterize only Gaussian distributions, and those two statistics are all PCA ever looks at.