Briefly explain when to use the canonical correspondence analysis (CCA) ordination technique, and why this technique can be regarded as an improvement on the other ordination techniques.
Canonical correspondence analysis (CCA) is a multivariate method to elucidate the relationships between biological assemblages of species and their environment. The method is designed to extract synthetic environmental gradients from ecological data-sets. The gradients are the basis for succinctly describing and visualizing the differential habitat preferences (niches) of taxa via an ordination diagram. Linear multivariate methods for relating two sets of variables, such as two-block Partial Least Squares (PLS2), canonical correlation analysis and redundancy analysis, are less suited for this purpose because habitat preferences are often unimodal functions of habitat variables. After pointing out the key assumptions underlying CCA, the paper focuses on the interpretation of CCA ordination diagrams. Subsequently, some advanced uses, such as ranking environmental variables in importance and the statistical testing of effects, are illustrated on a typical macroinvertebrate data-set. The paper closes with comparisons with correspondence analysis, discriminant analysis, PLS2 and co-inertia analysis. In an appendix a new method, named CCA-PLS, is proposed that combines the strong features of CCA and PLS2.
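The point above about unimodal habitat preferences can be illustrated with a small sketch. It is not a CCA implementation. It only shows, on made-up data, why a linear summary can miss a unimodal response while an abundance-weighted average (the kind of averaging that correspondence-type methods build on) still locates the species optimum. The gradient, the Gaussian response curve and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical environmental gradient (say, pH) sampled evenly at 101 sites.
x = np.linspace(4.0, 9.0, 101)

# Assumed Gaussian (unimodal) response of one species:
# optimum at 6.5, tolerance 0.5, peak abundance 50.
optimum, tolerance, peak = 6.5, 0.5, 50.0
abundance = peak * np.exp(-0.5 * ((x - optimum) / tolerance) ** 2)

# Linear view: Pearson correlation between gradient and abundance.
r = np.corrcoef(x, abundance)[0, 1]

# Weighted-averaging view: abundance-weighted mean of the gradient,
# a simple estimate of where along the gradient the species is centred.
wa_optimum = np.sum(abundance * x) / np.sum(abundance)

print(f"Pearson correlation: {r:.3f}")
print(f"Weighted-average optimum: {wa_optimum:.3f}")
```

Because the assumed response is symmetric about the middle of the sampled gradient, the linear correlation is essentially zero even though the species depends strongly on the variable, whereas the weighted average lands on the optimum.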
CCA is currently one of the most popular ordination techniques in community ecology. It is, however, one of the most dangerous in the hands of people who do not take the time to understand this relatively complex method. The dangers lie in several areas:
(1) Because it includes multiple regression of community gradients on environmental variables, it is subject to all of the hazards of multiple regression. Multicollinearity is a particular problem, and it is easy to believe that a relatively high coefficient of multiple correlation implies a highly significant result when it may not. Further, it must be remembered that the method uses linear regression, whereas the response of the community to changes in an environmental variable may well not be linear.
(2) As the number of environmental variables increases relative to the number of observations, the results become increasingly dubious as the appearance of very strong relationships becomes inevitable.
(3) Statistics indicating the "percentage of variance explained" can be calculated in several ways, each for a different question, but users frequently confuse these statistics when reporting their results.
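Hazards (1) and (2) can be sketched with ordinary least squares on random numbers. This is only an illustration of the regression step, not of CCA itself, and everything in it (20 sites, the noise variables, the helper `r_squared`, the variance inflation factor as a collinearity check) is an assumption made for the example. The response is pure noise, so any apparent fit is spurious.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
n_sites = 20

# Hazard (2): a noise "community gradient" regressed on noise "environmental
# variables". Using the first k columns of one matrix keeps the models nested.
y = rng.normal(size=n_sites)
noise_env = rng.normal(size=(n_sites, n_sites - 1))
r2_by_count = {k: r_squared(noise_env[:, :k], y) for k in (2, 10, 19)}

# Hazard (1): variance inflation factors, VIF_j = 1 / (1 - R_j^2), where R_j^2
# comes from regressing variable j on the other variables.
x1 = rng.normal(size=n_sites)
x2 = x1 + 0.05 * rng.normal(size=n_sites)  # nearly a copy of x1
x3 = rng.normal(size=n_sites)              # unrelated variable
env = np.column_stack([x1, x2, x3])
vif = [
    1.0 / (1.0 - r_squared(np.delete(env, j, axis=1), env[:, j]))
    for j in range(env.shape[1])
]

for k, r2 in r2_by_count.items():
    print(f"{k:2d} noise variables, {n_sites} sites: R^2 = {r2:.3f}")
print("VIF (x1, x2, x3):", [round(v, 1) for v in vif])
```

With 19 variables plus an intercept and only 20 sites the fit is perfect by construction, which is the "inevitable" strong relationship described in (2). The large variance inflation factors for the two near-duplicate variables show how collinearity can be flagged before the results are interpreted.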