Question


What are the principal aspects of data that need to be examined when using multivariate analysis?


Solutions

Expert Solution

I. OVERVIEW

Multivariate analysis in statistics is devoted to the summarization, representation, and interpretation of data when more than one characteristic of each sample unit is measured. Almost all data-collection processes yield multivariate data. The medical diagnostician examines pulse rate, blood pressure, hemoglobin, temperature, and so forth; the educator observes, for individuals, such quantities as intelligence scores, quantitative aptitudes, and class grades; the economist may consider, at points in time, indexes and measures such as per capita personal income, the gross national product, employment, and the Dow-Jones average. Problems using these data are multivariate because inevitably the measures are interrelated and because investigations involve inquiry into the nature of such interrelationships and their uses in prediction, estimation, and methods of classification. Thus, multivariate analysis deals with samples in which for each unit examined there are observations on two or more stochastically related measurements. Most of multivariate analysis deals with estimation, confidence sets, and hypothesis testing for means, variances, covariances, correlation coefficients, and related, more complex population characteristics.

Only a sketch of the history of multivariate analysis is given here. The procedures of multivariate analysis that have been studied most are based on the multivariate normal distribution discussed below.

Robert Adrian considered the bivariate normal distribution early in the nineteenth century, and Francis Galton understood the nature of correlation near the end of that century. Karl Pearson made important contributions to correlation, including multiple correlation, and to regression analysis early in the twentieth century. G. U. Yule and others considered measures of association in contingency tables, and thus began multivariate developments for counted data. The pioneering work of “Student” (W. S. Gosset) on small-sample distributions led to R. A. Fisher’s distributions of simple and multiple correlation coefficients. J. Wishart derived the joint distribution of sample variances and covariances for small multivariate normal samples. Harold Hotelling generalized the Student t-statistic and t-distribution for the multivariate problem. S. S. Wilks provided procedures for additional tests of hypotheses on means, variances, and covariances. Classification problems were given initial consideration by Pearson, Fisher, and P. C. Mahalanobis through measures of racial likeness, generalized distance, and discriminant functions, with some results similar to the work of Hotelling. Both Hotelling and Maurice Bartlett made initial studies of canonical correlations, intercorrelations between two sets of variates. More recent research by S. N. Roy, P. L. Hsu, Meyer Girshick, D. N. Nanda, and others has dealt with the distributions of certain characteristic roots and vectors as they relate to multivariate problems, notably to canonical correlations and multivariate analysis of variance. Much attention has also been given to the reduction of multivariate data and its interpretation through many papers on factor analysis and principal components.
[For further discussion of the history of these special areas of multivariate analysis and of their present-day applications, see Counted Data; Distributions, Statistical, article on Special Continuous Distributions; Factor Analysis; Multivariate Analysis, articles on Correlation and Classification and Discrimination; Statistics, Descriptive, article on Association; and the biographies of Fisher, R. A.; Galton; Girshick; Gosset; Pearson; Wilks; Yule.]

Basic multivariate distributions

Scientific progress is made through the development of more and more precise and realistic representations of natural phenomena. Thus, science, and to an increasing extent social science, uses mathematics and mathematical models for improved understanding, such mathematical models being subject to adoption or rejection on the basis of observation [see Models, Mathematical]. In particular, stochastic models become necessary as the inherent variability in nature becomes understood.

The multivariate normal distribution provides the stochastic model on which the main theory of multivariate analysis is based. The model has sufficient generality to represent adequately many experimental and observational situations while retaining relative simplicity of mathematical structure. The possibility of applying the model to transforms of observations increases its scope [see Statistical Analysis, Special Problems Of, article on Transformations Of Data]. The large-sample theory of probability and the multivariate central limit theorem add importance to the study of the multivariate normal distribution as it relates to derived distributions. Inquiry and judgment about the use of any model must be the responsibility of the investigator, perhaps in consultation with a statistician. There is still a great deal to be learned about the sensitivity of the multivariate model to departures from that distributional assumption. [See Errors, article on Effects Of Errors In Statistical Assumptions.]

The multivariate normal distribution

Suppose that the characteristics or variates to be measured on each element of a sample from a population, conceptual or real, obey the probability law described through the multivariate normal probability density function. If these variates are p in number and are designated by X1, …, Xp, the multivariate normal density contains p parameters, or population characteristics, μ1, …, μp, representing, respectively, the means or expected values of the variates, and parameters σij, i, j = 1, …, p, with σij = σji, representing variances and covariances of the variates. Here σii is the variance of Xi (corresponding to the variance σ² of a variate X in the univariate case) and σij = σji is the covariance of Xi and Xj. The correlation coefficient between Xi and Xj is

$$\rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\,\sigma_{jj}}}.$$
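As a small illustration of the relation between covariances and correlations, the sketch below converts a covariance matrix [σij] into the matrix of correlation coefficients σij / √(σii σjj). The function name `correlation_matrix` and the example numbers are made up for illustration; the code uses only the Python standard library.

```python
import math


def correlation_matrix(cov):
    """Return [rho_ij] with rho_ij = sigma_ij / sqrt(sigma_ii * sigma_jj).

    `cov` is a p x p covariance matrix given as a list of lists.
    (Hypothetical helper, not part of the original text.)
    """
    p = len(cov)
    for i in range(p):
        if cov[i][i] <= 0:
            raise ValueError("variances sigma_ii must be positive")
    return [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
        for i in range(p)
    ]


# Illustrative covariance matrix: var(X1) = 4, var(X2) = 9, cov(X1, X2) = 2.
example_cov = [[4.0, 2.0], [2.0, 9.0]]
example_corr = correlation_matrix(example_cov)
```

Here the off-diagonal entry is 2 / (2 · 3) = 1/3, and the diagonal entries equal 1, since each variate is perfectly correlated with itself.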

The multivariate normal probability density function provides the probability density for the variates X1, …, Xp at each point x1, …, xp in the sample or observation space. Its specific mathematical form is

$$f(x_1, \ldots, x_p) = (2\pi)^{-p/2}\,|\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2}(x - \mu)'\,\Sigma^{-1}(x - \mu)\right],$$

−∞ < xi < ∞, i = 1, …, p. [For the explicit form of this density in the bivariate case (p = 2), see Multivariate Analysis, article on Correlation (1).]

(Vector and matrix notation and an understanding of elementary aspects of matrix algebra are important for any real understanding or application of multivariate analysis. Thus, x′ is the vector (x1, …, xp), μ′ is the vector (μ1, …, μp), and (x − μ)′ is the vector (x1 − μ1, …, xp − μp). Also, Σ is the p × p symmetric matrix which has elements σij, Σ = [σij], |Σ| is the determinant of Σ, and Σ⁻¹ is its inverse. The prime indicates “transpose,” and thus (x − μ)′ is the transpose of (x − μ), a column vector.)
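To make the notation concrete, here is a minimal sketch that evaluates the p-variate normal density (2π)^(−p/2) |Σ|^(−1/2) exp[−½ (x − μ)′ Σ⁻¹ (x − μ)] at a point. It assumes Σ is symmetric and positive definite and uses a Cholesky factorization Σ = LL′, a standard numerical device that is not discussed in the text, to obtain both |Σ| and the quadratic form without forming Σ⁻¹ explicitly. The names `cholesky` and `mvn_pdf` are illustrative.

```python
import math


def cholesky(sigma):
    """Lower-triangular L with L L' = sigma (sigma symmetric positive definite)."""
    p = len(sigma)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = sigma[i][i] - s
                if d <= 0:
                    raise ValueError("sigma is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def mvn_pdf(x, mu, sigma):
    """Density of the p-variate normal with mean vector mu and covariance sigma."""
    p = len(x)
    L = cholesky(sigma)
    # Forward substitution: solve L z = (x - mu), so that
    # z'z = (x - mu)' sigma^{-1} (x - mu).
    z = []
    for i in range(p):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((x[i] - mu[i] - s) / L[i][i])
    quad = sum(v * v for v in z)
    # log|sigma| = 2 * sum(log L_ii), because |sigma| = |L|^2.
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(p))
    return math.exp(-0.5 * (p * math.log(2.0 * math.pi) + log_det + quad))


# Illustrative evaluation at the mean of a correlated bivariate normal.
density_at_mean = mvn_pdf([0.0, 0.0], [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]])
```

At the mean the exponent vanishes, so the value reduces to 1 / (2π √|Σ|); with uncorrelated variates the density factors into a product of univariate normal densities.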

Comparison of f(x1, …, xp) with f(x), the univariate normal probability density function, may assist understanding; for a univariate normal variate X with mean μ and variance σ²,

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right],$$

where −∞ < x < ∞.

The multivariate normal density may be characterized in various ways. One direct method begins with p independent, univariate normal variables, U1, …, Up, each with zero mean and unit variance. From the independence assumption, their joint density is the product

$$f(u_1, \ldots, u_p) = \prod_{i=1}^{p} (2\pi)^{-1/2} e^{-u_i^2/2} = (2\pi)^{-p/2} \exp\left(-\tfrac{1}{2}\sum_{i=1}^{p} u_i^2\right),$$

a very special case of the multivariate normal probability density function. If variates X1, …, Xp are linearly related to U1, …, Up so that X = AU + μ in matrix notation, with X, U, and μ being column vectors and A being a p × p nonsingular matrix of constants aij, then

$$X_i = a_{i1}U_1 + \cdots + a_{ip}U_p + \mu_i, \qquad i = 1, \ldots, p.$$

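Clearly, the mean of Xi is E(Xi) = μi, where μi is a known constant and E represents “expectation.” The variance of Xi is

$$\operatorname{var}(X_i) = \sum_{k=1}^{p} a_{ik}^2 = \sigma_{ii},$$

and the covariance of Xi and Xj, i ≠ j, is

$$\operatorname{cov}(X_i, X_j) = \sum_{k=1}^{p} a_{ik}\,a_{jk} = \sigma_{ij}.$$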

Standard density function manipulations then yield the joint density function of X1, …, Xp as that already given as the general p-variate normal density. If the matrix A is singular, the results for E(Xi), var(Xi), and cov(Xi, Xj) still hold and X1, …, Xp are said to have a singular multivariate normal distribution; although the joint density function cannot be written, the concept is useful.
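The construction X = AU + μ can be tried out numerically. The sketch below draws independent standard normal U1, …, Up, forms X, and compares the sample covariance of the draws with the matrix AA′, whose elements are the sums Σk aik ajk. The matrix A, the vector μ, the seed, and the sample size are arbitrary choices made for illustration, and the helper names are hypothetical.

```python
import random


def covariance_from_A(A):
    """Return A A', whose (i, j) element is sum_k a_ik * a_jk."""
    p = len(A)
    return [
        [sum(A[i][k] * A[j][k] for k in range(p)) for j in range(p)]
        for i in range(p)
    ]


def sample_x(A, mu, rng):
    """Draw one observation X = A U + mu with U_1..U_p independent N(0, 1)."""
    p = len(A)
    u = [rng.gauss(0.0, 1.0) for _ in range(p)]
    return [sum(A[i][k] * u[k] for k in range(p)) + mu[i] for i in range(p)]


def sample_covariance(rows):
    """Unbiased sample covariance matrix of a list of observation vectors."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(p)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


A = [[2.0, 0.0], [1.0, 1.0]]      # illustrative nonsingular matrix
mu = [1.0, -1.0]                  # illustrative mean vector
rng = random.Random(0)            # fixed seed so the run is repeatable
draws = [sample_x(A, mu, rng) for _ in range(20000)]

sigma = covariance_from_A(A)      # theoretical covariance matrix A A'
s_hat = sample_covariance(draws)  # should be close to sigma
```

For this choice of A, the theoretical variances are 2² + 0² = 4 and 1² + 1² = 2, and the covariance is 2·1 + 0·1 = 2; the sample values differ from these only by sampling error.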

A second characterization of the p-variate normal distribution is the following: X1, …, Xp have a p-variate normal distribution if and only if a1X1 + … + apXp is univariate normal for all choices of the coefficients ai, that is, if and only if all linear combinations of the Xi are univariate normal.
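A linear combination of the variates is univariate normal, and its two parameters follow from the standard rules for expectations and variances of linear combinations: the mean is a′μ and the variance is a′Σa. A minimal sketch, with an illustrative function name and example numbers:

```python
def linear_combination_params(a, mu, sigma):
    """Mean and variance of Y = a_1 X_1 + ... + a_p X_p.

    mean = sum_i a_i mu_i          (a' mu)
    var  = sum_i sum_j a_i a_j sigma_ij   (a' Sigma a)
    """
    p = len(a)
    mean = sum(a[i] * mu[i] for i in range(p))
    var = sum(a[i] * a[j] * sigma[i][j] for i in range(p) for j in range(p))
    return mean, var


# Illustrative case: Y = X1 - X2 with means (1, 2), variances 4 and 2, covariance 2.
y_mean, y_var = linear_combination_params(
    [1.0, -1.0], [1.0, 2.0], [[4.0, 2.0], [2.0, 2.0]]
)
```

In this example the mean is 1 − 2 = −1 and the variance is 4 − 2·2 + 2 = 2, so the difference X1 − X2 is univariate normal with those parameters.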

The multivariate normal cumulative distribution function represents the probability of the joint occurrence of the events X1 ≤ x1, …, Xp ≤ xp and may be written

$$F(x_1, \ldots, x_p) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_p} f(u_1, \ldots, u_p)\, du_p \cdots du_1,$$

indicating that probabilities that observations fall into regions of the p-dimensional variate space may be obtained by integration. Tables of F(x1, …, xp) are available for p = 2, 3 (see Greenwood & Hartley 1962).
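In place of tables, the bivariate case can be approximated by numerical integration. The sketch below is one possible approach, not the method behind the cited tables: it assumes standardized variates (zero means, unit variances, correlation ρ with |ρ| < 1) and uses the standard fact that, given X1 = u, X2 is normal with mean ρu and variance 1 − ρ². That reduces the double integral to a single one, evaluated here with Simpson's rule. The lower cutoff of −8 and the number of steps are arbitrary illustrative settings.

```python
import math


def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bvn_cdf(x1, x2, rho, lower=-8.0, n=2000):
    """Approximate F(x1, x2) = P(X1 <= x1, X2 <= x2) for a standardized
    bivariate normal with correlation rho, using

        F(x1, x2) = integral_{-inf}^{x1} phi(u) * Phi((x2 - rho*u)/sqrt(1 - rho^2)) du

    truncated at `lower` and evaluated by Simpson's rule with n steps.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if x1 <= lower:
        return 0.0
    if n % 2:
        n += 1  # Simpson's rule needs an even number of steps
    h = (x1 - lower) / n
    s = math.sqrt(1.0 - rho * rho)

    def g(u):
        return phi(u) * Phi((x2 - rho * u) / s)

    total = g(lower) + g(x1)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(lower + k * h)
    return total * h / 3.0


# Illustrative value: probability that both standardized variates are below 0.
orthant = bvn_cdf(0.0, 0.0, 0.5)
```

A useful check is the known closed form for this orthant probability, 1/4 + arcsin(ρ)/(2π), which reduces to 1/4 when the variates are uncorrelated.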

