Suppose we have a data set where each data point represents a single student's scores on a math test, a physics test, a reading comprehension test, and a vocabulary test.
We find the first two principal components, which capture 90% of the variability in the data, and interpret their loadings. We conclude that the first principal component represents overall academic ability, and the second represents a contrast between quantitative ability and verbal ability.
What loadings would be consistent with that interpretation? Choose all that apply.
a) (0.5, 0.5, 0.5, 0.5) and (0.71, 0.71, 0, 0)
b) (0.5, 0.5, 0.5, 0.5) and (0, 0, -0.71, -0.71)
c) (0.5, 0.5, 0.5, 0.5) and (0.5, 0.5, -0.5, -0.5)
d) (0.5, 0.5, 0.5, 0.5) and (-0.5, -0.5, 0.5, 0.5)
e) (0.71, 0.71, 0, 0) and (0, 0, 0.71, 0.71)
f) (0.71, 0, -0.71, 0) and (0, 0.71, 0, -0.71)
Loadings (which should not be confused with eigenvectors) have the following properties:

1. Their sums of squares within each column are the eigenvalues (the components' variances).
2. Loadings are the coefficients in the linear combination predicting a variable by the (standardized) components.
You extracted the first 2 PCs out of 4. The matrix of loadings $A$ and the eigenvalues:

```
A (loadings)
         PC1            PC2
X1       .5000000000    .5000000000
X2       .5000000000    .5000000000
X3       .5000000000   -.5000000000
X4       .5000000000   -.5000000000

Eigenvalues:  1.0000000000   1.0000000000
```
In this instance both eigenvalues are equal. That is a rare case in the real world; it says that PC1 and PC2 are of equal explanatory "strength".
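Just to make property 1 concrete, here is a minimal numpy sketch; the data matrix `X` is synthetic and all the names (`X`, `R`, `A`) are illustrative, not anything fixed by the question:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))      # synthetic N x 4 score matrix (made up)
X = (X - X.mean(0)) / X.std(0)         # standardize -> correlation-based PCA

R = np.corrcoef(X, rowvar=False)       # 4 x 4 correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns eigenvalues ascending...
order = np.argsort(eigvals)[::-1]      # ...so re-sort them descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings = eigenvectors scaled by sqrt(eigenvalue); keep the first 2 PCs
A = eigvecs[:, :2] * np.sqrt(eigvals[:2])

# Property 1: the column sums of squares of A are the eigenvalues
print(np.allclose((A ** 2).sum(axis=0), eigvals[:2]))   # True
```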
Suppose you also computed the component values, an $N \times 2$ matrix $C$, and z-standardized them (mean = 0, st. dev. = 1) within each column. Then (as point 2 above says) $\hat{X} = CA'$. But because you kept only 2 PCs out of the 4 ($A$ lacks the 2 remaining columns), the restored data values $\hat{X}$ are not exact: there is an error (if eigenvalues 3 and 4 are not zero).
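Continuing the same sketch, point 2 and the reconstruction error can be checked directly (z-standardizing the raw scores is just one way to obtain $C$ here):

```python
# Raw component scores, then z-standardized within each column -> C
S = X @ eigvecs[:, :2]
C = (S - S.mean(0)) / S.std(0)

# Point 2: X_hat = C A' restores the variables from the standardized scores
X_hat = C @ A.T

# ...only approximately here, because eigenvalues 3 and 4 of R are not zero
print(((X - X_hat) ** 2).mean())   # nonzero reconstruction error
```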
OK. What are the coefficients to predict the components from the variables? Clearly, if $A$ were the full $4 \times 4$ matrix, these would be $B = (A^{-1})'$. With a non-square loading matrix we may compute them as $B = A \cdot \mathrm{diag}(\text{eigenvalues})^{-1} = (A^+)'$, where $\mathrm{diag}(\text{eigenvalues})$ is the square diagonal matrix with the eigenvalues on its diagonal, and the $^+$ superscript denotes the pseudoinverse. In your case:
```
diag(eigenvalues)
   1   0
   0   1

B (coefficients to predict components by original variables)
         PC1            PC2
X1       .5000000000    .5000000000
X2       .5000000000    .5000000000
X3       .5000000000   -.5000000000
X4       .5000000000   -.5000000000
```
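Continuing the sketch, the diagonal-scaling formula and the pseudoinverse give the same $B$:

```python
# B = A diag(eigenvalues)^{-1}: divide each column of A by its eigenvalue
B = A / eigvals[:2]

# The same matrix via the Moore-Penrose pseudoinverse, B = (A^+)'
print(np.allclose(B, np.linalg.pinv(A).T))   # True
```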
So, if $X$ is the $N \times 4$ matrix of original centered variables (or standardized variables, if you are doing PCA based on correlations rather than covariances), then $C = XB$; the columns of $C$ are the standardized principal component scores. In your example:
PC1 = 0.5*X1 + 0.5*X2 + 0.5*X3 + 0.5*X4 ~ (X1+X2+X3+X4)/4
"the first component is proportional to the average score"
PC2 = 0.5*X1 + 0.5*X2 - 0.5*X3 - 0.5*X4 = (0.5*X1 + 0.5*X2) - (0.5*X3 + 0.5*X4)
"the second component measures the difference between the first pair of scores and the second pair of scores"
In this example it turned out that $B = A$, but in the general case they are different.
Note: The above formula for the coefficients to compute component scores, $B = A \cdot \mathrm{diag}(\text{eigenvalues})^{-1}$, is equivalent to $B = R^{-1}A$, with $R$ being the covariance (or correlation) matrix of the variables. The latter formula comes directly from linear regression theory. The two formulas are equivalent within the PCA context only. In factor analysis they are not, and to compute factor scores (which are always approximate in FA) one should rely on the second formula.
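A quick check of this equivalence, with $R$ taken as the correlation matrix from the sketch above:

```python
# Within PCA the two formulas agree: A diag(eigenvalues)^{-1} == R^{-1} A
print(np.allclose(B, np.linalg.solve(R, A)))   # True
```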