Question

Describe the circumstances in which you would undertake a principal components analysis (PCA). What checks would you undertake to assess the strength of the relationship among the variables before conducting a PCA?

Solutions

Expert Solution

Introduction

Principal components analysis (PCA, for short) is a variable-reduction technique that shares many similarities with exploratory factor analysis. Its aim is to reduce a larger set of variables into a smaller set of 'artificial' variables, called 'principal components', which account for most of the variance in the original variables.
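To make the idea of variance reduction concrete, here is a minimal Python sketch. It is an illustration only: the guide itself uses SPSS Statistics, and the use of NumPy, the function name `pca_from_correlation`, and the simulated data are all assumptions made for this example. It extracts components from the correlation matrix and reports the cumulative share of variance they account for.

```python
import numpy as np

def pca_from_correlation(X):
    """PCA on the correlation matrix (hypothetical helper for illustration)."""
    X = np.asarray(X, dtype=float)
    # Standardise each variable so every variable contributes equally
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    # eigh is appropriate because a correlation matrix is symmetric
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                       # component scores per case
    explained = eigvals / eigvals.sum()        # proportion of variance
    return eigvals, eigvecs, scores, explained

# Simulated (made-up) data: 200 cases, 6 items driven by 2 underlying constructs
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
weights = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
X = latent @ weights.T + 0.5 * rng.normal(size=(200, 6))

eigvals, eigvecs, scores, explained = pca_from_correlation(X)
print("Cumulative variance explained:", np.round(explained.cumsum(), 3))
```

In data like these, the first two components should capture most of the variance, which is the sense in which six variables can be reduced to two components.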

There are a number of common uses for PCA:

  • (a) You have measured many variables (e.g., 7-8 variables, represented as 7-8 questions/statements in a questionnaire) and you believe that some of the variables are measuring the same underlying construct (e.g., depression). If these variables are highly correlated, you might want to include only those variables in your measurement scale (e.g., your questionnaire) that you feel most closely represent the construct, removing the others.
  • (b) You want to create a new measurement scale (e.g., a questionnaire), but are unsure whether all the variables you have included measure the construct you are interested in (e.g., depression). Therefore, you test whether the construct you are measuring 'loads' onto all (or just some) of your variables. This helps you understand whether some of the variables you have chosen are not sufficiently representative of the construct you are interested in, and should be removed from your new measurement scale.
  • (c) You want to test whether an existing measurement scale (e.g., a questionnaire) can be shortened to include fewer items (e.g., questions/statements), perhaps because some items are superfluous (i.e., more than one item may be measuring the same construct) and/or because you want a measurement scale that is more likely to be completed (i.e., response rates tend to be higher in shorter questionnaires).

These are just some of the common uses of PCA. It is also worth noting that whilst PCA is conceptually different from factor analysis, in practice it is often used interchangeably with factor analysis, and is included within the 'Factor procedure' in SPSS Statistics.
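The notion of a construct 'loading' onto variables can be sketched numerically. The Python example below is an assumption-laden illustration, not the SPSS Statistics procedure: NumPy, the helper name `component_loadings`, and the simulated questionnaire data are all hypothetical. For a PCA based on the correlation matrix, a loading is the eigenvector entry scaled by the square root of its eigenvalue, which equals the correlation between a variable and a component.

```python
import numpy as np

def component_loadings(X):
    """Loadings for a correlation-matrix PCA (hypothetical helper)."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale each eigenvector by sqrt(eigenvalue) to obtain loadings
    return eigvecs * np.sqrt(eigvals), eigvals

# Simulated (made-up) questionnaire: items 1-5 reflect one construct,
# item 6 is unrelated noise and is a candidate for removal
rng = np.random.default_rng(3)
construct = rng.normal(size=(300, 1))
related = construct + 0.5 * rng.normal(size=(300, 5))
unrelated = rng.normal(size=(300, 1))
X = np.hstack([related, unrelated])

loadings, eigvals = component_loadings(X)
# The sign of a component is arbitrary, so compare absolute loadings
print(np.round(np.abs(loadings[:, 0]), 2))
```

In a result like this, an item with a weak loading on the first component, while the other items load strongly, is the kind of item that use (b) would flag as not representative of the construct.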

In this "quick start" guide, we show you how to carry out PCA using SPSS Statistics, as well as the steps you'll need to go through to interpret the results. However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet in order for PCA to give you a valid result. We discuss these assumptions next.

Assumptions

When you choose to analyse your data using PCA, part of the process involves checking to make sure that the data you want to analyse can actually be analysed using PCA. You need to do this because it is only appropriate to use PCA if your data "passes" five assumptions that are required for PCA to give you a valid result. In practice, checking for these assumptions requires you to use SPSS Statistics to carry out a few more tests, as well as think a little bit more about your data, but it is not a difficult task.

Before we introduce you to these five assumptions, do not be surprised if, when analysing your own data using SPSS Statistics, one or more of these assumptions is violated (i.e., not met). This is not uncommon when working with real-world data rather than textbook examples. However, even when your data fails certain assumptions, there is often a way to overcome this. First, let's take a look at these five assumptions:

  • Assumption #1: You have multiple variables that should be measured at the continuous level (although ordinal variables are very frequently used). Examples of continuous variables (i.e., ratio or interval variables) include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth. Examples of ordinal variables commonly used in PCA include a wide range of Likert scales (e.g., a 7-point scale from 'strongly agree' through to 'strongly disagree'; a 5-point scale from 'never' to 'always'; a 7-point scale from 'not at all' to 'very much'; a 5-point scale from 'not important' to 'extremely important').
  • Assumption #2: There needs to be a linear relationship between all variables. The reason for this assumption is that a PCA is based on Pearson correlation coefficients, which only capture linear association. In practice, this assumption is somewhat relaxed (even if it shouldn't be) with the use of ordinal data for variables. Although linearity can be tested using a matrix scatterplot, this is often considered overkill because the matrix can sometimes contain over 500 pairwise relationships. As such, it is suggested that you randomly select just a few possible relationships between variables and test these. You can check for linearity in SPSS Statistics using scatterplots, and where there are non-linear relationships, try to "transform" the variables involved. If you choose to upgrade to our enhanced content, we have SPSS Statistics guides that show you how to test for linearity using SPSS Statistics, as well as how to carry out transformations when this assumption is violated.
  • Assumption #3: You should have sampling adequacy, which simply means that for PCA to produce a reliable result, large enough sample sizes are required. Many different rules-of-thumb have been proposed. These mainly differ depending on whether an absolute sample size is proposed or a multiple of the number of variables in your sample is used. Generally speaking, a minimum of 150 cases, or 5 to 10 cases per variable, has been recommended. There are a few methods to detect sampling adequacy: (1) the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy for the overall data set; and (2) the KMO measure for each individual variable. KMO values range from 0 to 1, with higher values indicating greater adequacy. In the SPSS Statistics procedure later in this guide, we show you which options to select in SPSS Statistics to test for sampling adequacy. If you are unsure how to interpret the results from these tests, we show you in our enhanced PCA guide, which is part of our enhanced content.
  • Assumption #4: Your data should be suitable for data reduction. Effectively, you need to have adequate correlations between the variables in order for variables to be reduced to a smaller number of components. The method used by SPSS Statistics to detect this is Bartlett's test of sphericity, which tests the null hypothesis that the correlation matrix is an identity matrix (i.e., that the variables are uncorrelated). Interpretation of this test is provided as part of our enhanced PCA guide.
  • Assumption #5: There should be no significant outliers. Outliers are important because these can have a disproportionate influence on your results. The recommended approach in SPSS Statistics is to treat as outliers any cases with component scores more than 3 standard deviations away from the mean. Again, in the SPSS Statistics procedure later in this guide, we show you which options to select in SPSS Statistics to check for outliers. If you are unsure how to interpret the SPSS Statistics output that you need to inspect to check for outliers, we show you in our enhanced PCA guide.
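The two checks named under assumptions #3 and #4 can be sketched outside SPSS Statistics. The Python code below is a minimal illustration using the standard textbook formulas for the KMO measure and Bartlett's test of sphericity; it is not SPSS Statistics output. NumPy, the function names `kmo` and `bartlett_sphericity`, and the simulated data are assumptions made for this example.

```python
import numpy as np

def kmo(R):
    """Overall and per-variable KMO from a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)            # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)        # mask for off-diagonal cells
    r2 = np.where(off, R ** 2, 0.0)
    p2 = np.where(off, partial ** 2, 0.0)
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return overall, per_variable

def bartlett_sphericity(R, n):
    """Chi-square statistic and degrees of freedom for n cases."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)             # stable log-determinant
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return statistic, df

# Simulated (made-up) data: 200 cases, 6 items sharing one construct
rng = np.random.default_rng(1)
construct = rng.normal(size=(200, 1))
X = construct + 0.5 * rng.normal(size=(200, 6))
R = np.corrcoef(X, rowvar=False)

overall, per_variable = kmo(R)
statistic, df = bartlett_sphericity(R, n=X.shape[0])
print("Overall KMO:", round(overall, 3))
print("Per-variable KMO:", np.round(per_variable, 3))
# Compare the statistic with a chi-square distribution on df degrees of
# freedom (for df = 15 the 5% critical value is about 25.0); with SciPy
# installed, scipy.stats.chi2.sf(statistic, df) would give the p-value.
print("Bartlett chi-square:", round(statistic, 1), "df:", df)
```

A Bartlett statistic far above the critical value suggests the variables are correlated enough for data reduction, while the KMO values indicate how much of that correlation is shared rather than specific to pairs of variables.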

You can check assumptions #2, #3, #4 and #5 using SPSS Statistics. Just remember that if you do not run the statistical tests on these assumptions correctly, the results you get when running PCA might not be valid. This is why we dedicate a number of articles in our enhanced guides to help you get this right.
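As an illustration of the outlier check in assumption #5, the following Python sketch flags cases whose component scores lie more than 3 standard deviations from the mean. It is a rough approximation of the idea rather than the SPSS Statistics procedure: NumPy, the helper name `flag_score_outliers`, the choice to inspect only the first component, and the simulated data with one planted extreme case are all assumptions.

```python
import numpy as np

def flag_score_outliers(X, n_components, threshold=3.0):
    """Return row indices whose standardised component scores exceed threshold."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    keep = np.argsort(eigvals)[::-1][:n_components]   # largest components
    scores = Z @ eigvecs[:, keep]
    # Scores already have mean zero; divide by their SD to standardise
    z_scores = scores / scores.std(axis=0, ddof=1)
    return np.where((np.abs(z_scores) > threshold).any(axis=1))[0]

# Simulated (made-up) data: 200 cases, 6 items, with case 0 made extreme
rng = np.random.default_rng(2)
construct = rng.normal(size=(200, 1))
X = construct + 0.5 * rng.normal(size=(200, 6))
X[0] = 10.0

flagged = flag_score_outliers(X, n_components=1)
print("Cases flagged as outliers:", flagged)
```

Flagged cases are candidates for closer inspection, not automatic removal; whether to keep, correct, or exclude them is a judgement about the data.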

In the section, Procedure, we illustrate the SPSS Statistics procedure that you can use to carry out PCA on your data. First, we introduce the example that is used in this guide.

