An eigenvalue measures how much of the variance of the observed variables a factor explains. A factor with an eigenvalue of 1 explains as much variance as a single observed variable, and a factor with an eigenvalue above 1 explains more. For example, if the factor for socioeconomic status had an eigenvalue of 2.3, it would explain as much variance as 2.3 of the three variables. This factor, which captures most of the variance in those three variables, could then be used in another analysis. The factors that explain the least variance are generally discarded. How do we determine how many factors are useful to retain?
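The arithmetic behind the eigenvalue example can be sketched in a few lines of Python. This is a minimal illustration, assuming the variables are standardized so that each one contributes one unit of variance (as when the analysis is run on a correlation matrix). The function name share_of_variance is made up for this example and does not come from any library.

```python
def share_of_variance(eigenvalue, n_variables):
    """Fraction of the total variance explained by one factor.

    Assumes standardized variables, so the total variance equals
    the number of observed variables.
    """
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return eigenvalue / n_variables


# Socioeconomic-status example from the text: eigenvalue 2.3, three variables.
ses_share = share_of_variance(2.3, 3)
print(f"{ses_share:.1%}")  # prints 76.7%

# A factor with an eigenvalue of 1 explains exactly as much as one variable.
single_variable_share = share_of_variance(1.0, 3)
```

Under that assumption, the socioeconomic status factor would account for roughly three quarters of the variance in its three variables.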
Dropping unimportant variables from your analysis
Once you have run a factor analysis and think you have some usable results, it’s time to eliminate variables that are not “strong” enough. These are usually the ones with low factor loadings (factor loadings are explained in the last section), although additional criteria should be considered before removing a variable.
As a rule of thumb, your variable should have a rotated factor loading of at least |0.4| (meaning ≥ +0.4 or ≤ –0.4) onto one of the factors in order to be considered important.
Some researchers use much more stringent criteria such as a cut-off of |0.7|. In some instances, this may not be realistic: for example, when the highest loading a researcher finds in her analysis is |0.5|.
Other researchers relax the criteria to the point where they include variables with factor loadings of |0.2|.
Which cut-offs to use depends on whether you are running a confirmatory or exploratory factor analysis, and on what is usually considered an acceptable cut-off in your field. In addition, a variable should ideally only load cleanly onto one factor.
Factor Loading
The relationship of each variable to the underlying factor is expressed by the so-called factor loading. Here is an example of the output of a simple factor analysis looking at indicators of wealth, with just six variables and two resulting factors.
Variables | Factor 1 | Factor 2 |
Income | 0.65 | 0.11 |
Education | 0.59 | 0.25 |
Occupation | 0.48 | 0.19 |
House value | 0.38 | 0.60 |
Number of public parks in neighborhood | 0.13 | 0.57 |
Number of violent crimes per year in neighborhood | 0.23 | 0.55 |
The variable with the strongest association to the underlying latent variable, Factor 1, is income, with a factor loading of 0.65.
Since factor loadings can be interpreted like standardized regression coefficients, one could also say that the variable income has a correlation of 0.65 with Factor 1. This would be considered a strong association for a factor analysis in most research fields.
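If a loading is read as a correlation, squaring it gives the share of that variable's variance accounted for by the factor. The short sketch below works this out for income. It assumes the factors are uncorrelated (an orthogonal solution), which the example does not state explicitly.

```python
income_loading = 0.65  # loading of income on Factor 1, from the table

# Squared loading = share of income's variance accounted for by Factor 1.
# Assumption: uncorrelated (orthogonal) factors.
shared_variance = income_loading ** 2
print(f"{shared_variance:.0%}")  # prints 42%
```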
Two other variables, education and occupation, are also associated with Factor 1. Based on the variables loading highly onto Factor 1, we could call it “Individual socioeconomic status.”
House value, number of public parks, and number of violent crimes per year, however, have high factor loadings on the other factor, Factor 2. They seem to indicate the overall wealth within the neighborhood, so we may want to call Factor 2 “Neighborhood socioeconomic status.”
Notice that the variable house value is also marginally important in Factor 1 (loading = 0.38, just under the |0.4| rule of thumb). This makes sense, since the value of a person’s house should be associated with his or her income.
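The cut-off rule described earlier can be applied to the loadings in the table with a short script. This is only a sketch: the helper names factors_loaded and classify are hypothetical, and the labels "drop", "clean" and "cross-loading" are shorthand chosen for this example rather than standard terminology from any package.

```python
# Rotated loadings copied from the wealth example table (Factor 1, Factor 2).
loadings = {
    "Income": (0.65, 0.11),
    "Education": (0.59, 0.25),
    "Occupation": (0.48, 0.19),
    "House value": (0.38, 0.60),
    "Number of public parks in neighborhood": (0.13, 0.57),
    "Number of violent crimes per year in neighborhood": (0.23, 0.55),
}


def factors_loaded(variable_loadings, cutoff):
    """Return the 1-based factor numbers whose |loading| meets the cut-off."""
    return [i + 1 for i, value in enumerate(variable_loadings)
            if abs(value) >= cutoff]


def classify(loadings, cutoff=0.4):
    """Label each variable 'drop', 'clean' (one factor) or 'cross-loading'."""
    result = {}
    for name, values in loadings.items():
        hits = factors_loaded(values, cutoff)
        if not hits:
            label = "drop"
        elif len(hits) == 1:
            label = "clean"
        else:
            label = "cross-loading"
        result[name] = (label, hits)
    return result


# Compare the rule-of-thumb cut-off with the relaxed one mentioned in the text.
for cutoff in (0.4, 0.2):
    print(f"cut-off |{cutoff}|")
    for name, (label, hits) in classify(loadings, cutoff).items():
        print(f"  {name}: {label}, factors {hits}")
```

With the |0.4| cut-off, every variable loads cleanly onto exactly one factor. Relaxing the cut-off to |0.2| makes education, house value and violent crimes load onto both factors.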