Discuss recommendations for best practices in Structural Equation Modeling.
The most commonly used estimation method in SEM is maximum likelihood (ML). ML rests on several assumptions, including:
1) There will be no missing data;
2) The endogenous (or dependent) variables will have a multivariate normal distribution.
If a complete data file is unavailable, then the researcher must test whether the data are missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). If data are MCAR, then the “missingness” on the variable of interest is unrelated to any of the variables in the data set. If the data are MAR, then systematic differences may exist between missing and observed values;
However, these differences are accounted for by other variables in the data set. Finally, if data are MNAR, then there is a systematic pattern to the missing data (the presence or absence of a score on variable X is related to the variable itself). To determine whether data are MCAR, Little’s MCAR test can be used (i.e., a statistically non-significant p value [>.05] denotes that data are MCAR). If Little’s test is statistically significant, then data may be MAR or MNAR; further investigation of participants with missing data is required. If data are MCAR or MAR, the sample is large, and the proportion of missing data is modest (< 5%), listwise deletion is a reasonable option. An alternative approach is to use multiple imputation (MI) to estimate the missing values. Finally, when data are MNAR, item parcels may be useful; though for the novice practitioner, SEM would not be recommended.
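As an illustration of the imputation step only (Little’s MCAR test is not part of this snippet), the following is a minimal sketch assuming scikit-learn: running IterativeImputer several times with posterior sampling approximates multiple imputation, producing one completed data set per seed. The data frame and the number of imputations are placeholders.

```python
# A minimal MI-style sketch, assuming scikit-learn. Each run of
# IterativeImputer with sample_posterior=True yields one completed copy
# of the data; m runs with different seeds approximate multiple imputation.
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def multiply_impute(df: pd.DataFrame, m: int = 5) -> list:
    """Return m completed copies of df, one per random seed."""
    completed = []
    for seed in range(m):
        imputer = IterativeImputer(sample_posterior=True, random_state=seed)
        completed.append(pd.DataFrame(imputer.fit_transform(df),
                                      columns=df.columns))
    return completed

# Each completed data set is then analyzed separately and the SEM
# estimates pooled (e.g., via Rubin's rules).
```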
Having data that are multivariate normal is a key assumption when performing SEM (using the ML default). Although univariate normality does not guarantee the multivariate normality of one’s data, we recommend that each variable be scrutinized to identify any deviations from a normal distribution. We provide a straightforward overview of the primary visual and statistical techniques that may be used to gauge univariate normality. The two key considerations are skew and kurtosis. Skewness refers to the lack of symmetry in the distribution of one’s data (i.e., for a symmetrical distribution, or one without skew, the distribution to the left and right of the center point looks identical).
Kurtosis may be thought of as the “tail-heaviness” of the distribution of one’s data (i.e., leptokurtosis occurs when the number and extremity of outliers is greater than would occur with a normal distribution; platykurtosis occurs when the number and extremity of outliers is smaller than would take place with a normal distribution). Suggested cut-offs for the skewness index (i.e., skew divided by the standard error of skew) and the kurtosis index (i.e., kurtosis divided by the standard error of kurtosis) are absolute values greater than 3 and 10, respectively. Determining multivariate normality is more difficult, as popular statistical packages such as SPSS do not offer formal tests of multivariate skewness and kurtosis. However, Wan Nor offers a step-by-step guide to graphically assessing multivariate normality using SPSS, and DeCarlo provides SPSS syntax that may be used to determine both univariate and multivariate normality. Another option is to assess model fit using a p value that is not ML-based (e.g., one derived from a robust estimator).
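The univariate checks described above are straightforward to script. The sketch below computes the skewness and kurtosis indices (each statistic divided by its standard error) and flags the |3| and |10| cut-offs; the standard-error formulas are the usual large-sample ones, and the function name is illustrative.

```python
# Skewness and kurtosis indices as described above: statistic / SE,
# checked against the suggested |3| and |10| cut-offs.
import numpy as np
from scipy import stats

def normality_indices(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    se_skew = np.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * np.sqrt((n**2 - 1) / ((n - 3) * (n + 5)))
    skew_index = stats.skew(x) / se_skew
    kurt_index = stats.kurtosis(x) / se_kurt  # Fisher (excess) kurtosis
    return {"skew_index": skew_index,
            "kurtosis_index": kurt_index,
            "skew_ok": abs(skew_index) <= 3,       # cut-off from the text
            "kurtosis_ok": abs(kurt_index) <= 10}  # cut-off from the text
```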
Two-Stage Modelling:
When conducting SEM, it is recommended that the measurement models be assessed first, using confirmatory factor analysis (CFA), followed by simultaneous assessment of the measurement and structural models. As noted earlier, each measurement model consists of at least one latent factor, its measured indicators and their associated error terms. The structural model represents the predicted associations among the latent variables based on theory and/or prior empirical research.
Thus, a model containing two latent variables (Y1 and Y2), each of which is represented by three manifest indicators (Y1: x1, x2, x3; Y2: x4, x5, x6), would consist of two measurement models (one for Y1 and one for Y2) and one structural model that specifies the association between Y1 and Y2. With the two-stage approach, each measurement model is tested first. If adequate fit is not obtained, then each model may be subject to re-specification, provided one can justify doing so on the basis of theory, indicator content, and/or past research.
It should be noted that, unless a compelling reason is specified a priori, simply correlating error terms to improve fit is not recommended because doing so takes “advantage of chance, at a cost of only a single degree of freedom, with a consequent loss of interpretability and theoretical meaningfulness”. The structural model is then evaluated.
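A minimal sketch of the two-stage workflow, assuming the Python package semopy and its lavaan-style model syntax (the indicator names follow the Y1/Y2 example above; `data` is a pandas DataFrame holding the observed indicators):

```python
# A hedged sketch of the two-stage approach, assuming semopy.
import semopy

# Stage 1: measurement models only (the two factors are free to covary).
cfa_desc = """
Y1 =~ x1 + x2 + x3
Y2 =~ x4 + x5 + x6
"""
cfa = semopy.Model(cfa_desc)
cfa.fit(data)
print(semopy.calc_stats(cfa))    # check measurement-model fit first

# Stage 2: add the structural path among the latent variables.
sem_desc = cfa_desc + "Y2 ~ Y1\n"
sem = semopy.Model(sem_desc)
sem.fit(data)
print(sem.inspect())             # parameter estimates
print(semopy.calc_stats(sem))    # fit indices for the full model
```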
Reliability and Validity:
When testing each measurement model using confirmatory factor analysis, the output can be used to assess indicator and composite reliabilities as well as convergent and discriminant validity. Indicator reliability (IR) refers to the proportion of variance in each measured variable that is accounted for by the latent factor it supposedly represents (i.e., the squared standardized loading).
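For a reflective factor, these quantities can be computed directly from the standardized loadings. A minimal sketch, assuming uncorrelated errors; the CR and AVE formulas are the conventional ones, and the example loadings are invented:

```python
# Indicator reliability (IR), composite reliability (CR), and average
# variance extracted (AVE) from one factor's standardized loadings.
import numpy as np

def reliability_summary(loadings):
    lam = np.asarray(loadings, dtype=float)
    errors = 1.0 - lam**2                  # standardized error variances
    ir = lam**2                            # IR: variance explained per item
    cr = lam.sum()**2 / (lam.sum()**2 + errors.sum())
    ave = np.mean(lam**2)                  # AVE: mean squared loading
    return {"IR": ir, "CR": cr, "AVE": ave}

print(reliability_summary([0.82, 0.75, 0.68]))
```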
Model Fit:
A broad range of fit indices, encompassing four broad categories (i.e., overall model fit, incremental fit, absolute fit, and predictive fit), should be used. Overall model fit, which includes the chi-square test, tests precisely what it describes: whether the model fits the observed data. Ropovik notes that, while a statistically significant chi-square value is often ignored on the grounds that the test itself is overly sensitive when large samples are used, the “only message that a significant χ2 tells is… take a good look at that model [as] something may be wrong here”. Further, attaining good fit on other indices (e.g., GFI or RMSEA) does not necessarily mean that the chi-square test was statistically significant merely because of a trivial misspecification. Detailed analysis of the model is required.
Incremental fit indices compare the model that is being tested to a baseline model which, typically, is one in which all variables are uncorrelated. Sample indices include: the normed fit index (NFI), the comparative fit index (CFI), and the Tucker-Lewis index (TLI). Absolute fit indices, such as the root mean square error of approximation (RMSEA), goodness-of-fit index (GFI), and the standardized root mean square residual (SRMR), determine how well a model specified a priori reproduces the sample data. If the SRMR is not reported, then we recommend researchers furnish a table of correlation residuals, which represent the difference between a model-implied correlation and the corresponding observed correlation. The greater the absolute magnitude of a given correlation residual, the greater the misfit between the model and the actual data for the two variables in question.
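A correlation-residual table of this kind can be produced with a few lines of code. This sketch assumes you can obtain the model-implied covariance matrix from your SEM software; all names are illustrative:

```python
# Correlation residuals: observed correlations minus model-implied ones.
import numpy as np
import pandas as pd

def correlation_residuals(s_observed, sigma_model, names):
    def cov_to_corr(c):
        d = np.sqrt(np.diag(c))
        return c / np.outer(d, d)
    resid = (cov_to_corr(np.asarray(s_observed))
             - cov_to_corr(np.asarray(sigma_model)))
    return pd.DataFrame(resid, index=names, columns=names)

# Large absolute residuals flag the variable pairs the model reproduces poorly.
```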
With respect to cut-off values for various fit indices, the current perspective is that individuals should avoid mindlessly using cut-off values and that “no single cut-off value for any particular [fit index] can be broadly applied across latent variable models”. Measurement quality, which McNeish et al. operationalize as the magnitude of the standardized loadings between each latent construct and its manifest variables, plays a critical role with respect to the interpretability of cut-off values. Referring to the reliability paradox, these researchers note that fit indices tend to be worse when measurement quality is higher rather than lower. Thus, a model with standardized loadings of .90 may produce worse fit statistics than a model with standardized loadings of .40, although the former reflects better measurement quality than does the latter.
Finally, predictive fit indices examine “how well the structural equation model would fit other samples from the same population”. One common example is the Akaike Information Criterion (AIC), which measures “badness” of fit: when comparing competing models fitted to the same data, the model with the smaller AIC value is preferred.
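As a brief illustration, again assuming semopy: competing models fitted to the same data can be ranked on AIC. Here `desc_a`, `desc_b`, and `data` are placeholders, and the "AIC" column name in the calc_stats output is an assumption.

```python
# Illustrative only: comparing competing models on AIC with semopy.
import semopy

model_a = semopy.Model(desc_a)   # desc_a / desc_b: competing specifications
model_a.fit(data)
model_b = semopy.Model(desc_b)
model_b.fit(data)

print(semopy.calc_stats(model_a)["AIC"])
print(semopy.calc_stats(model_b)["AIC"])  # prefer the smaller value
```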
When writing a manuscript that involves SEM, various pieces of information are essential if readers are to make an informed decision about the appropriateness of the findings. We recommend the following be reported:
1. The minimum number of participants needed, as determined by an a priori power analysis, given the models being tested.
2. At least one alternative model that is plausible in light of extant theory or relevant empirical findings.
3. Graphical displays of all measurement and structural models.
4. Brief details about the psychometric properties of scale scores for all measured variables (e.g., Cronbach’s alpha and its 95% confidence intervals or, preferably, omega as well as 2 to 3 sentences per measure detailing evidence of content and construct validities).
5. The proportion of data that are missing and whether missing data are MCAR, MAR, or MNAR. As well, researchers should explicate how this decision was reached (e.g., why does a researcher assume missing data are MAR?), and the action taken to address missing data.
6. Assessments of univariate and multivariate normality for all measured indicators.
7. The estimation method used to generate all SEMs (default is ML estimation).
8. The software (including version) that was used to analyze the data.
9. In accordance with the advised two-step approach, full CFA details about each measurement model followed by complete SEM details about the structural model.
10. Indicator and composite reliabilities.
11. Average variance extracted (AVE) for each latent factor, which denotes convergent validity.
12. Discriminant validity of latent factors, as per Fornell and Larcker’s test (a code sketch of this check appears after this list).
13. All standardized loadings from latent variables to manifest variables (reflective models).
14. Fit indices that reflect overall, absolute, and incremental fit. If applicable, predictive fit indicators should be included.
15. A clear and compelling rationale for all post-hoc model modifications.
16. An indicator of effect size for the final model.
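Regarding item 12, a minimal sketch of Fornell and Larcker’s discriminant-validity check: each factor’s AVE should exceed its squared correlation with every other factor. The AVE values and the inter-factor correlation below are invented.

```python
# Fornell-Larcker check: AVE of each factor vs. squared factor correlations.
def fornell_larcker(ave, factor_corr):
    """ave: {factor: AVE}; factor_corr: {(f1, f2): estimated correlation}."""
    return {pair: ave[pair[0]] > r**2 and ave[pair[1]] > r**2
            for pair, r in factor_corr.items()}

print(fornell_larcker({"Y1": 0.55, "Y2": 0.61}, {("Y1", "Y2"): 0.48}))
```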