In: Economics
Appraise what new statistical methods are used in the evaluation of conceptual theories outlining specific advantages these methods provide. Compare Structural Equation Modeling (SEM) techniques providing advantages of using SEM to other conventional methods outlining some of the various statistical techniques that SEM is able to perform. Evaluate sampling techniques used to conduct hypothetical studies and asses the benefits of each sampling method based on best fit to application. Critique validity and reliability methods for appropriate constructs and compare advantages and disadvantages of each method describing what methods to use with different operational techniques. Compare and evaluate factor analysis for confirmatory versus exploratory methods and assess when each is appropriate proving examples and application usages. Assess the differences of various regression analysis methods and demonstrate by examples what regression methods are most appropriate for different application. Finally discuss and recommend best statistical techniques and methods to operationally use for means comparisons, non parametric evaluation, bivariate correlation, ANOVAs, Chi Square, regression, and other techniques as appropriate. Assess the overall concept of statistical power, why it has import to statistical evaluations, and what SPSS contributes to statistical analysis in today’s research.
Structural equation modeling, or SEM, is a very general, chiefly linear, chiefly cross-sectional statistical modeling technique. Factor analysis, path analysis and regression all represent special cases of SEM. SEM is a largely confirmatory, rather than exploratory, technique. That is, a researcher are more likely to use SEM to determine whether a certain model is valid., rather than using SEM to "find" a suitable model--although SEM analyses often involve a certain exploratory element.
In SEM, interest usually focuses on latent constructs--abstract psychological variables like "intelligence" or "attitude toward the brand"--rather than on the manifest variables used to measure these constructs. Measurement is recognized as difficult and error-prone. By explicitly modeling measurement error, SEM users seek to derive unbiased estimates for the relations between latent constructs. To this end, SEM allows multiple measures to be associated with a single latent construct.
A structural equation model implies a structure of the covariance matrix of the measures (hence an alternative name for this field, "analysis of covariance structures"). Once the model's parameters have been estimated, the resulting model-implied covariance matrix can then be compared to an empirical or data-based covariance matrix. If the two matrices are consistent with one another, then the structural equation model can be considered a plausible explanation for relations between the measures.
Compared to regression and factor analysis, SEM is a relatively young field, having its roots in papers that appeared only in the late 1960s. As such, the methodology is still developing, and even fundamental concepts are subject to challenge and revision. This rapid change is a source of excitement for some researchers and a source of frustration for others.
In the behavioral and social sciences, structural equation models (SEMs) have become widely accepted as a statistical tool for modeling the relation between latent and observed variables. SEMs can be perceived as a unification of several multivariate analysis techniques (cf. Fan, 1997), such as path analysis (Wright, 1934) and the common factor model (Spearman, 1904). SEMs represent hypotheses about multivariate data by specifying relations among observed entities and hypothesized latent constructs. By minimizing an appropriate discrepancy function, model parameters can be estimated from data and goodness-of-fit indices for a sample can be obtained. Latent variable models allow measurements to be purged of measurement errors and thus offers greater validity and generalizability of research designs than methods based on observed variables (Little, Lindenberger, & Nesselroade, 1999). Traditionally, SEM is considered a confirmatory tool of data analysis.
Decision Trees
Decision trees are hierarchical structures of decision rules that describe differences in an outcome with respect to observed covariates. Assume that for each subject, an observed categorical outcome y and a vector of covariates x were collected. A decision tree refers to a recursive partition of the covariate space that is associated with significant differences in the outcome variable. It is usually depicted as a dendrogram (below Figure ). The partitions of the covariate space are defined as inequalities on the individual dimensions of the covariate space. Hence, decision trees can be read like rule sets, for example, “if x1 > 5 and x2 < 2 then y = 0, otherwise y = 1.” The maximum number of such rules encountered until arriving at a decision designates the depth of the tree. Formally, decision trees describe partitions of the covariate space that are orthogonal to the axes of the covariate space. The paradigm was introduced by Sonquist and Morgan (1964) and has gained popularity through the seminal work by Breiman, Friedman, Olshen, and Stone (1984) and Quinlan (1986). Decision trees split a data set recursively by maximizing an information criterion or by applying statistical tests to determine the significance of splits. As an extension, model-based trees have appeared in many varieties. Model-based trees maximize differences of observations with respect to a hypothesized model. CRUISE (Kim & Loh, 2001), GUIDE (Loh, 2002), and LOTUS (Chan & Loh, 2004) allow parametric models in each node. Linear model trees based on a maximum-likelihood estimation procedure have been also described by Su, Wang, and Fan (2004). Zeileis, Hothorn, and Hornik (2006) reported applications of recursive partitioning with linear regression models, logistic regression models, and Weibull regression for censored survival data. A recent comprehensive framework for model-based recursive partitioning was presented by Zeileis, Hothorn, and Hornik (2008), and an important treatment of recursive partitioning approaches was given by Strobl, Malley, and Tutz (2009). Decision tree methods are usually considered as an exploratory data-analytic tool.
Numerical parameter estimation Parameter estimation in SEM requires the solution of an optimization problem. Typically, the solution is found by one of many numerical optimization methods. Given a set of starting values, the method will iteratively improve the estimate until the target function, such as the likelihood function value, will converge. In some cases, convergence is not reached, for example, if starting values are chosen far away from the unknown population values. Building a SEM Tree requires a large number of model fits and problems in some model fits are likely to occur.
The SEM Tree algorithm passes on parameter estimates of a node as starting values for submodels when evaluating splits. If the model does not converge, the original starting values can be used. If this still fails, either the potential split is disregarded or a new set of starting values has to be generated by a different estimation method, for instance, by a least-squares estimate. Non-converging models are marked during the tree growing process. Non-converging estimates can result for various reasons. Whenever non-converging models occur, the user is advised to cautiously inspect the reasons. If non-convergence occurs very early in the SEM Tree, model respecification is advised. In later stages of the growing process, non-convergence may reflect small sample size. In these cases, the limited information in the data set prevents the tree from being grown further. In our implementation, unique starting values and the passing-on of previous estimates as starting values is implemented.
Parameter restrictions A particular strength of SEM is the possibility of algebraic restrictions on models. The likelihood ratio test offers statistical means to significantly reject restrictions if the sampled data contradict them. Most often, restrictions in SEM include either restricting a parameter to zero or restricting a set of parameters to be equal. Hence, significant values of the respective test statistics indicate that in the first case the parameter is truly different from zero or in the second case the chosen parameters are truly different from each other. Restrictions on SEM Trees help to test hypotheses, reflecting the substantive questions of the researcher.
In the following paragraph, we introduce two types of restrictions across models in a tree: (1) A global restriction requires a parameter from the template model to be the same across all models in the tree. Due to the greedy nature of the algorithm, this can be achieved only if values for the chosen parameters are estimated once on the full data set and regarded as fixed for all subsequent models. (2) A local restriction requires a parameter to be equal on all models whose nodes have the same father node only during split evaluation. This can be thought of as a parameter equality restriction across models in a multi-group setting. This leads eventually to diverging values of the restricted parameter in the leaf models. However, this restriction allows that parameters are freely estimated while their differences across submodels do not contribute to the evaluation of the split criterion. For example, in a regression model, a researcher might be interested in model differences with respect to the regression parameters, but not in differences of the regression line with respect to prediction accuracy, that is, the residual error term. In that case, the residual error can be defined as locally restricted. Without this restriction, the tree can in principle find distinct subgroups that are characterized by the exact same regression slope but differ only in the residual error term. Certainly, the latter case might also be of interest, depending on the researcher’s questions.
Variable Splits Under Invariance Assumptionsv=
A widely used class of SEM are factor models that define relations between hypothesized latent factors and observed scores. In applied psychological research, hypotheses about factor-analytic structures are often tested across multiple groups. For example, in aging research, a common approach is the separation of the participants into models according to their age group, that is, a model for the younger and a model for the older participants (e.g., Kray & Lindenberger, 2000). Multi-group factor models are essentially represented by replications of a template model for each group whereby free parameters are unique within each group. Parameter constraints cannot only be set within groups but also across groups. This allows testing hypotheses whether parts of the model are indeed equal across groups or significantly differ from each other. An obvious question could be, “Do two groups differ in their average value of the latent construct?” These questions can be answered validly to the extent that the researcher can ascertain that latent constructs have been measured identically across all groups. This requirement, referred to as measurement invariance (MI) or measurement equivalence, has been debated in psychology for more than a century (Horn & McArdle, 1992; Meredith, 1964, 1993). Measurement invariance is traditionally tested through a sequence of hypothesis tests. While the literature has many concepts of invariance, most often four nested concepts are used: (1) configural invariance, sometimes termed configuration invariance or pattern invariance, requires the invariance of the pattern of zero and non-zero factor loadings across groups; (2) metric invariance, weak invariance, or factor pattern invariance requires invariance of the values of the factor loadings across groups; (3) strong invariance or scalar invariance assumes intercepts of the indicators and all factor loadings to be the same across groups; (4) strict invariance additionally restricts the residual errors to be equal across groups, to allow the interpretation of standardized coefficients across groups.
Having factor loadings, means, and variances all freely estimated could result in SEM Trees whose implied hierarchical group structure might violate the requested invariance qualifications. However, we definitely can gain knowledge about our data set from freely estimating all parameters. This would correspond to an exploratory search predicting the maximal difference between submodels and therefore subsumes covariates that maximally break the invariance assumptions. In line with Meredith (1993), we are inclined to require strong invariance whenever we are interested in latent mean differences. Since factor loadings express the expected change in the observed variable per unit of change of the latent variable, this invariance test is a test of equal scaling across the groups (Jöreskog, 1969). In that case, we suggest to only explore variable splits under the constraint that factor loadings are equal across all submodels. Effectively, the SEM Tree can be generated under the assumption of the chosen level of measurement invariance. By choosing a global restriction across the factor loadings, metric invariance can be imposed on the model; by extending the restriction to the score means, strong invariance is imposed. However, the assumption of MI is not yet statistically validated. By introducing additional tests whether these global restrictions hold at the stage of covariate split evaluation, the tenability of MI is ensured. Precisely, at each evaluation of a possible split in a submodel, a third model is introduced which is a variant of the submodel, only that the global restriction over the factor loadings is relaxed. The likelihood ratio between the compound submodel and the relaxed submodel allows testing MI in the traditional way by tentatively accepting the model as invariant if the null hypothesis cannot be rejected. Formally, let the estimated parameters of the full model before a split be ?full, and the constrained compound split model’s parameters ?res, and ?free the freely estimated parameters of the compound model of all split models. These models are nested and their likelihood values will satisfy
?2LL(?free) < ?2LL(?res) < ?2LL(?full) (5)
The only way to maintain a variable split is if the likelihood ratio 2LL(?full)?2LL(?res) is significantly different from zero, suggesting that the full model is truly a worse representation of the data set, and 2LL(?res)?2LL(?free) is not significantly different from zero, suggesting that the invariance assumption is tenable.
With this procedure in place, SEM Trees allow the generation of hierarchical model structures that predict parameter differences in SEM under a chosen assumption of measurement invariance, be it configural, metric, strong, or strict invariance.
Note that for factor-analytic models, both generative modes of the SEM Tree, either with or without measurement invariance, can make sense, depending on the researcher’s questions. A SEM Tree under measurement invariance can be used to predict differences in the factors, a regular SEM Tree might detect profiles of factor loadings that differ in the sample.
Cross-sectional variation is the variation across the respondents who are part of a research study.
SEM is designed to look at complex relationships between variables, and to reduce the relationships to visual representations. A research design can be described in terns of the design structure and the measurements that are conducted in the research. These structural and measurement relationships are the basis for a hypothesis.
And when using SEM, the research design can be modeled by computer. The relationships that are displayed in SEM modeling are determined by data arranged in a matrix. SEM uses cross-sectional variation to do the modeling that yields the conclusions.