8. Why is it important to assess whether missing values are randomly distributed throughout the participants and measures? Or in other words, why is it important to understand what processes lead to missing values?
Missing Values in Data
The concept of missing values is important to understand in order to manage data successfully. If the researcher does not handle missing values properly, he or she may draw inaccurate inferences about the data, because the results obtained will differ from those that would have been obtained had no values been missing.
Item non-response occurs when a respondent does not answer certain questions because of stress, fatigue, or lack of knowledge, or because some questions are sensitive. These unanswered items are treated as missing values.
Handling Missing Values
The researcher may leave the data as they are or use data imputation to replace the missing values. If the number of cases with missing values is extremely small, an expert researcher may drop or omit those cases from the analysis. A common rule of thumb is that the cases can be dropped if they make up less than 5% of the sample.
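The 5% rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function names, the use of None to mark a missing answer, and the sample data are assumptions made for the example, not part of any particular package.

```python
from typing import Optional, Sequence


def missing_fraction(values: Sequence[Optional[float]]) -> float:
    """Return the share of entries that are missing (represented here as None)."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(v is None for v in values) / len(values)


def can_drop(values: Sequence[Optional[float]], threshold: float = 0.05) -> bool:
    """Apply the rule of thumb: dropping is acceptable below the threshold."""
    return missing_fraction(values) < threshold


# Hypothetical survey variable: 40 respondents, 1 of whom did not answer.
income = [52.0] * 39 + [None]

print(missing_fraction(income))  # 0.025
print(can_drop(income))          # True
```

The threshold is a parameter because 5% is a convention rather than a hard limit; whether dropping is safe also depends on why the values are missing.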
In multivariate analysis, if there is a larger number of missing values, it can be better to drop those cases than to impute replacement values. In univariate analysis, on the other hand, imputation can decrease the amount of bias in the data if the values are missing at random.
There are two forms of randomly missing values:
The first form is missing completely at random (MCAR). This form exists when the missing values are randomly distributed across all observations. It can be checked by partitioning the cases into two groups: those with a missing value on the variable in question and those without. A t-test of mean difference, the most popular test for this purpose, is then carried out on the other variables to check whether the two groups differ. A significant difference suggests that the data are not MCAR.
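The partition-and-compare check described above might look like the following sketch. It uses Welch's unequal-variance form of the t statistic, which is one common variant of the t-test of mean difference; the helper names and the small data set are hypothetical, and turning the statistic into a p-value would additionally need a t-distribution table or a statistics library.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def split_by_missingness(rows, target, other):
    """Split the observed values of `other` by whether `target` is missing."""
    missing = [r[other] for r in rows
               if r[target] is None and r[other] is not None]
    observed = [r[other] for r in rows
                if r[target] is not None and r[other] is not None]
    return missing, observed


# Hypothetical data: income is unanswered mostly by older respondents.
rows = [
    {"age": 23, "income": 31.0}, {"age": 27, "income": 35.0},
    {"age": 31, "income": 40.0}, {"age": 35, "income": 44.0},
    {"age": 29, "income": 38.0}, {"age": 52, "income": None},
    {"age": 58, "income": None}, {"age": 61, "income": None},
]

age_missing, age_observed = split_by_missingness(rows, "income", "age")
t, df = welch_t(age_missing, age_observed)
print(f"t = {t:.2f}, df = {df:.2f}")
```

A large absolute t value here would indicate that respondents with missing income differ in age from those who answered, which argues against MCAR.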
The researcher should keep in mind that if the data are MCAR, he or she may choose pair-wise or list-wise deletion of the cases with missing values. If, however, the data are not MCAR, imputation is used to replace the missing values.
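The difference between the two deletion strategies can be illustrated with a short sketch. List-wise deletion keeps only cases that are complete on every variable, while pair-wise deletion keeps, for each pair of variables, every case that has both values. The function names and data below are made up for the example.

```python
from itertools import combinations


def listwise(rows):
    """Keep only the cases with no missing value on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise_counts(rows, variables):
    """Count, for each pair of variables, the cases usable under pair-wise deletion."""
    return {
        (x, y): sum(r[x] is not None and r[y] is not None for r in rows)
        for x, y in combinations(variables, 2)
    }


# Hypothetical data set with two incomplete cases.
rows = [
    {"age": 25, "income": 30.0, "score": 7},
    {"age": 31, "income": None, "score": 6},
    {"age": 42, "income": 55.0, "score": None},
    {"age": 38, "income": 48.0, "score": 9},
]

print(len(listwise(rows)))  # 2
print(pairwise_counts(rows, ["age", "income", "score"]))
# {('age', 'income'): 3, ('age', 'score'): 3, ('income', 'score'): 2}
```

Pair-wise deletion retains more data, but each statistic is then based on a different subset of cases.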
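The second form is missing at random (MAR). In MAR, the missing values are not randomly distributed across all observations but are randomly distributed within one or more sub-samples. For example, income may be missing more often among older respondents, yet be missing at random within each age group. This form is more common than the previous one.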
The non-ignorable missing value is the most problematic form. It involves missing values that are not randomly distributed across the observations, and the probability that a value is missing cannot be predicted from the variables in the model. This form cannot simply be ignored; it is addressed by performing data imputation to replace the missing values.
SPSS provides estimation methods that give the researcher statistical techniques for handling or estimating missing values. These include regression, maximum likelihood estimation, list-wise or pair-wise deletion, approximate Bayesian bootstrap, multiple imputation, and many others.
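To give a sense of what regression-based estimation involves, here is a minimal sketch of single regression imputation with one predictor. It is not the SPSS procedure; the function names and data are hypothetical, and the example assumes the target variable is roughly linear in the predictor.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def regression_impute(rows, target, predictor):
    """Return copies of rows with missing `target` values predicted from `predictor`."""
    complete = [r for r in rows
                if r[target] is not None and r[predictor] is not None]
    slope, intercept = fit_line([r[predictor] for r in complete],
                                [r[target] for r in complete])
    filled = []
    for r in rows:
        r = dict(r)  # copy so the original data are left untouched
        if r[target] is None and r[predictor] is not None:
            r[target] = intercept + slope * r[predictor]
        filled.append(r)
    return filled


# Hypothetical data: the last respondent did not report income.
rows = [
    {"age": 20, "income": 30.0},
    {"age": 30, "income": 40.0},
    {"age": 40, "income": 50.0},
    {"age": 35, "income": None},
]

filled = regression_impute(rows, "income", "age")
print(filled[-1]["income"])  # 45.0
```

Because every imputed value falls exactly on the fitted line, a single imputation of this kind understates the variability in the data, which is one reason multiple imputation is often preferred.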