Saturday, 14 January 2006 - 8:44 AM
Treatment of Missing Data in Social Work Research: Methods for Informing a Multiple Imputation Model
Purpose: In survey research, respondents may return surveys incomplete or not return them at all. Importantly, information is missing due to a variety of causes (collectively known as the missing data mechanism), many of which cannot be known by the researcher. Missing data threaten internal validity because they may reflect selection effects. Discarding subjects with missing items or surveys, or applying ad hoc procedures such as substituting the sample mean of each missing item or the mean of each scale for the missing values, can lead to biased parameter estimates.
In multiple imputation (MI), however, missing values are replaced by values drawn from conditional probability distributions. This process is repeated multiple times, creating multiple versions of a data set. Each data set is subjected to the intended analysis (e.g., OLS regression), producing a set of parameter estimates that are combined to obtain a single point estimate. The quality of MI-generated data depends on assumptions about the missing data mechanism, the intended analytical model (for the ultimate analysis of outcomes), and the composition of the imputation model (the model used to impute the data). Key recommendations for developing a strong imputation model are sometimes impractical. One recommendation is to use all available variables in an imputation model. However, with some data sets consisting of dozens of variables, an imputation model that becomes unwieldy will often fail to converge (using maximum likelihood algorithms). Consequently, we must make critical decisions about which variables should be in an imputation model. A strategy for informing such critical decisions does not exist.
Method: To evaluate various missing data pre-analysis procedures that may inform which variables to use in an imputation model, we created simulated missing data using observed survey data from the analysis of the Making Choices skills training program. We simulated three data sets. The first consisted of data missing completely at random (MCAR); the second consisted of data missing at random (MAR), with missingness conditioned on two observed variables. These two variables were then deleted to form a third data set, representing an attempt to capture data not missing at random (NMAR). We tested a variety of strategies, including bivariate and multivariate methods, to predict missingness using various configurations of both items and scales.
Results: MI pre-analysis procedures on the MCAR simulation data revealed unplanned relationships.
On the MAR and NMAR data, we observed unplanned relationships of the kind that were expected to arise and that could help establish an imputation model under NMAR conditions. Because the number of missing data points was relatively low, multivariate methods often failed to produce useful estimates.
Implications: MI is an advanced procedure for handling missing observations and can, under textbook conditions, lead to less-biased estimates than deletion. These conditions can rarely be assumed and are not observable. They constitute a black box that pre-analysis methods may only partially reveal. Social work researchers who conduct analyses in the presence of missing data should always compare estimates obtained using MI with estimates obtained using deletion, but should avoid ad hoc imputation strategies.
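
The missingness mechanisms described under Method, and a simple bivariate pre-analysis check, can be illustrated with a minimal sketch. It does not reproduce the study's procedure or the Making Choices data: the variables `x` and `y`, the coefficients, the missingness rates, and the helper `point_biserial` are all hypothetical choices made for illustration, and only one conditioning variable is used rather than two.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: x is always observed; y is the item that will go missing.
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(scale=0.8, size=n)

# MCAR: every y value has the same 20% chance of being missing.
miss_mcar = rng.random(n) < 0.20

# MAR: the chance that y is missing rises with the observed x (logistic link).
p_mar = 1.0 / (1.0 + np.exp(-(-1.5 + 1.2 * x)))
miss_mar = rng.random(n) < p_mar

# NMAR analogue: the same missingness pattern as MAR, but x is withheld from
# the analyst, so the cause of missingness is no longer in the data set.
y_nmar_only = np.where(miss_mar, np.nan, y)


def point_biserial(indicator, covariate):
    """Correlation between a 0/1 missingness indicator and an observed covariate."""
    return float(np.corrcoef(indicator.astype(float), covariate)[0, 1])


# Bivariate pre-analysis: does the observed x predict missingness on y?
r_mcar = point_biserial(miss_mcar, x)
r_mar = point_biserial(miss_mar, x)

# Complete-case (listwise deletion) means of y under each mechanism.
cc_mean_mcar = float(y[~miss_mcar].mean())
cc_mean_mar = float(y[~miss_mar].mean())

print(f"corr(missing, x): MCAR={r_mcar:.3f}  MAR={r_mar:.3f}")
print(f"complete-case mean of y: MCAR={cc_mean_mcar:.3f}  MAR={cc_mean_mar:.3f}")
```

In this sketch the missingness indicator is nearly uncorrelated with `x` under MCAR and clearly correlated with it under MAR, which is the kind of signal that could flag `x` as a candidate for an imputation model. Once `x` is withheld, as in the NMAR analogue, that signal is unavailable unless other observed variables happen to carry it.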