Research That Matters (January 17 - 20, 2008)


Directors Room (Omni Shoreham)

Treatment of Missing Data in Social Work Research: Three Steps to Imputation

Roderick A. Rose, MS, University of North Carolina at Chapel Hill and Mark W. Fraser, PhD, University of North Carolina at Chapel Hill.

Purpose: Missing data represent a threat to internal validity because they may produce selection biases if not treated properly. In “multiple imputation” (MI), missing values are replaced by values drawn from conditional probability distributions. These distributions are generated using a model that the user specifies (called the imputation model). This process is repeated multiple times, creating multiple versions of a data set. Each data set is subjected to analysis (e.g., OLS regression), producing a set of multiple parameter estimates that are combined to obtain a point estimate. In limited situations it is appropriate to delete the cases with missing data and proceed without imputation. In most other situations, the missing values must be imputed. The decision to delete or impute is based on a likelihood ratio test. Subsequently, imputation, if needed, can be implemented using one of several software packages. These represent the two main steps that must be undertaken to appropriately handle missing data. However, the selection of variables used in the test and the imputation model, from among all of the variables in a data set, cannot be overlooked. This selection represents an important step (step “zero”) that has consequences for the outcome of the test and the success of the imputation. Failure to conduct a rigorous step zero may result in the wrong missing data handling procedure and substantially biased estimates.

Method: We demonstrate step zero using both simulated and actual missing data. Simulations allow us to make judgments about the accuracy of the likelihood ratio test and imputation, but do not always provide realistic situations. Actual missing data provide greater realism but offer no absolute benchmark for the accuracy of the imputation model, because such a benchmark would require knowledge of the missing values. In the first demonstration, we simulated two data sets that differed in whether the missing observations were conditioned on other variables in the data set. We demonstrate how the likelihood ratio test can lead to the wrong answer if step zero is not conducted carefully. In the second demonstration, we used actual missing data. Using actual data allowed us to consider the situation of high collinearity, which is difficult to simulate. We show that, judged against the objective of imputation (to generate the least-biased estimates), the model that we selected using step zero was indeed “best” from among several alternatives. Specifically, we show that it was better than an imputation using a more limited set of variables.

Results: Failure to select the “best” model for imputation can lead to incorrect conclusions from the likelihood ratio test (and thus the wrong strategy of either deleting or imputing) and more-biased parameter estimates.

Implications: MI is an advanced procedure for handling missing observations, and can lead to less-biased estimates than deletion. Except in rare situations where the missing data fit a certain pattern or specific methodological approaches are used, social work researchers who conduct analysis in the presence of missing data should follow this three-step process to simplify the handling of missing data.
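
The combining step described under Purpose (one analysis per imputed data set, then a single point estimate) can be illustrated with a short sketch. The abstract does not say which combining rule the authors used; the sketch below assumes Rubin's rules, a common choice, and the function name `pool_estimates` and the input numbers are hypothetical.

```python
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Combine per-data-set results using Rubin's rules (an assumed choice).

    estimates: one parameter estimate per imputed data set
    variances: the squared standard error of each estimate
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputed data sets")
    pooled = mean(estimates)        # point estimate: average across imputations
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between  # total variance of the pooled estimate
    return pooled, total


# Hypothetical coefficients and squared standard errors from m = 3 imputed data sets.
pooled, total = pool_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```

The between-imputation term is what separates MI from a single imputation: it carries the uncertainty about the missing values into the standard error of the pooled estimate.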