285P
Data Fusion: An Innovative Analytic Technique for Social Work Research
Data fusion addresses a persistent research obstacle: the unavailability of all necessary variables in a single existing dataset. Developed by economists in the 1970s, data fusion is now used by scholars in a range of fields, most commonly business-related disciplines and the biological sciences. Although some of the statistical techniques involved in data fusion are already familiar to social work researchers who use quantitative methods, the field has not yet adequately explored data fusion itself. This paper presents a technical summary and an example of data fusion as an analytic technique for overcoming dataset limitations frequently encountered in social work research.
Methods:
A review of the data fusion literature was conducted. Fifteen studies published between 1972 and 2014 in refereed social and physical science journals were included. The literature was examined to identify assumptions, ideal sampling methods, the highest-performing matching and imputation algorithms, and appropriate evaluation strategies for the use of data fusion in social work research.
Results:
Data fusion appears to be a viable analytic technique for social work research. Consider, for example, a study that aims to explore the longitudinal relationship between asset holding and housing instability among single mothers. No existing nationally representative longitudinal dataset contains the variables needed to address this question. One existing dataset contains variables that measure asset holding over time, and another contains variables that measure housing instability over time. The two datasets share some variables, e.g., certain demographic, asset, and socioeconomic variables, but not the more robust measures of asset holding and housing instability. Data fusion provides a means to match cases and impute data across the two datasets.
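The two-file structure in this example can be written compactly. The notation below is an illustrative sketch rather than notation taken from the reviewed studies: which file serves as recipient and which as donor is an arbitrary choice here, and g is a hypothetical function that turns the matched donors' values into an imputed value (for instance, the value of the single nearest donor). The last line is the conditional independence assumption commonly stated in the statistical matching literature: given the common variables x, the donor-only and recipient-only measures are assumed independent.

```latex
\begin{aligned}
\text{Recipient file } A &= \{(x_a,\, z_a)\}_{a=1}^{n_A}, \qquad x = \text{common variables},\; z = \text{asset holding measures} \\
\text{Donor file } B &= \{(x_b,\, y_b)\}_{b=1}^{n_B}, \qquad y = \text{housing instability measures} \\
\text{Fused file } F &= \{(x_a,\, \hat{y}_a,\, z_a)\}_{a=1}^{n_A}, \qquad \hat{y}_a = g\big(\{\, y_b : b \text{ matched to } a \,\}\big) \\
\text{Conditional independence: } & f(y, z \mid x) = f(y \mid x)\, f(z \mid x)
\end{aligned}
```

If this assumption does not hold, the fused file can broadly reproduce only the part of the association between y and z that runs through x, which is one reason the evaluation step matters.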
First, the distance between cases on the variables common to both datasets is computed as the Mahalanobis distance, which accounts for correlations among those variables. To avoid nonsensical matches, critical variables must match exactly. Within those exact-match groups, cases can be matched on the less important common variables according to the Mahalanobis distance and calculated importance weights (Baker, 2007). Cases may then be married (linked) using a k-nearest neighbor algorithm, grade correspondence analysis, regression, or expectation-maximization (EM) algorithms (Van der Putten, Kok, & Gupta, 2002). The result is a fused dataset that contains all variables; this new dataset allows examination of the longitudinal relationship between asset holding and housing instability among single women with children, which previously would not have been possible. The data fusion process is then evaluated by comparing a subsample of the fused file with the donor files (Baker, Harris, & O'Brien, 1989) and by analyzing which cases were matched (Baker, 2007).
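The matching and imputation steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the procedure of any cited study: it uses only the k-nearest neighbor option (not grade correspondence analysis, regression, or EM); the way importance weights enter the distance (scaling each difference by the square root of its weight) is an assumption; and the names `fuse`, `weighted_mahalanobis`, and `evaluate` are hypothetical. The `evaluate` function is a simplified stand-in for the evaluation strategies cited, comparing the imputed variable's mean with the donor file and counting how often each donor record was used.

```python
import numpy as np


def weighted_mahalanobis(a, b, inv_cov, weights):
    """Mahalanobis distance between vectors a and b on the common variables.

    Assumption: each difference is scaled by sqrt(weight) before applying the
    inverse covariance, so variables with larger (hypothetical) importance
    weights contribute more to the distance.
    """
    diff = (np.asarray(a) - np.asarray(b)) * np.sqrt(weights)
    return float(np.sqrt(diff @ inv_cov @ diff))


def fuse(recip_X, recip_key, donor_X, donor_key, donor_Y, weights, k=1):
    """Impute the donor-only variable donor_Y into the recipient file.

    recip_X, donor_X : (n, p) arrays of common matching variables
    recip_key, donor_key : critical variable that must match exactly
    donor_Y : variable observed only in the donor file
    weights : importance weight for each common variable
    k : number of nearest donors whose values are averaged
    Returns the imputed values and, for each recipient, the donor indices used.
    """
    # Pooled covariance of the common variables across both files.
    pooled = np.vstack([recip_X, donor_X])
    inv_cov = np.linalg.inv(np.atleast_2d(np.cov(pooled, rowvar=False)))

    imputed = np.full(len(recip_X), np.nan)
    matches = []
    for i, (x, key) in enumerate(zip(recip_X, recip_key)):
        # Exact matching: only donors sharing the critical value are eligible.
        candidates = np.flatnonzero(donor_key == key)
        if candidates.size == 0:
            matches.append(np.array([], dtype=int))  # no eligible donor; stays NaN
            continue
        dists = np.array([weighted_mahalanobis(x, donor_X[j], inv_cov, weights)
                          for j in candidates])
        # k-nearest neighbors among the eligible donors.
        nearest = candidates[np.argsort(dists)[:k]]
        imputed[i] = donor_Y[nearest].mean()
        matches.append(nearest)
    return imputed, matches


def evaluate(imputed, donor_Y, matches, n_donors):
    """Simplified checks on a fusion run (illustrative only).

    mean_gap : mean of the imputed values minus the donor-file mean
    usage    : how many times each donor record was used as a match
    """
    ok = ~np.isnan(imputed)
    mean_gap = float(imputed[ok].mean() - donor_Y.mean())
    used = np.concatenate(matches) if matches else np.array([], dtype=int)
    usage = np.bincount(used, minlength=n_donors)
    return mean_gap, usage


if __name__ == "__main__":
    # Synthetic data (hypothetical): two common variables, a three-level
    # critical variable, and a donor-only outcome driven by the common variables.
    rng = np.random.default_rng(0)
    donor_X = rng.normal(size=(200, 2))
    donor_key = rng.integers(0, 3, size=200)
    donor_Y = 2 * donor_X[:, 0] + donor_X[:, 1] + rng.normal(scale=0.5, size=200)
    recip_X = rng.normal(size=(50, 2))
    recip_key = rng.integers(0, 3, size=50)
    true_Y = 2 * recip_X[:, 0] + recip_X[:, 1]
    imputed, matches = fuse(recip_X, recip_key, donor_X, donor_key, donor_Y,
                            weights=np.array([2.0, 1.0]), k=3)
    gap, usage = evaluate(imputed, donor_Y, matches, n_donors=200)
    print("corr(imputed, true):", np.corrcoef(imputed, true_Y)[0, 1])
    print("mean gap:", gap, "max donor reuse:", usage.max())
```

Averaging k > 1 donors smooths the imputed values at the cost of some variance; using k = 1 keeps each imputed value an actually observed donor value. Recipients with no donor sharing their critical value are left missing rather than forced into a nonsensical match.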
Implications:
Expansion of the social work knowledge base is often limited by a lack of suitable datasets. Data fusion offers social work scholars a way to combine existing datasets to answer complex research questions that have so far been unanswerable through secondary data analysis. Further, the technique builds on the existing skills of quantitative social work researchers and can support both cross-sectional and longitudinal research designs, helping to advance social work as a science and profession.