Abstract: Automated Identification of Domestic Violence in Written Child Welfare Records: A Mixed Methods Approach Using Text Mining and Machine Learning (Society for Social Work and Research 25th Annual Conference - Social Work Science for Social Change)


Automated Identification of Domestic Violence in Written Child Welfare Records: A Mixed Methods Approach Using Text Mining and Machine Learning

Schedule:
Thursday, January 21, 2021
* noted as presenting author
Bryan Victor, PhD, Associate Professor, Indiana University, Indianapolis
Brian Perron, PhD, Professor, University of Michigan-Ann Arbor, MI
Rebeccah Sokol, PhD, Research Fellow, University of Michigan-Ann Arbor, MI
Lisa Fedina, PhD, Assistant Professor, University of Michigan-Ann Arbor, MI
Joseph Ryan, PhD, Professor, University of Michigan-Ann Arbor, Ann Arbor, MI
Background and Purpose: Child welfare agencies frequently lack ready access to information about the front-end service needs of the families they serve. That is, agencies often do not have service need-related data stored in a structured format that would permit statistical analyses to determine the prevalence, correlates or geographic distribution of particular needs. A potential corrective to this problem is the information contained within the written summaries in an agency’s administrative records, but this information is stored in an unstructured format. Storage of information in this manner does not permit analysis of service need-related data across the thousands of cases an agency processes each year. To address this issue, the current study tests the feasibility of text mining and machine learning procedures for identifying problems related to domestic violence documented in child welfare investigation summaries, a prominent service need for child welfare-involved families.

Methods: The current study used a mixed methods approach to develop and test a set of computer models to automate the coding of investigation summaries for our target construct. To start, four expert human coders labeled a collection of child welfare investigation summaries (N = 1,402) for the presence (DV+) or absence (DV-) of an active domestic violence service need. These labeled documents were then used to develop a set of text mining and machine learning models, and to test their accuracy and reliability. We first developed a rules-based text mining model that relied on an expert dictionary and sentiment analysis to classify documents as DV+ or DV-. We then developed more advanced machine learning models that used a k-nearest neighbor algorithm to perform the coding task. Accuracy and reliability for all models were determined by comparing computer classifications to those of expert human coders.
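
The two modeling strategies described in the Methods can be illustrated with a minimal sketch. It is not the study's implementation: the dictionary terms, the toy training summaries, the bag-of-words cosine similarity, and the function names (`rule_based_label`, `knn_label`) are all assumptions made for illustration, and the sentiment analysis step of the rules-based model is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical dictionary terms; the study's expert dictionary is not
# reproduced in the abstract.
DV_TERMS = {"domestic violence", "restraining order", "hit her", "choked"}

# Invented toy summaries standing in for the human-labeled documents.
TRAIN = [
    ("mother reported father hit her during an argument", "DV+"),
    ("police responded to domestic violence between the parents", "DV+"),
    ("father choked mother and she obtained a restraining order", "DV+"),
    ("home was clean and children appeared well fed", "DV-"),
    ("child missed school due to lack of transportation", "DV-"),
    ("no concerns noted during the home visit", "DV-"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rule_based_label(text):
    """Label a summary DV+ if any dictionary term appears in it.

    This ignores negation and sentiment, which a real rules-based
    model would need to handle.
    """
    lowered = text.lower()
    return "DV+" if any(term in lowered for term in DV_TERMS) else "DV-"


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_label(text, labeled_docs, k=3):
    """Label a summary by majority vote among its k most similar labeled summaries."""
    query = Counter(tokenize(text))
    ranked = sorted(
        labeled_docs,
        key=lambda doc: cosine(query, Counter(tokenize(doc[0]))),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


print(rule_based_label("Police responded to domestic violence at the home"))
print(knn_label("father hit mother during an argument at home", TRAIN, k=3))
```

With an odd `k` and two classes, the majority vote cannot tie; a production model would also need a richer text representation than raw word counts.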

Results: The machine learning models achieved greater than 90% accuracy in the classification of documents when compared to the classification decisions of expert human coders. Fleiss' kappa estimates of coding reliability between the top-performing model and expert human coders exceeded .80. This suggests that system administrators, researchers and evaluators could confidently deploy our model to bring this task to scale, rapidly classifying their entire population of documents for the presence or absence of a domestic violence service need.
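
As a rough illustration of the two evaluation measures reported in the Results, the sketch below computes accuracy and Fleiss' kappa from labels. The label vectors are invented toy data, not the study's results, and the function names (`accuracy`, `fleiss_kappa`) are assumptions; the kappa calculation follows the standard Fleiss formula, here applied to one human coder and one model treated as two raters.

```python
from collections import Counter


def accuracy(model_labels, human_labels):
    """Share of documents where the model label matches the human label."""
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(human_labels)


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of documents, each a list of rater labels.

    Every document must be rated by the same number of raters (at least two).
    """
    n_docs = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Observed agreement: mean per-document proportion of agreeing rater pairs.
    per_doc = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_doc) / n_docs

    # Chance agreement: sum of squared overall category proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_docs * n_raters) for cat in categories
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1:
        # Every rating falls in one category; treat as perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Invented toy labels for ten documents.
human = ["DV+", "DV+", "DV-", "DV-", "DV+", "DV-", "DV-", "DV+", "DV-", "DV-"]
model = ["DV+", "DV+", "DV-", "DV-", "DV-", "DV-", "DV-", "DV+", "DV-", "DV-"]

print(accuracy(model, human))
print(fleiss_kappa([list(pair) for pair in zip(human, model)]))
```

Kappa discounts the agreement expected by chance, so it is lower than raw accuracy whenever one class is more common than the other.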

Conclusions and Implications: The results provide strong evidence that text mining and machine learning procedures can be a cost-effective solution for extracting meaningful insights from unstructured text data. While not suitable for case-level predictive analytics, the insights derived from these procedures can be particularly useful for investigating the prevalence, temporal trends and geographic distribution of domestic violence-related needs in the child welfare system. These methods have the potential to substantially enhance the use of unstructured text data in social work research and evaluation.