Abstract: Creating a Child & Elder Abuse Thesaurus Using Natural Language Processing Methodology (Society for Social Work and Research 22nd Annual Conference - Achieving Equal Opportunity, Equity, and Justice)

522P Creating a Child & Elder Abuse Thesaurus Using Natural Language Processing Methodology

Schedule:
Saturday, January 13, 2018
Marquis BR Salon 6 (ML 2) (Marriott Marquis Washington DC)
Dale Fitch, PhD, Associate Professor, University of Missouri-Columbia, Columbia, MO
Illhoi Yoo, PhD, Associate Professor, University of Missouri-Columbia, Columbia, MO
Abu Mosa, MA, Director of Research Informatics, University of Missouri-Columbia, Columbia, MO
Background and Purpose
Social work research is in the early phases of adopting Big Data analytics and Artificial Intelligence, especially natural language processing. The timing is opportune for Child and Adult Protective Services staff, who investigate more than 3.3 million allegations of child abuse and 2 million allegations of elder abuse each year. Reviewing prior reports is time-consuming: a lengthy case record can take up to four or five hours to review. Studies in the medical field have shown that a medical thesaurus, e.g., MeSH, can significantly improve the performance of natural language processing algorithms applied to biomedical literature and electronic medical records. Our project therefore created a human services thesaurus and compared the vocabularies derived from social work case notes with those of Medline articles on child and elder abuse indexed with MeSH terms. We used text mining packages in R and Python to perform this task.

Methods
Our data sources were 465,938 child abuse case records obtained from a comprehensive children’s services agency and 91,251 elder abuse case records obtained from a state health department. We created the vocabulary following controlled vocabulary procedures, supplemented with a topic modeling approach articulated by Underwood & Sellers (2012). The process involved arranging the vocabulary terms into a taxonomy, that is, a hierarchy of supertype-subtype relationships, and then constructing a controlled vocabulary that combined the terms with the taxonomy to capture the relationships between supertype and subtype concepts. To assess the discriminant validity of the resultant vocabularies, we also analyzed the vocabularies of child and elder abuse articles contained in PubMed. Sensitivity, the intersection of relevant cases and retrieved cases divided by the number of relevant cases, was set at .05, and specificity, the number of non-relevant cases that were not retrieved divided by the number of non-relevant cases, was set at .95.
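As an illustration only, the two retrieval measures named above can be computed for a single vocabulary term as in the Python sketch below. It is not the project's code. The function name, the toy case identifiers, and the use of the standard definition of specificity (non-relevant cases not retrieved, divided by all non-relevant cases) are assumptions made for this example.

```python
def sensitivity_specificity(relevant, retrieved, all_cases):
    """Return (sensitivity, specificity) for one term used as a retrieval query.

    relevant  -- ids of cases that truly belong to the category (hypothetical labels)
    retrieved -- ids of cases whose text contains the term
    all_cases -- ids of every case in the collection
    """
    relevant = set(relevant)
    retrieved = set(retrieved)
    non_relevant = set(all_cases) - relevant

    true_positives = relevant & retrieved      # relevant and retrieved
    true_negatives = non_relevant - retrieved  # non-relevant and not retrieved

    sensitivity = len(true_positives) / len(relevant) if relevant else float("nan")
    specificity = len(true_negatives) / len(non_relevant) if non_relevant else float("nan")
    return sensitivity, specificity


# Toy data, invented for illustration: ten case records, four of them relevant.
all_cases = range(1, 11)
relevant = {1, 2, 3, 4}
retrieved = {1, 2, 5}  # cases in which the hypothetical term appears

sens, spec = sensitivity_specificity(relevant, retrieved, all_cases)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy example the term retrieves two of the four relevant cases and wrongly retrieves one of the six non-relevant cases, so it would pass a specificity threshold of .8 but not one of .95.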

Results
The child abuse case records contained 13,878,599 terms and the elder abuse case records contained 214,508 terms. There was little concordance between the terms generated from the case records and the Medline MeSH terms. These differences persisted when the subtypes of physical abuse, sexual abuse, and other forms of maltreatment were examined, a finding with implications for the construction of the taxonomy. For example, physical abuse had five terms/phrases with specificity greater than .9, while sexual abuse had only three terms/phrases with specificity greater than .8, all of them less than .9.

Conclusion/Implications
While the MeSH taxonomy has been instrumental in developing advanced natural language processing algorithms in the healthcare field, that same taxonomy will not be helpful for social workers in agency settings. The methodology presented in this project will open up an array of textual data already stored in our human services databases yet rarely examined due to time and effort costs. This emerging area of research will be a critical first step in developing Artificial Intelligence algorithms in the human services.