Abstract: Exploration in Predictive Analyses and Potential Implications (Society for Social Work and Research 23rd Annual Conference - Ending Gender Based, Family and Community Violence)

Exploration in Predictive Analyses and Potential Implications

Schedule:
Sunday, January 20, 2019: 8:30 AM
Golden Gate 4, Lobby Level (Hilton San Francisco)
* noted as presenting author
Kelly Stepura, PhD, Executive Vice President of Applied Research and Solutions, KaleidaCare, Austin, TX
Donald Baumann, PhD, Adjunct Professor, St. Edwards University, Austin, TX
Background/Purpose:

The proliferation of computer-based data systems in child welfare has produced a critical mass of data that can be used to address research questions with machine learning approaches. Models built with these approaches can analyze larger, more complex datasets and can yield faster, more accurate results than traditional statistical approaches. The current investigation sought to apply a machine learning approach to predict discharge outcomes as early in a youth’s time in care as possible. This work is intended to be of practical use in supporting agency decision-making. It is also anticipated that other agencies and researchers could use the approach outlined in the study for similar purposes.

Methods:

The administrative data used for the study were collected from a Tennessee agency operating under a performance-based contract (PBC). The main outcomes of interest, permanency and time in care, were chosen for their relevance to the PBC initiative in Tennessee. Results of the study were intended to inform caseworker decision-making by predicting the likelihood of achieving permanency within a targeted number of care days, based on information known at admission.

Initially, traditional statistical approaches (e.g., cluster analysis and logistic regression) were applied to the data, with inadequate results. Machine learning strategies were then applied, including decision trees, boosted models, and random forests, but these efforts also produced low accuracy rates. Subsequently, a more useful predictive model was created by applying a Bernoulli Naïve Bayes classifier to a limited set of independent variables.
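For readers unfamiliar with the technique, the following is a minimal sketch of a Bernoulli Naïve Bayes classifier written with the Python standard library only. It is not the study's implementation. The function names, the four binary feature columns, the Laplace smoothing value, and every toy record are assumptions invented for illustration; none of the values come from the study data.

```python
import math

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on rows of 0/1 features.

    alpha is a Laplace smoothing constant (an assumed default, not a
    setting reported in the abstract).
    """
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        # Smoothed estimate of P(feature_j = 1 | class c)
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        log_prior = math.log(len(rows) / len(X))
        model[c] = (log_prior, probs)
    return model

def predict_bernoulli_nb(model, x):
    """Return the class with the highest log posterior score for row x."""
    best_class, best_score = None, -math.inf
    for c, (log_prior, probs) in model.items():
        score = log_prior
        for value, p in zip(x, probs):
            # A Bernoulli likelihood uses p when the feature is present
            # and (1 - p) when it is absent.
            score += math.log(p if value else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical binary features known at admission (illustrative only):
# [older_at_admission, has_prior_placements, siblings_in_care, runaway_need]
X_toy = [
    [0, 0, 1, 0], [0, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 0],
    [1, 1, 0, 1], [1, 1, 0, 1], [1, 1, 0, 0], [0, 1, 0, 1],
]
# 1 = achieved permanency, 0 = did not (made-up labels)
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]

model = fit_bernoulli_nb(X_toy, y_toy)
pred_a = predict_bernoulli_nb(model, [0, 0, 1, 0])
pred_b = predict_bernoulli_nb(model, [1, 1, 0, 1])
```

Because the model treats each binary predictor as conditionally independent given the outcome, it needs only one probability per feature per class, which may help explain why it can remain stable with a limited set of independent variables.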

Results:

For the permanency outcome, the final Naïve Bayes model predicted whether youth would or would not achieve permanency with a 73.51% test-sample classification accuracy rate using ten-fold hold-out samples. The highest contributors to the model were age at admission, prior placements, the presence of siblings in care, and current runaway behavior as a Child and Adolescent Needs and Strengths (CANS) treatment need. However, validation models attempting to predict whether youth would exceed their target care days continued to result in low accuracy rates. Further analysis pointed to an inverse relationship between achieving permanency and meeting target care days, such that youth who achieved permanency were unlikely to do so within their established target care days.
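The abstract does not describe the hold-out procedure in detail. The sketch below assumes that "ten-fold hold-out samples" refers to ten-fold cross-validation, in which each record is held out exactly once and accuracy is pooled over all held-out predictions. The function names are hypothetical, and a trivial majority-class rule stands in for the real classifier so that the example stays self-contained.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Pooled hold-out accuracy: train on k-1 folds, test on the remaining one."""
    correct = 0
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(X)) if i not in held]
        fitted = fit(train_X, train_y)
        correct += sum(predict(fitted, X[i]) == y[i] for i in held_out)
    return correct / len(X)

# Stand-in classifier (an assumption for illustration): always predict
# the most common label seen in the training folds.
def fit_majority(X, y):
    return max(set(y), key=y.count)

def predict_majority(fitted, x):
    return fitted

# Made-up records: 12 labelled 1 and 8 labelled 0; features are unused here.
X_demo = [[0]] * 20
y_demo = [1] * 12 + [0] * 8

folds = k_fold_indices(len(X_demo), k=10)
baseline_accuracy = cross_val_accuracy(X_demo, y_demo, fit_majority, predict_majority)
```

A majority-class baseline of this kind gives a reference point for judging a reported accuracy rate: a model is only informative to the extent that it exceeds the accuracy obtained by always predicting the more common outcome.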

Conclusions and Implications:

The relevance of this study lies in offering a machine learning approach to predict foster youth outcomes, and in the potential practical utility of the model itself. These findings could be incorporated into agency processes to support decisions such as resource allocation, service provision, and treatment planning. Results also point to the need for future research focusing on the prediction of permanency and time in care. Additionally, examination of targets for youth care days, including the accuracy and implications of varying targets and the inverse relationship between time in care and permanency, is warranted. Finally, future research should further explore the relevance of CANS items, domains, and factors in predicting discharge outcomes for foster youth.