Child maltreatment and related fatalities are a widespread problem (Miyamoto et al., 2017), and predictive models offer the promise of mitigating negative child welfare outcomes. While several jurisdictions are adopting models that predict outcomes such as future indicated investigations and foster care placements, less attention has been paid to models that either predict future reports resulting in severe harm or attempt to quantify the risk of harm to a child. Yet if such models prove robust, they may be the most effective aid in allocating interventions and services that are specifically meant to prevent severe harm.
This study builds a random forest model to predict future severe harm for ~200,000 children involved in investigations that ended in 2013 or 2014 at a large urban child welfare system in the Northeast. We construct several alternate versions of the model with varying time scales for the outcome (e.g., severe harm within one vs. two years of an investigation’s start date). By stratifying model predictions by race group, and by using two examples of potential model applications (case reviews and clinical consultations), we analyze various fairness metrics in the context of racial equity. Finally, we compare our model’s performance to that of other models that predict either (a) the recurrence of an indicated investigation or (b) a child fatality.
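The alternate outcome definitions described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each investigation has a start date and a list of dates of later severe-harm events; the function name `label_outcomes`, the `WINDOWS` lengths in days, and the rule that an event must fall strictly after the start date are all hypothetical and are not taken from the study.

```python
from datetime import date, timedelta

# Hypothetical outcome windows in days (about 6 months, 1 year, 2 years).
WINDOWS = {"6m": 183, "1y": 365, "2y": 730}


def label_outcomes(start, harm_dates):
    """Return one binary label per window.

    A label is 1 if any severe-harm event falls after the investigation's
    start date and on or before the end of the window, else 0.
    (Assumed rule; the study's exact definition may differ.)
    """
    labels = {}
    for name, days in WINDOWS.items():
        end = start + timedelta(days=days)
        labels[name] = int(any(start < d <= end for d in harm_dates))
    return labels


# One investigation starting 2013-03-01 with a severe-harm event
# about ten and a half months later.
example = label_outcomes(date(2013, 3, 1), [date(2014, 1, 15)])
print(example)
```

Each window yields its own label column, so a separate model version can be fit and evaluated per time scale rather than fixing a single window in advance.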
We find that model performance is substantially higher when predicting severe harm within 1 year (prevalence = 3.2%; AUC = 0.8) or 2 years (prevalence = 5.7%; AUC = 0.8) than within 6 months (prevalence = 1.3%; AUC = 0.75) of an investigation’s start date. Further, severe harm models perform better than the comparison models (AUC = 0.68 – 0.72) and, in particular, are better suited to predicting fatalities than even a fatality model. While overall model performance is similar across race groups, we nonetheless show empirically that error rates necessarily differ across groups with unequal prevalence rates (as shown theoretically by Chouldechova, 2016, and others); this is especially true for interventions aimed at the smallest subset of families.
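The theoretical point about unequal prevalence can be made concrete with a short sketch. It relies on the standard confusion-matrix identity FPR = p / (1 − p) × (1 − PPV) / PPV × (1 − FNR), where p is the group's prevalence. The prevalence, PPV, and FNR values below are illustrative assumptions and are not results from this study; the function name `implied_fpr` is likewise hypothetical.

```python
def implied_fpr(prevalence, ppv, fnr):
    """False positive rate implied by prevalence, PPV, and FNR.

    Follows from the definitions of the confusion-matrix rates:
    FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR).
    """
    odds = prevalence / (1.0 - prevalence)
    return odds * (1.0 - ppv) / ppv * (1.0 - fnr)


# Two hypothetical groups with the same PPV and the same FNR
# but different prevalence of the outcome (illustrative values only).
low = implied_fpr(prevalence=0.03, ppv=0.25, fnr=0.40)
high = implied_fpr(prevalence=0.06, ppv=0.25, fnr=0.40)

print(f"FPR at 3% prevalence: {low:.4f}")
print(f"FPR at 6% prevalence: {high:.4f}")
```

With PPV and FNR held equal, the group with twice the prevalence ends up with roughly twice the false positive rate, so all three quantities cannot be equalized across groups at once when prevalence differs.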
Our severe harm model performs substantially better than the comparison models, although more research is needed to ascertain whether this finding holds across jurisdictions. Further, we note that (a) it is useful to optimize predictive power across multiple potential time scales rather than defer to historically used windows; and (b) addressing error rate differences across race groups is often beyond the scope of statistics and is therefore necessarily a policy question. In conclusion, a severe harm model may be more appropriate than other models for two reasons. For interventions specifically aimed at preventing fatalities, such a model may simply perform better than one predicting fatalities, for reasons discussed. For interventions specifically aimed at mitigating severe harm, more all-purpose models make the mistake of weighting widely disparate allegations equally (e.g., physical abuse involving a young child and educational neglect involving a truant teenager).