Methods: For youth placed in foster care between ages 12 and 14, we assessed the risk of exiting care without permanency by age 18, 4 to 6 years before exit, based on their history of child welfare service involvement. To develop predictive risk models, we applied several machine learning algorithms to 28 years (1991–2018) of child welfare service records from California. Model performance was evaluated using F1 score, AUC, precision, and recall. Model fairness was assessed using calibration, predictive parity, and error rate balance.
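As an illustration of two of the fairness criteria named above, the following minimal sketch computes per-group rates from confusion-matrix counts. Predictive parity is read here as equal positive predictive value (PPV) across groups, and error rate balance as equal false positive and false negative rates. The `Confusion` class, the `group_rates` helper, and all counts are hypothetical and are not taken from the study data. Calibration requires predicted risk scores rather than counts and is not shown.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for one group (hypothetical helper)."""
    tp: int  # flagged high risk, exited without permanency
    fp: int  # flagged high risk, exited with permanency
    fn: int  # not flagged, exited without permanency
    tn: int  # not flagged, exited with permanency


def group_rates(c: Confusion) -> dict:
    """Return base rate, PPV, FPR, and FNR for one group."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "base_rate": (c.tp + c.fn) / total,
        "ppv": c.tp / (c.tp + c.fp),   # compared for predictive parity
        "fpr": c.fp / (c.fp + c.tn),   # compared for error rate balance
        "fnr": c.fn / (c.fn + c.tp),   # compared for error rate balance
    }


# Made-up counts for two groups with different base rates.
rates_a = group_rates(Confusion(tp=60, fp=40, fn=40, tn=160))
rates_b = group_rates(Confusion(tp=30, fp=20, fn=45, tn=205))

for name, rates in (("group A", rates_a), ("group B", rates_b)):
    print(name, {k: round(v, 3) for k, v in rates.items()})
```

In this made-up example the two groups share the same PPV, so predictive parity holds, while their false positive and false negative rates differ, so error rate balance does not.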
Results: The gradient boosting decision tree and random forest models performed best (F1 score = .54–.55, precision = .62, recall = .49). Half of all youth observed to exit care without permanency fell within the 30% of youth the model ranked as highest risk, with a 39% error rate. Although error rates were imbalanced between Black and White youth, calibration and predictive parity were satisfied.
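The reported scores can be checked against one another, since F1 is the harmonic mean of precision and recall. The short sketch below uses only the precision and recall values reported above; the function name is illustrative.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the Results.
reported_precision = 0.62
reported_recall = 0.49

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # F1 = 0.547
```

The result rounds to .55, the upper end of the reported F1 range.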
Discussion: Our findings illustrate how potential applications of predictive analytics, including those designed to achieve universal goals of permanency through more targeted allocation of resources, can be tested. The results are promising: even a simple machine learning model built on limited information extracted from existing administrative data, such as the one developed here, could be useful for early identification of youth at risk. In addition, the algorithmic fairness analysis indicates that model performance varied by the race of the youth, which can be attributed to differences in the rates at which Black and White children exit care without permanency.
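One way to see how differing base rates can produce imbalanced error rates even when predictive parity holds is the standard confusion-matrix identity FPR = p / (1 - p) × (1 - PPV) / PPV × (1 - FNR), where p is a group's base rate. The sketch below evaluates it for two hypothetical groups. The base rates, PPV, and FNR are made-up values chosen for illustration, not estimates from the study.

```python
def implied_fpr(base_rate: float, ppv: float, fnr: float) -> float:
    """False positive rate implied by base rate, PPV, and FNR.

    Follows from the definitions: FP = TP * (1 - PPV) / PPV and
    TP = (1 - FNR) * P, divided by the number of true negatives' class N,
    with P / N = base_rate / (1 - base_rate).
    """
    odds = base_rate / (1 - base_rate)
    return odds * (1 - ppv) / ppv * (1 - fnr)


# Hypothetical groups: identical PPV and FNR, different base rates.
shared_ppv = 0.6
shared_fnr = 0.5

fpr_high_base = implied_fpr(base_rate=0.4, ppv=shared_ppv, fnr=shared_fnr)
fpr_low_base = implied_fpr(base_rate=0.2, ppv=shared_ppv, fnr=shared_fnr)

print(f"FPR at base rate 0.4: {fpr_high_base:.3f}")
print(f"FPR at base rate 0.2: {fpr_low_base:.3f}")
```

With PPV and FNR held equal, the group with the higher base rate necessarily has the higher false positive rate, so under these assumptions predictive parity and full error rate balance cannot both hold unless base rates are equal.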