Methods: Machine learning models were developed to assess the risk of aging out of foster care by age 18 among youth who spent at least one day in foster care between ages 12 and 14. The models were trained on 28 years (1991-2018) of California child protective services (CPS) records using three classification algorithms: penalized logistic regression, random forest, and gradient boosting decision tree. The predicted outcome variable (or “target feature” in machine learning terms) was whether the termination reason code of the last foster care episode (between ages 14 and 18) did not indicate an exit to permanency. Model performance was evaluated using the F1 score, and predictive racial bias was examined using calibration, predictive parity, and error rate balance. Data preprocessing, model training, and model evaluation were conducted using Python 3.7.
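To make the evaluation criteria named in the Methods concrete, the following is a minimal sketch, not the study's code. It computes the F1 score and the per-group quantities behind two of the fairness criteria: predictive parity (equal positive predictive value across groups) and error rate balance (equal false positive and false negative rates across groups). The function names, the group labels "A" and "B", and the toy data are all hypothetical. Calibration is omitted because it requires the predicted scores rather than binary predictions.

```python
from collections import defaultdict


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(num, den):
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    return safe_div(2 * tp, 2 * tp + fp + fn)


def group_metrics(y_true, y_pred, groups):
    """Per-group PPV (predictive parity) and FPR/FNR (error rate balance)."""
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    out = {}
    for g, (ts, ps) in by_group.items():
        tp, fp, fn, tn = confusion_counts(ts, ps)
        out[g] = {
            "ppv": safe_div(tp, tp + fp),
            "fpr": safe_div(fp, fp + tn),
            "fnr": safe_div(fn, fn + tp),
        }
    return out


# Hypothetical toy data: 1 = aged out, 0 = exited to permanency.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall_f1 = f1_score(y_true, y_pred)
metrics = group_metrics(y_true, y_pred, groups)
print(round(overall_f1, 3))
for name, values in sorted(metrics.items()):
    print(name, values)
```

In this toy example the two groups share the same false positive rate but differ in false negative rate and positive predictive value, which illustrates how the criteria can disagree with one another.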
Results: All models performed similarly; the best-performing model was the gradient boosting decision tree (F1 score = 0.54, AUC = 0.72). When youth in the top 30% of predicted risk scores were flagged, the best-performing model identified half of all youth who aged out, with a false positive rate of 39%. In terms of feature importance, the total length of time in placement and the total number of referrals ranked among the top five features in all models. Although calibration and predictive parity were satisfied, imbalanced error rates revealed a racial disparity between White and Black youth.
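The top-30% evaluation described in the Results can be illustrated with a short sketch under stated assumptions: flag the highest-scoring fraction of cases, then measure how many youth who aged out were captured and how many flags were wrong. The helper names and the ten toy cases are hypothetical and do not reproduce the study's figures. Because the abstract does not state which denominator underlies its false positive figure, the sketch reports both the conventional false positive rate (false positives over all actual negatives) and the share of flagged cases that are false positives.

```python
def flag_top_fraction(scores, fraction):
    """Flag the highest-scoring `fraction` of cases; ties keep input order."""
    n_flag = int(round(fraction * len(scores)))
    # sorted() is stable, so equal scores stay in their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = [0] * len(scores)
    for i in order[:n_flag]:
        flagged[i] = 1
    return flagged


def top_fraction_summary(y_true, flagged):
    """Recall, false positive rate, and false-positive share among flags."""
    tp = sum(1 for t, f in zip(y_true, flagged) if t == 1 and f == 1)
    fp = sum(1 for t, f in zip(y_true, flagged) if t == 0 and f == 1)
    fn = sum(1 for t, f in zip(y_true, flagged) if t == 1 and f == 0)
    tn = sum(1 for t, f in zip(y_true, flagged) if t == 0 and f == 0)
    return {
        "recall": tp / (tp + fn) if (tp + fn) else float("nan"),
        "fpr": fp / (fp + tn) if (fp + tn) else float("nan"),
        "fp_share_of_flagged": fp / (tp + fp) if (tp + fp) else float("nan"),
    }


# Hypothetical toy data: predicted risk scores and outcomes (1 = aged out).
scores = [0.91, 0.85, 0.78, 0.66, 0.60, 0.52, 0.41, 0.33, 0.20, 0.12]
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]

flagged = flag_top_fraction(scores, 0.30)
summary = top_fraction_summary(y_true, flagged)
print(flagged)
print(summary)
```

Fixing the flagged fraction rather than a score threshold reflects a setting in which program capacity, not the score scale, determines how many youth can be served.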
Relevance/Contribution: The proposed study will contribute to the literature by examining whether machine learning models trained on youths' early trajectories of CPS engagement can predict individual risk of aging out at the point when foster youth first become eligible for various independent living program benefits. Such models could support the proactive identification of foster youth at high risk and inform the design and targeting of preventative programs intended to improve their outcomes.