Methods: This study conducted a fairness analysis of a machine learning model developed to assess families' needs for home visiting services provided by the Bridges Maternal Child Health Network (MCHN) program in Orange County, California. The study used a deidentified linked dataset comprising three sources: 1) Bridges MCHN program records from 2011-2016; 2) California vital birth records from 2011-2016; and 3) child protective services (CPS) records from 2011-2019. The study population (n=132,216) was stratified to compare the characteristics of children who were correctly and incorrectly classified by the baseline model and by the machine learning model. The performance of the two models was compared for each subgroup defined by maternal and birth characteristics using a metric that divides the false-negative rate of the machine learning model by that of the baseline model. A logistic regression model was used to examine whether the likelihood of being falsely identified as low risk varied by maternal and birth characteristics. Intersections of attributes that may indicate a higher risk of false-negative determinations were identified. Analyses were completed in Stata version 17.0 and Python 3.7.4.
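The subgroup metric and the false-negative regression lend themselves to a compact implementation. The sketch below is illustrative only, not the study's actual code: the column names (y_true, yhat_ml, yhat_base) and covariate labels are hypothetical, and pandas with statsmodels stands in for whatever tooling the authors used alongside Stata.

```python
import pandas as pd
import statsmodels.api as sm


def false_negative_rate(y_true, y_pred):
    """FNR = FN / (FN + TP): the share of truly at-risk cases the model misses."""
    positives = y_true == 1
    if positives.sum() == 0:
        return float("nan")
    return ((y_pred == 0) & positives).sum() / positives.sum()


def fnr_ratio_by_subgroup(df, group_col):
    """For each subgroup, divide the ML model's FNR by the baseline model's FNR.

    Values below 1 mean the ML model misses fewer at-risk children than the
    baseline in that subgroup; values near 1 mean little improvement.
    """
    ratios = {}
    for level, group in df.groupby(group_col):
        fnr_ml = false_negative_rate(group["y_true"], group["yhat_ml"])
        fnr_base = false_negative_rate(group["y_true"], group["yhat_base"])
        ratios[level] = fnr_ml / fnr_base
    return pd.Series(ratios, name="fnr_ratio")


def fit_false_negative_logit(df, covariates):
    """Among true positives, regress the probability of a false-negative
    ML prediction on maternal and birth characteristics."""
    at_risk = df[df["y_true"] == 1].copy()
    at_risk["false_negative"] = (at_risk["yhat_ml"] == 0).astype(int)
    X = sm.add_constant(at_risk[covariates])
    return sm.Logit(at_risk["false_negative"], X).fit(disp=0)
```

Under this formulation, the odds ratios from fit_false_negative_logit indicate which maternal and birth characteristics are associated with being falsely identified as low risk.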
Results: The machine learning model decreased false-negative rates by 58% relative to the baseline model. However, the model was less effective for mothers who were 26 years or older, were born outside the US, received prenatal care during the first trimester, paid for delivery with private insurance or cash, or had paternity established at birth. Logistic regression analysis showed that children born to mothers aged 26 years or older and children born to foreign-born mothers were more likely to be falsely predicted as low risk. False-negative rates among children born to foreign-born Hispanic mothers were also significantly higher than those among their counterparts born to US-born Hispanic mothers and to White mothers.
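The intersectional comparison reported here amounts to crossing two attributes before stratifying. A hypothetical continuation of the sketch above, again with assumed column names (foreign_born, race_ethnicity):

```python
# Cross maternal nativity with race/ethnicity, then compute the FNR ratio
# within each intersectional subgroup using the helper defined earlier.
df["nativity_ethnicity"] = (
    df["foreign_born"].map({0: "US-born", 1: "Foreign-born"})
    + " " + df["race_ethnicity"]
)
print(fnr_ratio_by_subgroup(df, "nativity_ethnicity"))
```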
Conclusions and Implications: The machine learning model generated significantly lower false-negative rates than the Bridges pre-screening tool across all subgroups defined by maternal and birth characteristics. However, it performed less well for children born to foreign-born Hispanic mothers, possibly because the associations between these children's maternal and birth characteristics and the risk of substantiation differed significantly from those observed for other children. This study highlights the importance of considering intersectionality when examining the fairness of machine learning applied to child welfare and has implications for the use of machine learning in public services.