Methods: The present analysis is based on the linkage of two population-based, statewide administrative data sources from California: vital birth records and child protection records. Vital birth records included the full population of children born alive in California during calendar years 2002 and 2006. Child protection records for all children in either of these two birth cohorts who were reported for alleged abuse or neglect before age 5 were extracted from California’s administrative record system. A series of multiple logistic regression models was estimated on the 2002 cohort, one for each outcome (report, substantiation). The 2002 cohort was randomly split into two halves: a 50% derivation sample used to estimate the models and a 50% validation sample used to validate them. The models built on the 2002 cohort were additionally validated on the 2006 cohort to test for temporal stability.
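The split-sample design described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the data are simulated, the three binary risk factors and their coefficients are hypothetical stand-ins for the 13 birth-record factors, and the helper names (`split_half`, `fit_logistic`, `auc`) are invented for illustration. The model is fitted by plain gradient descent rather than with a statistics package, only to keep the example self-contained.

```python
import math
import random


def split_half(records, seed=0):
    """Randomly split records into a derivation half and a validation half."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def predict(w, x):
    """Predicted probability from weights [intercept, coef_1, ...]."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, n_features, lr=1.0, epochs=300):
    """Fit a logistic regression by batch gradient descent on the log-loss."""
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in rows:
            err = predict(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(rows) for wj, gj in zip(w, grad)]
    return w


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum formulation (ties get average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate(n, rng):
    """Simulated cohort: 3 hypothetical binary risk factors and a binary outcome."""
    rows = []
    for _ in range(n):
        x = [float(rng.random() < 0.3) for _ in range(3)]
        z = -3.0 + 1.5 * x[0] + 1.0 * x[1] + 2.0 * x[2]  # assumed coefficients
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-z)) else 0
        rows.append((x, y))
    return rows


cohort = simulate(4000, random.Random(42))
derivation, validation = split_half(cohort, seed=1)

weights = fit_logistic(derivation, n_features=3)  # estimate on derivation half only
val_scores = [predict(weights, x) for x, _ in validation]
val_auc = auc(val_scores, [y for _, y in validation])
print(f"validation AUC: {val_auc:.3f}")
```

Validating on a later birth cohort, as the abstract does with 2006, would follow the same pattern: the weights stay fixed and only the scored records change.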
Results: There were 264,582 births in the 2002 derivation sample, 264,581 in the 2002 validation sample, and 562,489 in the 2006 validation cohort. In the entire 2002 cohort, 14.0% of children were reported to child protective services (CPS) within the first 5 years of life and 5.2% had a substantiated report. The corresponding rates for the 2006 cohort were significantly higher, with 15.0% reported to CPS (P<0.001) and 5.3% substantiated (P=0.008). In both the derivation and the validation samples, the area under the ROC curve was 0.78 for CPS referrals and 0.82 for CPS substantiation. For the substantiation model, flagging the 5% of children at highest predicted risk in the 2002 validation sample yielded a positive predictive value of 27.6%, a sensitivity of 30.8%, and a specificity of 95.6%, with 92.2% of children correctly classified. At a cut-off that flagged the 30% at highest risk, the sensitivity was 77.0% and the specificity was 69.5%.
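The reported screening metrics are linked by simple arithmetic, which the following sketch makes explicit. It assumes that the substantiation rate in the validation sample equals the 5.2% reported for the whole 2002 cohort (the abstract does not give the validation-sample rate separately). The function name `screening_metrics` is hypothetical.

```python
def screening_metrics(sensitivity, specificity, prevalence):
    """Derive PPV, accuracy, and share flagged from sensitivity, specificity, prevalence."""
    tp = sensitivity * prevalence                # true positives, as a share of all children
    tn = specificity * (1 - prevalence)          # true negatives
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    flagged = tp + fp                            # share of children flagged as high risk
    return {
        "ppv": tp / flagged,
        "accuracy": tp + tn,
        "flagged_share": flagged,
    }


# Substantiation model at the top-5% cut-off; prevalence of 0.052 is an assumption.
top5 = screening_metrics(sensitivity=0.308, specificity=0.956, prevalence=0.052)
# Same model at the top-30% cut-off.
top30 = screening_metrics(sensitivity=0.770, specificity=0.695, prevalence=0.052)

for name, m in (("top 5%", top5), ("top 30%", top30)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Under that assumption, the top-5% cut-off works out to about 92.2% correctly classified and a positive predictive value of roughly 27.7%, close to the reported figures. The small differences are what rounding of the published percentages would produce.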
Conclusions and Implications: As the field builds the evidence base for child protection, it is crucial to understand how children at high risk of abuse or neglect can be selected more strategically for referral to prevention programs, including selected high-intensity home visiting programs, that are tailored to the service duration and dosage needs of newborns and their families. In the current analysis, we demonstrate how universal data captured in vital birth records can be used to stratify newborns by future risk of child protection involvement. A simple model using 13 risk factors derived from the birth record had good predictive power (area under the ROC curve of 0.78 for referrals and 0.82 for substantiation), both within and across cohorts.