Bridging Disciplinary Boundaries (January 11 - 14, 2007)


Pacific M (Hyatt Regency San Francisco)

The Reliability and Predictive Validity of a Consensus-Based, Child Welfare Risk Assessment Instrument

Aron Shlonsky, PhD, University of Toronto, Tara Black, MSW, University of Toronto, and James G. Barber, PhD, RMIT University.

Purpose:

Risk assessment tools have been widely used in child welfare to augment case decision-making by providing estimates of the risk that parents will reabuse their children. However, the choice of instrument, actuarial or consensus-based, has been hotly contested over the years. This study examines the reliability and predictive validity of Ontario's risk assessment tool, a 23-item consensus-based instrument mandated for use across the province.

Methods:

For the reliability portion of the study, a stratified random sample of 132 cases receiving services between December 2000 and March 2003 was drawn from one of Ontario's largest children's aid societies. Initial risk scores for each of these cases were extracted from case files and compared with the scores assigned by three blind case readers, who read and rated each of the case files independently. Internal consistency among five pre-established categories was assessed using Cronbach's alpha, and inter-rater reliability was assessed using Cohen's kappa. Predictive validity was tested on 1,118 cases that were selected and administratively followed for varying lengths of time (most for at least 18 months). The criterion for predictive validity was the recurrence of child maltreatment after case closure.
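As an illustration of the two reliability statistics named above, the following is a minimal sketch of their standard textbook formulas. It is not the authors' analysis code; the function names and the toy ratings are hypothetical. Cohen's kappa compares two raters at a time, and the abstract does not state how the three readers and the original caseworker scores were paired.

```python
from collections import Counter
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for one category.

    items: list of per-item score lists, all the same length
    (one score per case). Hypothetical input layout.
    """
    k = len(items)
    # Total score per case across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(pvariance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance_sum / pvariance(totals))


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same cases."""
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    # Undefined (division by zero) if both raters use a single identical category.
    return (observed - expected) / (1 - expected)


# Toy data, invented purely for illustration.
toy_items = [[1, 2, 3], [1, 2, 3]]
toy_rater_a = [0, 1, 0, 1]
toy_rater_b = [0, 1, 1, 1]
alpha = cronbach_alpha(toy_items)
kappa = cohen_kappa(toy_rater_a, toy_rater_b)
```

A kappa of 0 corresponds to agreement no better than chance, which is the benchmark the Results section uses for the individual risk items.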

Results:

Internal consistency was poor to fair (α ranging from 0.53 to 0.74), and inter-rater reliability was greater than would be expected by chance alone for only 8 of the 23 risk items. Despite somewhat limited overall reliability, tests of predictive validity were conducted to ascertain whether individual risk items were predictive of maltreatment recurrence. Survival analysis revealed mostly poor predictive capacity for individual items and no predictive capacity for caseworkers' subjective overall risk rating. Child behavior problems (HR=1.22; CI=1.08-1.37), caregiver alcohol or drug use (HR=1.16; CI=1.02-1.33), and caregiver history of maltreatment (HR=1.21; CI=1.08-1.37) had some degree of reliability and predictive validity.
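A quick arithmetic check on the reported hazard ratios: confidence intervals for hazard ratios are normally built symmetrically on the log scale, so the geometric mean of the two bounds should land close to the point estimate. The sketch below applies that check to the three reported items. It assumes the intervals were constructed in this conventional way, which the abstract does not state, and the helper name is hypothetical.

```python
import math


def geometric_midpoint(lower, upper):
    """Midpoint of an interval on the log scale, mapped back to the ratio scale."""
    return math.exp((math.log(lower) + math.log(upper)) / 2)


# (reported HR, lower bound, upper bound), copied from the Results above.
reported = {
    "child behavior problems": (1.22, 1.08, 1.37),
    "caregiver alcohol or drug use": (1.16, 1.02, 1.33),
    "caregiver history of maltreatment": (1.21, 1.08, 1.37),
}

# Gap between each reported HR and the log-scale midpoint of its interval.
gaps = {
    item: abs(hr - geometric_midpoint(lo, hi))
    for item, (hr, lo, hi) in reported.items()
}
```

For all three items the gap is within rounding of the two-decimal figures. Every lower bound also sits above 1, which is what makes these items statistically distinguishable from no effect, although the ratios themselves are modest.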

Discussion:

The results of this study reinforce findings from psychology and criminal justice. Specifically, consensus-based risk assessment instruments, while often inclusive of categories that are clinically relevant, are often unreliable and invalid as predictors of risk. Beyond structuring clinical work around a core set of arguably important child welfare constructs, this particular tool offers little value in terms of classifying children and families into differing risk groups. As Ontario moves to a differential response system, it requires a tool with far greater predictive capacity in order to better target more intrusive protective services to the cases with the greatest degree of risk.