This study aimed to develop a forecasting tool to identify youth at risk of attempting suicide and to explore potential racial biases in the underlying algorithms. The study objectives were to: (a) determine whether machine learning can predict suicide attempts, with and without suicidal ideation and plans as predictors, using the national Youth Risk Behavior Survey (YRBS); (b) examine whether prediction performance differs by race/ethnicity; and (c) identify the variables with the greatest influence on predicted suicide attempts within each racial/ethnic group.
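One way to operationalize objective (b) is to compute the same evaluation metrics separately within each racial/ethnic subgroup of the test set and compare them. The sketch below is a minimal illustration under stated assumptions: it takes hypothetical per-respondent records of (group, true label, predicted label), and the group names and data are placeholders, not YRBS values or the study's code.

```python
from collections import defaultdict


def confusion_counts(pairs):
    """Count true positives, false positives, and false negatives for 0/1 labels."""
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, fp, fn


def metrics(pairs):
    """Positive predictive value, sensitivity, and F1 for (y_true, y_pred) pairs."""
    tp, fp, fn = confusion_counts(pairs)
    ppv = tp / (tp + fp) if tp + fp else 0.0   # PPV (precision)
    sens = tp / (tp + fn) if tp + fn else 0.0  # sensitivity (recall)
    f1 = 2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0
    return {"ppv": ppv, "sensitivity": sens, "f1": f1}


def metrics_by_group(records):
    """records: iterable of (group, y_true, y_pred); returns metrics per group."""
    by_group = defaultdict(list)
    for group, y_true, y_pred in records:
        by_group[group].append((y_true, y_pred))
    return {g: metrics(pairs) for g, pairs in sorted(by_group.items())}


# Hypothetical toy test-set predictions (placeholder groups, not YRBS data).
records = [
    ("group_a", 1, 1), ("group_a", 1, 0), ("group_a", 0, 0), ("group_a", 0, 1),
    ("group_b", 1, 1), ("group_b", 1, 1), ("group_b", 0, 0), ("group_b", 0, 0),
]
result = metrics_by_group(records)
for group, m in result.items():
    print(group, {k: round(v, 2) for k, v in m.items()})
```

Large gaps between groups in PPV or sensitivity would indicate that the model performs unevenly across populations, which is the kind of algorithmic bias objective (b) targets.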
Methods: This study analyzed YRBS data collected in 2015, 2017, and 2019. Respondents were a nationally representative sample of 9th–12th grade students (n=32,377) attending public and private schools in all 50 states and the District of Columbia. Survey questions assessed the extent to which youth engaged in risk behaviors, including behaviors contributing to unintentional injury, tobacco or vapor product use, and alcohol and other drug use, as well as their suicidal behaviors during the past year. Suicide attempts were predicted with XGBoost, and the contribution of individual variables was analyzed with SHapley Additive exPlanations (SHAP).
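SHAP attributes a model's prediction to individual input variables using Shapley values from cooperative game theory. The `shap` library uses the TreeSHAP algorithm to compute these efficiently for tree ensembles such as XGBoost; the sketch below instead evaluates the defining Shapley formula by brute force, which is feasible only for a handful of features. It is a hedged illustration, not the study's pipeline: the scoring function, its three feature names, and the all-zeros baseline used to represent an "absent" feature are hypothetical assumptions (SHAP typically averages over a background dataset rather than a single baseline).

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, substituting baseline values for absent features.

    Enumerates every feature subset, so it needs O(2^n) model calls.
    """
    n = len(x)
    phi = [0.0] * n

    def value(subset):
        # Features in `subset` keep their observed values; the rest take the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                subset = set(combo)
                phi[i] += weight * (value(subset | {i}) - value(subset))
    return phi


# Hypothetical risk score over three binary indicators:
# suicidal ideation, suicide plan, difficulty concentrating.
def toy_score(z):
    ideation, plan, concentration = z
    return 0.1 + 0.4 * ideation + 0.3 * plan + 0.1 * concentration + 0.1 * ideation * plan


x = [1, 1, 1]
baseline = [0, 0, 0]
phi = shapley_values(toy_score, x, baseline)
print([round(v, 3) for v in phi])
```

The attributions sum to `toy_score(x) - toy_score(baseline)` (the efficiency property), and the ideation-by-plan interaction term is split evenly between those two features, which is how SHAP can rank variables by their contribution to individual predictions.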
Results: Eight percent of the 32,377 participants self-reported a suicide attempt. When suicidal behaviors (ideation and plans) were included in the data, XGBoost produced the highest F1 score on an independent test set (80.84%), with a positive predictive value (PPV) of 73.80% and a sensitivity of 88.35%. When a history of suicidal behaviors was excluded, XGBoost again produced the highest F1 score (66.19%), with a PPV of 57.16% and a sensitivity of 78.70%. SHAP analysis showed that suicidal ideation and plans contributed most to the model's predictions, followed by the ability to concentrate.
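The F1 score is the harmonic mean of PPV and sensitivity, so it is high only when both are high. As a worked check using the figures reported for the model without a history of suicidal behaviors (assuming F1 was computed from the same PPV and sensitivity on the test set), the harmonic mean comes out close to the reported 66.19%:

```latex
F_1 = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{Sensitivity}}{\mathrm{PPV} + \mathrm{Sensitivity}}
    = \frac{2 \times 0.5716 \times 0.7870}{0.5716 + 0.7870}
    = \frac{0.8997}{1.3586}
    \approx 0.662
```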
Conclusions and Implications: The model's predictive performance was strongest when a history of suicidal behaviors was included, a finding observed across all racial/ethnic groups, and it declined when suicidal ideation and plans were removed. Monitoring these risk behaviors should be taken seriously in adolescents at risk of attempting suicide. When suicidal behaviors are absent, additional data sources are needed: to prevent suicide deaths among youth, future research on suicide risk assessment should incorporate supplementary sources, such as electronic health records (EHRs), that capture other risk behaviors linked to suicidal behavior.