Objective: This study evaluated whether a Gradient-Boosted Bootstrap Model (GBBM), a hybrid machine learning technique combining bootstrapping and gradient boosting, could outperform logistic regression in predicting treatment completion among adults in substance use disorder (SUD) treatment programs.
Methods: We analyzed data from 1,158 adults receiving treatment for substance use. Predictors included baseline demographic and drug use variables; missing values were imputed with the median, and categorical variables were dummy-coded. Ten gradient-boosted models were trained on bootstrapped samples, and their predictions were averaged. Performance was assessed using accuracy, precision, recall, F1-score, and ROC AUC on a held-out test set. Analyses were conducted in Stata 18 with Python integration.
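The ensemble step described above can be sketched in Python as follows. This is a minimal illustration and not the study's code: it assumes scikit-learn's `GradientBoostingClassifier` with default hyperparameters, an 80/20 train/test split, a 0.5 decision threshold, and synthetic data from `make_classification` standing in for the real predictors. The helper names `fit_bootstrap_gbm` and `predict_proba_mean` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split


def fit_bootstrap_gbm(X, y, n_models=10, seed=0):
    """Fit n_models gradient-boosted classifiers, each on a bootstrap resample."""
    rng = np.random.default_rng(seed)
    models = []
    for i in range(n_models):
        # Draw row indices with replacement (same size as the training set).
        idx = rng.integers(0, len(X), size=len(X))
        model = GradientBoostingClassifier(random_state=seed + i)
        models.append(model.fit(X[idx], y[idx]))
    return models


def predict_proba_mean(models, X):
    """Average the predicted probability of the positive class across models."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)


# Synthetic stand-in data (assumption): 1,158 rows, 12 numeric predictors.
X, y = make_classification(n_samples=1158, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = fit_bootstrap_gbm(X_train, y_train, n_models=10)
proba = predict_proba_mean(models, X_test)
pred = (proba >= 0.5).astype(int)  # assumed threshold for class labels

print(f"ROC AUC: {roc_auc_score(y_test, proba):.3f}")
print(f"F1:      {f1_score(y_test, pred):.3f}")
```

Averaging probabilities rather than hard votes keeps the ensemble output usable for threshold-free metrics such as ROC AUC; the numbers printed here come from synthetic data and are unrelated to the reported results.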
Findings: GBBM outperformed logistic regression across most metrics: recall (0.875 vs. 0.836), precision (0.672 vs. 0.665), and F1-score (0.760 vs. 0.741), indicating improved ability to identify individuals who completed treatment. Both models performed similarly in identifying non-completers. Permutation importance analysis identified the key predictors of treatment completion. The most influential were having disability income and the absence of a co-occurring mental health disorder, suggesting the importance of social stability and behavioral health complexity. Other influential factors included injection drug use, age at first use, and education level. Contextual variables such as unemployment, lack of income, and prescription drug sourcing also contributed meaningfully.
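As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the two values given for each model. The short sketch below does this using only the figures stated above; the function name `f1_from_pr` is illustrative.

```python
def f1_from_pr(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the Findings.
reported = {
    "GBBM": (0.672, 0.875, 0.760),
    "Logistic regression": (0.665, 0.836, 0.741),
}

recomputed = {}
for name, (precision, recall, f1_reported) in reported.items():
    recomputed[name] = f1_from_pr(precision, recall)
    print(f"{name}: recomputed F1 = {recomputed[name]:.3f}, reported = {f1_reported:.3f}")
```

Both recomputed values agree with the reported F1-scores to three decimal places.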
Conclusions and Implications: Machine learning models like GBBM may enhance recovery-oriented systems of care by identifying individuals at higher risk of dropout and informing tailored retention strategies. These methods can improve outcomes in substance use treatment, particularly in under-resourced settings, by guiding more equitable and data-driven allocation of services.