Predicting Methamphetamine Use of Homeless Youths Attending High School: Comparison of Decision Rules and Logistic Regression Classification Algorithms
In an effort to identify and treat school-attending homeless youths who use methamphetamine, social work practitioners face a classification problem: determining who is or is not using meth. To address this type of problem, we propose an approach using decision rules to classify cases. We compare a decision rules model to a logistic regression model for classifying meth users and non-users with a large dataset of high-school-attending homeless youths. This study answers two questions: 1) What are the key predictors (inputs) associated with meth use among homeless youths attending high school? 2) For a given set of inputs, which of the two models (i.e., logistic regression or decision rules) is better at predicting meth use?
Methods:
Using the Rattle/R software, we analyzed data from the California Healthy Kids Survey (CHKS) dataset. A subset of 2,146 high-school-attending youths who identified as homeless was drawn from the full dataset. Measures for the independent variables/inputs (cigarette, alcohol, and marijuana use; partner abuse; high-risk sexual behavior; truancy; peer influence supporting drug use; low parental monitoring; gang involvement; criminal activity; depression; caring adult outside home; caring teacher at school; and trusting adult outside home/school) and the dependent variable/output (youths who have tried meth at least once versus youths who have not tried meth) were derived from the survey using the risk and resilience framework.
Results:
For the logistic regression model, in comparison to youths who had not tried marijuana, youths who had tried marijuana at least once had 6.5 times greater odds of also having tried meth at least once. Youths who had smoked a cigarette at least once had 2.6 times greater odds of having tried meth than youths who had never smoked a cigarette. The same was true for those who had tried alcohol. Youths who were truant more than once a week had 2.5 times greater odds of also having tried meth than those who had never been truant. The decision rules model found two rules: 1) If a youth has tried marijuana at least once and is truant from school more than once a week, he/she is predicted to have tried meth at least once; and 2) If a youth has tried marijuana at least once, is not truant from school more than once a week, and has smoked a cigarette at least once, he/she is predicted to have tried meth at least once. The confidence interval for the decision rules model [.89, .93] had higher endpoints and was narrower than that for the logistic regression model [.76, .84], indicating that the decision rules model was the stronger of the two.
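To make the two decision rules concrete, they can be written as a small classification function. This is only an illustrative sketch in Python, not the Rattle/R model fitted in the study: the function name and parameter names are hypothetical, and the default prediction of "has not tried meth" for youths matched by neither rule is an assumption, since the abstract states only the two positive rules.

```python
def predict_tried_meth(tried_marijuana: bool,
                       truant_more_than_weekly: bool,
                       smoked_cigarette: bool) -> bool:
    """Apply the two reported decision rules to one youth (hypothetical helper)."""
    # Rule 1: tried marijuana AND truant more than once a week.
    if tried_marijuana and truant_more_than_weekly:
        return True
    # Rule 2: tried marijuana, NOT truant more than once a week,
    # AND smoked a cigarette at least once.
    if tried_marijuana and not truant_more_than_weekly and smoked_cigarette:
        return True
    # Assumption: anyone matched by neither rule is predicted
    # not to have tried meth (the abstract does not state the default).
    return False


if __name__ == "__main__":
    # Matches Rule 1.
    print(predict_tried_meth(True, True, False))   # True
    # Matches Rule 2.
    print(predict_tried_meth(True, False, True))   # True
    # Tried marijuana only: neither rule fires.
    print(predict_tried_meth(True, False, False))  # False
    # Never tried marijuana: neither rule can fire.
    print(predict_tried_meth(False, True, True))   # False
```

Written this way, the rules show that marijuana use acts as a gate: neither rule can fire without it, and truancy or cigarette use then determines the prediction.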
Conclusions:
We identify three advantages of decision rules models: 1) they automatically explore data for multiple-way interactions; 2) they do not assume particular functional forms regarding how predictors are related to outcomes; and 3) they facilitate more intuitive interpretations than logistic regression. Findings highlight the utility of decision rules models as a complement to logistic regression and as a guide to making informed predictions and practice decisions about client outcomes.