Abstract: The Machine in Propensity Scores: Comparison of Machine-Learning Models and Logistic Regression in Propensity Score Matching (Society for Social Work and Research 29th Annual Conference)


The Machine in Propensity Scores: Comparison of Machine-Learning Models and Logistic Regression in Propensity Score Matching

Schedule:
Friday, January 17, 2025
Redwood A, Level 2 (Sheraton Grand Seattle)
Khudodod Khudododov, PhD, Research Project Manager, Rutgers University
Background/Purpose: Few would argue about the contribution propensity score analysis has made to quasi-experimental studies. Rosenbaum and Rubin's (1983) article on the application of propensity scores to the estimation of treatment effects in observational studies gave later scholars a framework for the analysis of causal effects. Since then, a multitude of studies have been published that employ propensity scores as a foundation of their analysis.

At the same time, just as propensity score models are indispensable in quasi-experimental studies, so is logistic regression in propensity score analysis. Most studies that perform propensity score analysis do so using logistic regression: Weitzen and colleagues (2004) found that 98% of the medical journal articles they reviewed used a logit or probit function to estimate the propensity score.

The proliferation of machine learning models, however, has begun to shift this balance. The appeal of machine learning models for propensity score estimation lies in their ability to discern complex patterns, their versatility in handling large numbers of variables and large datasets, and their weaker reliance on assumptions about the functional form of the data-generating process. Consequently, they are argued to produce better and more accurate estimates (Setoguchi et al., 2008; Lee et al., 2010; McCaffrey et al., 2004).

The current study investigates the efficacy of three models for propensity score estimation: conventional logistic regression alongside two widely used machine learning models, a random forest and a gradient boosting machine. Through a comprehensive comparative analysis, this research aims to identify the strengths and limitations of each approach, providing insights for researchers estimating treatment effects in observational studies.

Methods: This study used the Baccalaureate & Beyond, a longitudinal dataset of the 2008/09 U.S. college graduating cohort followed up four years after graduation. The National Center for Education Statistics (NCES) provided access to the data. The sample included all cohort members who graduated with a bachelor's degree in 2008 and had enrolled in a 4-year degree program.

The researcher used demographic characteristics, admission scores, and high school and early college performance as pre-treatment variables in estimating the propensity scores. Graduating with a STEM or non-STEM major was the treatment variable, with STEM graduates as the treatment group and non-STEM graduates as the comparison group.
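A minimal sketch in Python (not the study's code) of how propensity scores could be estimated with the three models compared here; the file name, covariate names, and treatment indicator below are hypothetical placeholders, since the B&B data require restricted-use access through NCES.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Hypothetical analytic file and illustrative pre-treatment variables,
# not the actual B&B variable names.
df = pd.read_csv("bb_analytic_file.csv")
covariates = ["age", "gender", "race_ethnicity",
              "sat_score", "hs_gpa", "first_year_gpa"]
X = pd.get_dummies(df[covariates], drop_first=True)  # one-hot encode categoricals
t = df["stem"]                                       # 1 = STEM major, 0 = non-STEM

models = {
    "logit": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=500, random_state=0),
    "gbm": GradientBoostingClassifier(random_state=0),
}

propensity = {}
for name, model in models.items():
    model.fit(X, t)
    # Propensity score = predicted probability of membership in the STEM group
    propensity[name] = model.predict_proba(X)[:, 1]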

Each of the three models used the same independent variables. Model assessment included both a numeric approach using the confusion matrix and a graphical approach using the ROC curve. The standardized mean difference was used to assess covariate balance.
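Continuing the hypothetical setup above, the assessment steps described here could be sketched as follows: prediction accuracy via the confusion matrix and ROC/AUC, and covariate balance via the standardized mean difference (SMD).

import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

def standardized_mean_difference(x_treat, x_control):
    """SMD: difference in group means divided by the pooled standard deviation."""
    pooled_sd = np.sqrt((x_treat.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treat.mean() - x_control.mean()) / pooled_sd

for name, ps in propensity.items():
    print(name, "AUC:", round(roc_auc_score(t, ps), 3))        # discrimination
    print(confusion_matrix(t, (ps > 0.5).astype(int)))          # 0.5 cutoff is illustrative

# Balance on one illustrative covariate before matching; after matching,
# the same statistic is recomputed within the matched sample.
smd = standardized_mean_difference(X.loc[t == 1, "sat_score"],
                                   X.loc[t == 0, "sat_score"])
print("SMD for sat_score:", round(smd, 3))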

Results: Findings were mixed. While the machine-learning models performed better in prediction accuracy, the logistic regression model produced nearly comparable results. The random forest model also achieved better overall covariate balance; however, it worsened balance for certain covariates, such as race/ethnicity.

Conclusion: This analysis contributes to the literature on the use of machine learning models in propensity score estimation and provides evidence to inform the application of such models in future studies.