Methods: Historical HR data were obtained from two community mental health centers (urban and rural areas). The first center's data set contained 654 employee records dating back to 2011 (336 employees had left and 318 had stayed at the time of data extraction), and the second center's data set contained 894 employee records dating back to 2017 (487 had left and 407 had stayed). The extracted HR data included age, gender, race, education level, marital status, exempt status, job type, position type, wage, past work years, work hours, job training hours, and characteristics of the clients served by the employees (e.g., age, gender, mental health diagnosis). Two ML models, random forest and Lasso regression, were trained to predict an employee’s probability of turnover within the following 12 months. Missing data were imputed with the k-nearest neighbor method. Five-fold cross-validation was used to evaluate model performance on the following measures: overall prediction accuracy, specificity, sensitivity, and area under the curve (AUC). Variable importance measures were also calculated to facilitate the selection of important turnover predictors.
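As an illustration of the evaluation measures named above, the following minimal sketch shows how five folds might be formed and how accuracy, sensitivity, specificity, and AUC could be computed from predicted turnover probabilities. It is not the study's code: the function names, the 0.5 classification threshold, the coding of "left" as 1 and "stayed" as 0, and the toy data are all assumptions made for illustration.

```python
import random


def five_fold_indices(n, k=5, seed=0):
    """Shuffle record indices 0..n-1 and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, and specificity at an assumed probability threshold.

    y_true: 1 = employee left, 0 = employee stayed (assumed coding).
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,  # leavers correctly flagged
        "specificity": tn / (tn + fp) if (tn + fp) else nan,  # stayers correctly flagged
    }


def auc(y_true, y_prob):
    """AUC as the probability that a random leaver is scored above a random stayer.

    Ties count as half a win; this is equivalent to the area under the ROC curve.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: six employees with made-up predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
metrics = classification_metrics(y_true, y_prob)
toy_auc = auc(y_true, y_prob)

# Fold sizes for a data set the size of the first center's (654 records).
fold_sizes = [len(fold) for fold in five_fold_indices(654)]
```

In a cross-validated evaluation, each fold would serve once as the held-out set while the model is fit on the other four, and the measures would then be averaged over the five held-out folds.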
Results: The results suggested a good level of predictive accuracy for turnover at both centers, particularly with the random forest model (e.g., AUC > .8; prediction accuracy close to or above .8). The ML methods also identified several important predictors of turnover from historical HR data (e.g., past work years, wage, work hours, age, job position, job training hours, and marital status). The HR data extraction process for ML applications was also found to be feasible.
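The abstract does not state which variable importance measure was used. As one possible illustration, the sketch below implements permutation importance: a predictor's importance is taken as the average drop in a performance score when that predictor's values are shuffled across employees. The function names, the toy model, and the toy data are hypothetical and are not drawn from the study.

```python
import random


def accuracy(y_true, y_pred):
    """Share of predictions that match the observed outcome."""
    return sum(1 for y, p in zip(y_true, y_pred) if y == p) / len(y_true)


def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Mean drop in `metric` when each column of X is shuffled in turn.

    predict: function mapping one row (a list of feature values) to a prediction.
    X: list of rows, each a list of feature values.
    Returns one importance value per column; larger means more important.
    """
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = metric(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy model: predicts turnover (1) when past work years are below 2,
# and ignores the second feature entirely.
def toy_predict(row):
    return 1 if row[0] < 2 else 0


# Columns: [past work years, an irrelevant made-up feature]
X = [[0.5, 3], [1.0, 7], [1.5, 2], [0.8, 9], [4.0, 1], [6.0, 5], [3.0, 8], [10.0, 4]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
importances = permutation_importance(toy_predict, X, y, accuracy)
```

Because the toy model never reads the second feature, shuffling it cannot change the predictions, so its importance is exactly zero. Random forest implementations often report impurity-based importance instead, and Lasso performs selection through coefficients shrunk to zero; either could equally be what the study used.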
Conclusions and implications: HR data management systems hold large amounts of historical data on variables that are often cited in the literature as reliable turnover predictors; however, such data are not always used to predict employee turnover. As ML applications to HR data accumulate across organizations, some findings (e.g., predictors, predictive patterns, turnover mechanisms) may prove generalizable across organizations and contribute to broader policy and workforce development efforts, while others may be organization-specific and help HR and leadership retain employees at their own organization. The current study provides new insights into, and avenues toward, a data-driven, evidence-based turnover prediction strategy using existing HR data that are often under-utilized. Implications for how social work organizational leaders can incorporate data-driven decision-making to support employee retention will be discussed.