Methods: This symposium will include three separate studies. In the first study, the researcher will describe how data mining was used to explore the importance of psychological distress and gender in predicting hypertension, using a sample of over 600,000 adults from the National Health Interview Survey. In the second study, the researcher will describe the use of data mining to explore relationships between more than twenty predictor variables and school enrollment in a sample of 4,000 youth aging out of foster care, drawn from the National Youth in Transition Database. The third study will present an ethical framework for understanding, developing, implementing, and evaluating algorithmic decision-making in human service provision, and will apply the framework to a case example in child welfare.
Results: The results of all three studies in this symposium provide evidence that data mining, cluster analysis, neural networks, and other data science methods can help discern patterns and relationships in large, complex, messy data sets. For example, decision-tree models can identify local interactions between variables that may not be evident using logistic regression, and predictive models in general can be refined over time as more observations or variables become available. However, this symposium will also present evidence that decision-making algorithms can contain flaws introduced during their development, so researchers and practitioners need a firm understanding of ethics and algorithm development to ensure fairness, accountability, and transparency.
Conclusions/Implications: Social work lags significantly behind related fields, such as public health, in applying data science methods to solve social problems. Social work researchers will soon need to become familiar with data mining and machine learning approaches, because government and private agencies will be using these methods to make important decisions related to human services and social policy. Data science methods will likely complement, but not replace, other quantitative methods in the near future. These methods also have limitations, such as the potential for overfitting, the identification of spurious relationships, and algorithmic bias, which may affect access to supportive programs for disadvantaged social groups.