Schedule:
Saturday, January 13, 2024: 4:00 PM-5:30 PM
Capitol, ML 4 (Marriott Marquis Washington DC)
Cluster:
Organizer:
Emily Miller, MSSA, Case Western Reserve University, Mandel School of Applied Social Sciences
Speakers/Presenters:
Kari O'Donnell, MA, Case Western Reserve University,
Anna Bender, PhD, University of Washington and
Megan Holmes, PhD, Case Western Reserve University
Background & Purpose: In an era of digital data collection, one area of particular concern for social science researchers is bot-generated data (i.e., data from automatic survey-takers). Because bots are programmed to provide a realistic representation of human behavior, but are not always capable of doing so, datasets that contain bot-generated data will be of low quality and biased. However, these data may be difficult to identify and systematically remove, which raises the likelihood of Type I and Type II errors and leads to erroneous conclusions drawn from bot-infected data. Ultimately, these errors can lead to actual human harm if policy, program, and/or practice decisions are made from these data. In addition, bot-generated data presents ethical issues. For example, bot-generated data can be used to skew data or train algorithms, leading to unfair or discriminatory findings. Using real data from a bot-infested collection, this workshop will train attendees on identifying bots, determining how to proactively avoid or handle bots, and cleaning bot-infected data. In sum, this workshop will focus on data collection, data cleaning, and study design through an overview of bot-generated data and specific training in strategies for identification, response, and prevention.
Add to the Current Knowledge: This workshop will enhance attendees' understanding of identifying and addressing bot-generated responses in datasets. While prevention is the best policy for dealing with bots, social science researchers must be able to identify and respond to data generated by bots. Learning objectives include 1) bot-generated data identification, 2) tools and strategies for dealing with bot-generated data, and 3) bot-generated data prevention strategies. Discussions, hands-on activities, and step-by-step instructions will accomplish these learning objectives.
Implications: Given the ease and utility of online surveys, bot-generated data will continue to be an issue for social science researchers. Bots continue to improve and learn rapidly, as evidenced by increasingly humanlike bot-generated qualitative responses. Bots and their developers will continue to thwart attempts to prevent the collection of bot-generated data (e.g., CAPTCHA questions, IP address screening, and survey duration time). Bot-generated data threatens the reliability and validity of data and may lead to improper or harmful decision-making in social work policy and practice. Vigilance around data integrity is imperative for social science research, and designing studies to avoid and address bot-generated data is crucial. Future research should take care to build in both screening for bot-generated data and plans to address it.
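As an illustration only, two of the screening signals named above (IP address screening and survey duration time) could be checked with a short script. This is a minimal sketch under stated assumptions, not the workshop's method: the function name, the record fields (`id`, `ip`, `duration_seconds`), the 120-second minimum, and the one-response-per-IP limit are all hypothetical and would need to be set for a given study.

```python
from collections import Counter


def flag_suspect_responses(responses, min_seconds=120, max_per_ip=1):
    """Return the ids of responses that fail either screening check.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    # Count how many responses share each IP address.
    ip_counts = Counter(r["ip"] for r in responses)

    flagged = []
    for r in responses:
        too_fast = r["duration_seconds"] < min_seconds
        shared_ip = ip_counts[r["ip"]] > max_per_ip
        if too_fast or shared_ip:
            flagged.append(r["id"])
    return flagged


# Hypothetical example records (documentation-range IP addresses).
responses = [
    {"id": 1, "ip": "203.0.113.5", "duration_seconds": 610},
    {"id": 2, "ip": "203.0.113.9", "duration_seconds": 45},
    {"id": 3, "ip": "198.51.100.7", "duration_seconds": 540},
    {"id": 4, "ip": "198.51.100.7", "duration_seconds": 580},
]

print(flag_suspect_responses(responses))
```

A flag here marks a response for closer review rather than automatic removal, since a fast completion time or a shared IP address can also come from a legitimate respondent.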