Schedule:
Friday, January 17, 2025: 8:00 AM-9:30 AM
Virginia, Level 4 (Sheraton Grand Seattle)
Cluster:
Organizer:
Emily Miller, PhD, Case Western Reserve University
Speakers/Presenters:
Kari O'Donnell, PhD, Case Western Reserve University
Megan Holmes, PhD, Case Western Reserve University
Anna Bender, PhD, University of Washington
Background & Purpose: Digital tools have made data collection accessible to more researchers, particularly early-career researchers and doctoral students. With that accessibility, however, data integrity has become a more serious concern. Handling fraudulent data is rarely taught; the skill comes with experience and practice, which is often too late for new researchers. Online data collection is risky for several reasons, including bot-generated data (i.e., automatic survey-takers), professional survey-takers, and fraudulent respondents. These can produce inauthentic, low-quality, and highly biased datasets. Even researchers who have encountered fraudulent data may find it challenging to identify and systematically remove. Conclusions drawn from bot-contaminated data may be erroneous because of the increased likelihood of Type I and Type II errors. Ultimately, these errors can lead to actual human harm if policy, program, or practice decisions are based on these data. Fraudulent data also raise ethical issues. For example, fraudulent data can be used to skew results or train algorithms, leading to unfair or discriminatory findings. Using a real dataset that was riddled with fraudulent cases, this workshop will train attendees to identify fraudulent cases, to proactively avoid or handle them, and to clean the affected data. In sum, this workshop will focus on data collection, data cleaning, and study design through an overview of fraudulent data and specific training in identification, response, and prevention strategies.
Add to the Current Knowledge: This workshop will enhance attendees’ understanding of the pitfalls of online data collection, particularly for student researchers. Prevention is the best policy for dealing with bots, but social science researchers must also be able to identify and respond to bot-generated data.
Learning objectives include 1) strategies for preventing bot-generated data when planning and conducting a study, 2) tools and strategies for dealing with bot-generated data, and 3) IRB language for handling bots. Discussions, hands-on activities, and step-by-step instructions will accomplish these learning objectives.
Implications: With the ease and utility of online surveys, fraudulent data will continue to be an issue for social science researchers. Bots and fraudulent respondents improve and learn rapidly, and they will continue to thwart attempts to prevent the collection of fraudulent data (e.g., CAPTCHA questions, IP address screening, and survey duration time). Fraudulent data threaten the reliability and validity of findings and may lead to improper or harmful decision-making in social work policy and practice. Vigilance around data integrity is imperative for social science research, and designing studies to avoid and address fraudulent data is a crucial step. Future research should build in screening for fraudulent data and plans to address it.
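Illustration: Two of the screening signals named above, survey duration time and IP address screening, can be expressed as simple flagging rules. The following is a minimal sketch, not the presenters' procedure; the field names (`id`, `ip`, `duration_seconds`), the 120-second threshold, and the function name `flag_suspect_responses` are all assumptions chosen for illustration.

```python
from collections import Counter


def flag_suspect_responses(responses, min_seconds=120):
    """Return {response_id: [reasons]} for responses that trip a screening rule.

    `responses` is a list of dicts with hypothetical keys "id", "ip",
    and "duration_seconds". `min_seconds` is an assumed cutoff; a real
    study would set it from pilot data on plausible completion times.
    """
    # Count how many responses share each IP address.
    ip_counts = Counter(r["ip"] for r in responses)

    flags = {}
    for r in responses:
        reasons = []
        # Rule 1: completed faster than a plausible human could.
        if r["duration_seconds"] < min_seconds:
            reasons.append("too_fast")
        # Rule 2: the same IP address submitted more than one response.
        if ip_counts[r["ip"]] > 1:
            reasons.append("duplicate_ip")
        if reasons:
            flags[r["id"]] = reasons
    return flags


# Made-up example records (IP addresses are from a documentation range).
responses = [
    {"id": "R1", "ip": "203.0.113.5", "duration_seconds": 610},
    {"id": "R2", "ip": "203.0.113.9", "duration_seconds": 45},
    {"id": "R3", "ip": "203.0.113.9", "duration_seconds": 580},
]
flags = flag_suspect_responses(responses)
```

A flag here marks a case for manual review rather than automatic removal: shared IP addresses can be legitimate (for example, a household or campus network), and, as noted above, bots increasingly evade such checks.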