Abstract: Evaluating the Distortion of Research Results by Bot-Generated Data (Society for Social Work and Research 30th Anniversary Annual Conference)

Evaluating the Distortion of Research Results by Bot-Generated Data

Schedule:
Friday, January 16, 2026
Congress, ML 4 (Marriott Marquis Washington DC)
* noted as presenting author
Kari O'Donnell, PhD, Research Assistant, Case Western Reserve University, OH
Emily Miller, PhD, Research Affiliate, Case Western Reserve University, Cleveland, OH
Anna Bender, PhD, Postdoctoral Fellow, University of Washington
Megan Holmes, PhD, Associate Professor, Jack, Joseph and Morton Mandel School of Applied Social Sciences, Case Western Reserve University
Background/Purpose: Bot-generated data poses a significant threat to online survey research, both through direct interference with responses and by increasing the burden on respondents and researchers as researchers work to guard against bots. The problem is particularly troubling because studies often overlook the presence of bots, and when bots are acknowledged, researchers are frequently unsure how to address them effectively. Simply ignoring survey bots, however, threatens data quality and trust in findings. Using a simple model in which age, race, and gender predict Conflict Tactics Scale (CTS) scores, this study examines how bot-generated data, compared with human data, affects research results. We predicted that bot-generated data would produce significant effects when testing mean and categorical differences and when estimating relationships between variables.

Methods: Data were collected using a cross-sectional survey design fielded between May 2020 and September 2020. The survey was infiltrated by bots early in the fielding period. Two methods were used to identify bot-generated data: (1) a manual data evaluation technique developed by the authors and (2) an algorithm developed by Ilagan and Falk (2023). A case was classified as bot-generated if it was flagged by either the manual evaluation or the algorithm, yielding a dichotomous indicator of bot-generated versus human data. The final classification identified 43.43% of the sample as human (n = 277). To develop a robust understanding of the impact of bot-generated data, several statistical tests were conducted, including t-tests, chi-square tests, correlations, and hierarchical regression analyses.
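To make the classification and group-comparison steps concrete, the following minimal Python sketch combines a manual flag and an algorithmic flag into a single dichotomous bot indicator and runs the t-test and chi-square comparisons. The data, column names, and flag rates are hypothetical illustrations and do not represent the authors' code or dataset.

# Illustrative sketch only; columns and rates are hypothetical, not the authors' data.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
n = 638  # approximate total sample implied by 277 human cases at 43.43%
df = pd.DataFrame({
    "manual_flag": rng.random(n) < 0.4,        # flagged by manual evaluation
    "algorithm_flag": rng.random(n) < 0.4,     # flagged by the detection algorithm
    "cts_score": rng.normal(100, 40, n),       # Conflict Tactics Scale score
    "gender": rng.choice(["woman", "man"], n),
})

# A case is classified as bot-generated if either method flagged it.
df["bot"] = (df["manual_flag"] | df["algorithm_flag"]).astype(int)
print(f"Human share: {100 * (df['bot'] == 0).mean():.2f}%")

# Group comparisons: t-test for the continuous CTS score,
# chi-square test for the categorical gender variable.
t_stat, t_p = stats.ttest_ind(df.loc[df.bot == 0, "cts_score"],
                              df.loc[df.bot == 1, "cts_score"])
chi2, chi_p, _, _ = stats.chi2_contingency(pd.crosstab(df["bot"], df["gender"]))
print(f"t = {t_stat:.2f}, p = {t_p:.3f}; chi2 = {chi2:.2f}, p = {chi_p:.3f}")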

Results: Findings indicate significant differences between bot-generated and human data on the Conflict Tactics Scale (CTS; t = -6.70, p < .001) and gender (χ² = 37.06, p < .001). The bot-generated data indicator was significantly related to both the CTS score and gender (p < .001). The initial regression model of age, race, and gender significantly predicted CTS scores, explaining 8% of the variance. Adding the dichotomous bot-generated data indicator in the second model improved fit, accounting for 15.99% of the variance. A positive bot-generated data indicator was associated with a 62.91-point increase in the CTS score.
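The hierarchical regression step can be sketched as below, comparing a demographics-only model with a model that adds the bot indicator and reporting the change in explained variance. The simulated data and variable names are illustrative only and will not reproduce the reported estimates.

# Illustrative sketch of the hierarchical regression step; names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 638
df = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "race": rng.choice(["white", "black", "other"], n),
    "gender": rng.choice(["woman", "man"], n),
    "bot": rng.integers(0, 2, n),  # dichotomous bot-generated data indicator
})
df["cts_score"] = 60 + 0.2 * df["age"] + 60 * df["bot"] + rng.normal(0, 40, n)

# Model 1: demographics only; Model 2 adds the bot-generated data indicator.
m1 = smf.ols("cts_score ~ age + C(race) + C(gender)", data=df).fit()
m2 = smf.ols("cts_score ~ age + C(race) + C(gender) + bot", data=df).fit()

# Change in explained variance attributable to the bot indicator.
print(f"R2 model 1: {m1.rsquared:.3f}, R2 model 2: {m2.rsquared:.3f}")
print(f"Bot indicator coefficient: {m2.params['bot']:.2f}")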

Conclusions and Implications: Bot-generated data undermines the integrity of online survey research. Our findings indicate that bot-generated data leads to meaningful differences in research outcomes: bot-generated data differed significantly from human data on both continuous and categorical variables, was linked to significant increases in CTS scores, and changed the explanatory power of the regression model when included as a predictor. While strategies exist for avoiding bot-generated data, they are not infallible, and addressing bot-generated data must be considered in research designs. Data quality assessments and statistical models need to account for bot-generated data. Future research should continue to assess and address the impact of bot-generated data in studies.

Citations

Ilagan, M. J., & Falk, C. F. (2023). Model-agnostic unsupervised detection of bots in a Likert-type questionnaire. Behavior Research Methods, 56(5), 5068-5085. https://doi.org/10.3758/s13428-023-02246-7