Methods: Data were collected using a cross-sectional survey design, fielded between May and September 2020. The survey was infiltrated by bots early in the fielding period. Two methods were used to identify bot-generated data: (1) a manual data evaluation technique developed by the authors and (2) an algorithm developed by Ilagan and Falk (2023). A response was classified as bot-generated if it was flagged by either the manual evaluation or the algorithm, yielding a dichotomous variable indicating bot-generated or human data. Under this classification, 43.43% of the sample was human (n = 277). To develop a robust understanding of the impact of bot-generated data, several statistical tests were conducted, including t-tests, chi-square tests, correlations, and hierarchical regression analyses.
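The "flagged by either method" rule described above can be sketched as follows. This is a minimal illustration only: the function names and the toy flag values are hypothetical, not the study's data, and neither the authors' manual technique nor the Ilagan and Falk (2023) algorithm is implemented here.

```python
def classify_responses(manual_flags, algorithm_flags):
    """Return 1 (bot) if either method flagged the response, else 0 (human)."""
    if len(manual_flags) != len(algorithm_flags):
        raise ValueError("flag lists must have the same length")
    return [int(m or a) for m, a in zip(manual_flags, algorithm_flags)]


def percent_human(bot_indicator):
    """Percentage of responses whose dichotomous indicator is 0 (human)."""
    if not bot_indicator:
        raise ValueError("no responses to summarize")
    return 100.0 * bot_indicator.count(0) / len(bot_indicator)


# Hypothetical flags for five responses (illustrative values only).
manual = [True, False, False, True, False]
algorithm = [True, True, False, False, False]

is_bot = classify_responses(manual, algorithm)
print(is_bot)                 # [1, 1, 0, 1, 0]
print(percent_human(is_bot))  # 40.0
```

Combining the flags with a logical OR treats a response as human only when both methods agree it is clean, which favors removing suspect cases over retaining them.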
Results: Findings suggest significant differences between bot-generated and human data on the Conflict Tactics Scale (CTS; t = -6.70, p < 0.001) and gender (χ² = 37.06, p < 0.001). The bot-generated data indicator was significantly related to both CTS score and gender (p < 0.001). The initial regression model of age, race, and gender significantly predicted CTS score, explaining 8% of the variance. The second model, which added the dichotomous bot-generated data indicator, improved the fit and accounted for 15.99% of the variance in CTS scores. A positive bot-generated data indicator was associated with a 62.91-point increase in the CTS score.
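As a rough worked check on the model comparison, the gain in explained variance from adding the bot indicator is the difference between the two reported R² values. The first value is reported only to the nearest percent, so the result is approximate.

```latex
\Delta R^2 = R^2_{\text{model 2}} - R^2_{\text{model 1}} \approx 0.1599 - 0.08 \approx 0.08
```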
Conclusions and Implications: Bot-generated data undermine the integrity of online survey research. Our findings indicate that bot-generated data lead to differences in research outcomes. Bot-generated and human data differed significantly on both continuous and categorical variables. When the bot-generated data indicator was added to the regression model, bot-generated data were linked to significantly higher CTS scores, and the explanatory power of the model improved. While strategies exist for avoiding bot-generated data, they are not infallible, and research designs must plan for addressing such data. Data quality assessments and statistical models need to be adjusted to account for bot-generated data. Future research should continue to assess and address the impact of bot-generated data in studies.
Citations
Ilagan, M. J. and Falk, C. F. (2023). Model-agnostic unsupervised detection of bots in a Likert-type questionnaire. Behavior Research Methods, 56(5), 5068-5085. https://doi.org/10.3758/s13428-023-02246-7