Methods: Cases (n=1,306) were reviewed and assigned flags. Descriptive characteristics and aggregate combinations of flag distributions were used to consolidate flags and sort cases into two groups: “likely bots” (n=402) and “likely human” (n=904). Mahalanobis distance was assessed among viable cases (n=1,132); raw distance scores were log-transformed and compared between groups with an independent-samples t-test. Person-total correlation was assessed among viable cases (n=1,286). Data characteristics between groups were compared.
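The following is a minimal sketch, not the authors' code, of the analysis pipeline described above: case-level Mahalanobis distance from the sample centroid, a log transformation of the raw distances, and an independent-samples t-test between the two flag-derived groups. The data matrix, group labels, and sample sizes here are hypothetical placeholders.

```python
# Sketch only: Mahalanobis distance + log transform + independent-samples t-test.
# X, labels, and the random data are illustrative assumptions, not study data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(1132, 10))          # item-level responses for viable cases
labels = rng.integers(0, 2, size=1132)   # 1 = "likely human", 0 = "likely bot"

# Mahalanobis distance of each case from the sample centroid
mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)   # squared distances
md = np.sqrt(d2)

log_md = np.log(md)                      # log transformation of raw distances

# Compare the two flag-derived groups
t_stat, p_val = stats.ttest_ind(log_md[labels == 1], log_md[labels == 0])
print(t_stat, p_val)
```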
Results: The independent-samples t-test indicated that the Mahalanobis distance for the “likely human” group (n=799, M=1.26, SD=0.38) was significantly greater than for the “likely bot” group (n=333, M=1.12, SD=0.50), t(1130) = 4.084, p < 0.001. The effect size was moderate (r = 0.32). No significant difference between groups was detected by person-total correlation. The Mahalanobis distance findings suggest that “likely human” cases demonstrate more variability than cases deemed “likely bots.” This contrasts with modeled demonstrations of bot detection using Mahalanobis distance, which typically identify verifiable bots as multivariate outliers among human responses. Person-total correlation did not distinguish the groups, unlike in modeled research. Limitations of the group-level distinction and the unequal group sizes likely influenced the analysis.
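For reference, one common way to operationalize person-total correlation is sketched below: each case's Pearson correlation between their own item responses and the sample-wide item means, with low or negative values treated as a possible sign of random or automated responding. This is an assumed operationalization for illustration, not necessarily the exact variant used in the study.

```python
# Sketch only: person-total correlation per case.
# X and the flagging threshold are illustrative assumptions, not study data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
X = rng.normal(size=(1286, 10))            # item-level responses for viable cases

item_means = X.mean(axis=0)                # "total" response profile across the sample
ptc = np.array([stats.pearsonr(row, item_means)[0] for row in X])

# Cases with low or negative person-total correlations can be flagged for review.
print((ptc < 0).mean())
```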
Conclusions and Implications: The greatest limitation of this study is that the actual proportion of human and bot cases is unknown; as such, the true accuracy of the presented methods cannot be definitively assessed. That circumstance is also what makes this study valuable: when real data are corrupted by bots, researchers have no means of knowing which cases are legitimate and which are corrupt, and, as demonstrated here, even the most concrete, mathematical means of identifying bot-corrupted responses may have significant limitations when translated to real-life circumstances. Data integrity is an ethical imperative. More resources, discussion, and cross-disciplinary research will be necessary to establish an institutional standard for preventing, identifying, and accounting for sophisticated bot activity in online research. A long-term solution may involve a statistical mechanism that effectively accounts for what may become an inevitable level of bot influence within online datasets.