Methods: We collected a social media corpus from users in Chicago, IL, who had Twitter connections to, and engagements with, a seed user and her top communicators. The seed user was a self-identified gang member with a large Twitter following. We hired formerly gang-involved youth as domain experts to inform the training of social work graduate student annotators. Our annotators used a qualitative approach to interpret context in social media data and labeled each post for the training of NLP, CV, and ML systems. These annotations and labels were validated and evaluated by our domain experts. Some of our NLP features are optimized for predicting the loss, aggression, and substance use codes of the tweets. Part of our CV system is trained to detect local visual concepts related to gang activities: general (person, money, location), firearms (handgun, long gun), drugs (lean, joint, marijuana), and gang affiliation (hand gesture, tattoo). Our ML model predicts loss, aggression, and substance use based on generic image features, detected visual concepts, linguistic features, and the trained NLP features.
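As an illustration of how the four feature groups named above could be brought together for one post, the following is a minimal sketch that assumes simple feature concatenation; the abstract does not state how the model actually combines its inputs. The names `VISUAL_CONCEPTS`, `concept_vector`, and `fuse_features` are hypothetical, as are the example values.

```python
# Hypothetical sketch: concatenation is an assumption, not the method
# described in the abstract, which does not specify how features are fused.

# Visual concepts listed in the Methods text, in a fixed order.
VISUAL_CONCEPTS = [
    "person", "money", "location",       # general
    "handgun", "long gun",               # firearms
    "lean", "joint", "marijuana",        # drugs
    "hand gesture", "tattoo",            # gang affiliation
]


def concept_vector(detections):
    """Turn {concept name: detector confidence} into a fixed-length list.

    Concepts that were not detected get a score of 0.0.
    """
    return [float(detections.get(name, 0.0)) for name in VISUAL_CONCEPTS]


def fuse_features(image_features, detections, linguistic_features, nlp_features):
    """Concatenate the four feature groups into one vector for a classifier."""
    return (
        list(image_features)
        + concept_vector(detections)
        + list(linguistic_features)
        + list(nlp_features)
    )


# Example with made-up values for a single post.
fused = fuse_features(
    image_features=[0.1, 0.2],
    detections={"handgun": 0.9},
    linguistic_features=[1.0],
    nlp_features=[0.5],
)
```

A vector of this kind could then be passed to one classifier per code (loss, aggression, substance use); the choice of classifier is left open here.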
Results: We present an interdisciplinary process between social work and data science in which contextual analysis and labeling of social media data inform NLP, CV, and ML tools, in order to build a contextual understanding of online communications in communities with high rates of violence. Our NLP approach performs very well at detecting loss, with an F-measure of 0.71 and an Average Precision (AP) of 0.81. Our CV system performs better for the aggression and substance use codes, with F-measures of 0.55 and 0.50 and APs of 0.55 and 0.51, respectively. Combining the NLP and CV systems achieves the best mean AP (0.60) across the three codes, whereas the best performance of a single modality (i.e., text or image) is 0.51, showing the need for multimodal analysis to achieve higher performance.
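For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute an F-measure (the balanced F1 form) and a ranking-based Average Precision for a single binary code, and then a mean AP across codes. It assumes these standard definitions; the abstract does not specify the exact variants used, and all function names and toy values are illustrative only.

```python
# Illustrative sketch of the metrics under standard definitions (assumed).

def f_measure(y_true, y_pred):
    """Balanced F-measure (F1) for binary labels given as 0/1 lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(ap_per_code):
    """Average the per-code AP values, e.g. over loss, aggression, substance use."""
    values = list(ap_per_code.values())
    return sum(values) / len(values)


# Toy values, not taken from the study.
toy_f = f_measure([1, 0, 1, 0], [1, 1, 0, 0])
toy_ap = average_precision([1, 0, 1], [0.9, 0.8, 0.7])
toy_map = mean_average_precision({"code_a": 0.5, "code_b": 1.0})
```

Because AP depends only on how posts are ranked by score, it can be compared across systems without choosing a decision threshold, whereas the F-measure requires thresholded predictions.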
Conclusions and Implications: Our interdisciplinary approach helped us to become aware of the complexity of interpreting context in social media posts, including the broader context of our work and important ethical considerations around social media use in research and violence prevention. This work has implications for violence interruption and prevention, grief and trauma-informed approaches to violence intervention, and police surveillance and interpretation of social media.