Schedule:
Friday, January 17, 2025: 9:45 AM-11:15 AM
Virginia, Level 4 (Sheraton Grand Seattle)
Cluster:
Organizer:
Nari Yoo, MA, New York University
Speakers/Presenters:
Nari Yoo, MA, New York University,
Cheng Ren, PhD, State University of New York at Albany and
Gaurav Sinha, PhD, University of Georgia
Social media discussions offer an unobtrusive way to study human behavior in real time, revealing moods, emotions, and sentiments about life experiences. With recent technical innovations and interdisciplinary collaboration, social work has expanded its research on text-heavy social media platforms. Reddit, one of the most popular platforms for posting, commenting, and voting on topics, gives social work researchers access to large-scale, timely data from vulnerable populations. This access is particularly valuable for exploring innovative or emerging areas of social work practice, policy, and research. Recent social work studies using Reddit data span topics from foster care to the student loan debate. These studies demonstrate the value of Reddit data in capturing detailed, qualitative insights that go beyond those available from short-form social media data (e.g., Twitter). In particular, mental health subreddits serve not only as avenues for online help-seeking but also as peer support groups, and so provide a wealth of information that can deepen our understanding of these issues.
In this workshop, we will introduce two natural language processing (NLP) methods that can help researchers gain a deeper understanding of Reddit data: sentiment analysis and topic modeling. Sentiment analysis is the task of assigning a sentiment or emotion label, such as anger, joy, or sadness, to a text. Topic modeling is the task of discovering the themes in a collection of texts. We will use state-of-the-art NLP models from the HuggingFace and BERTopic libraries to perform these tasks on Reddit data, such as posts from the subreddit r/suicidewatch. Google Colab, a free, browser-based coding platform, will be used to run the analyses. The workshop will consist of three parts:
Part 1: Introduction to Reddit data and NLP methods. We will provide an overview of Reddit data, its structure, and its potential applications for social work research, with examples. We will then show how to access historical Reddit data through the Pushshift API. We will also explain basic NLP concepts and techniques, such as word embeddings, transformers, and BERT.
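As a minimal sketch of the data structure described in Part 1, the example below assumes that historical submissions have already been exported as newline-delimited JSON, one post per line, as in archived Pushshift dumps. The field names (`subreddit`, `title`, `selftext`, `created_utc`) follow Reddit's submission schema. The sample records and the helper `load_posts` are hypothetical and made up for illustration only.

```python
import json
from datetime import datetime, timezone

# Hypothetical sample records standing in for an exported file
# (one JSON object per line).
SAMPLE_NDJSON = """\
{"subreddit": "example", "title": "First post", "selftext": "Some text here.", "created_utc": 1700000000}
{"subreddit": "example", "title": "Gone", "selftext": "[removed]", "created_utc": 1700000100}
{"subreddit": "other", "title": "Elsewhere", "selftext": "Different community.", "created_utc": 1700000200}
"""


def load_posts(lines, subreddit):
    """Return cleaned posts from one subreddit, skipping removed or deleted text."""
    posts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Keep only the community of interest (case-insensitive match).
        if record.get("subreddit", "").lower() != subreddit.lower():
            continue
        body = record.get("selftext", "")
        # Removed and deleted posts carry placeholder text instead of content.
        if body in ("", "[removed]", "[deleted]"):
            continue
        posts.append({
            "text": f"{record.get('title', '')}. {body}".strip(),
            "created": datetime.fromtimestamp(int(record["created_utc"]), tz=timezone.utc),
        })
    return posts


posts = load_posts(SAMPLE_NDJSON.splitlines(), "example")
print(len(posts), posts[0]["text"])
```

Combining the title and body into one `text` field is a design choice rather than a requirement; many Reddit posts state the main point in the title, so dropping it can lose information.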
Part 2: Sentiment analysis and depression detection with HuggingFace. We will discuss how to use pre-trained models for specific domains and tasks, and how to interpret their predictions and probabilities.
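The sketch below illustrates one way the Part 2 workflow might look, assuming the `transformers` library is installed. The function `classify_posts` wraps the library's `pipeline("text-classification", ...)` call; the model name is left as an argument because the workshop description does not name a specific checkpoint. The helper `summarize`, the 0.5 threshold, and the example scores are hypothetical and serve only to show how per-label probabilities can be read.

```python
def classify_posts(texts, model_name):
    """Score each text with a Hugging Face text-classification model.

    `model_name` is whichever Hub checkpoint the analyst chooses (an
    assumption here; none is prescribed). Requires `pip install transformers`.
    """
    # Imported lazily so the helper below runs without the library installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_name)
    # top_k=None requests the score of every label, not only the best one;
    # truncation=True cuts posts that exceed the model's maximum length.
    return classifier(list(texts), top_k=None, truncation=True)


def summarize(scores, threshold=0.5):
    """Pick the most probable label and flag low-confidence predictions."""
    best = max(scores, key=lambda s: s["score"])
    return {
        "label": best["label"],
        "probability": round(best["score"], 3),
        "confident": best["score"] >= threshold,
    }


# Hypothetical scores for one post, in the list-of-dicts shape the pipeline
# returns per text when all labels are requested.
example_scores = [
    {"label": "sadness", "score": 0.71},
    {"label": "fear", "score": 0.18},
    {"label": "joy", "score": 0.11},
]
print(summarize(example_scores))
```

Reporting the probability alongside the label, rather than the label alone, makes it easier to set aside ambiguous posts for manual review.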
Part 3: Topic modeling with BERTopic. We will explore different parameters, such as the number of topics, topic reduction, and topic representation. We will also discuss how to visualize and evaluate the topics.
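To make the Part 3 parameters more concrete, the sketch below has two pieces. `fit_topics` shows how a BERTopic model might be fitted, assuming the `bertopic` package is installed; `min_topic_size` and `nr_topics` are constructor arguments that control topic granularity and topic reduction. `c_tf_idf` is a simplified, pure-Python illustration of the class-based TF-IDF idea that BERTopic uses for topic representation; it is not the library's implementation, and the whitespace tokenizer and sample documents are hypothetical.

```python
import math
from collections import Counter


def fit_topics(docs, min_topic_size=10, nr_topics=None):
    """Fit a BERTopic model (requires `pip install bertopic`)."""
    # Imported lazily so the illustration below runs without the library.
    from bertopic import BERTopic

    topic_model = BERTopic(min_topic_size=min_topic_size, nr_topics=nr_topics)
    topics, _probabilities = topic_model.fit_transform(docs)
    return topic_model, topics


def c_tf_idf(docs_by_topic, top_n=2):
    """Simplified class-based TF-IDF: treat each topic as one long document."""
    counts = {
        topic: Counter(" ".join(docs).lower().split())
        for topic, docs in docs_by_topic.items()
    }
    total = Counter()
    for topic_counts in counts.values():
        total.update(topic_counts)
    avg_words = sum(total.values()) / len(counts)  # average words per topic

    keywords = {}
    for topic, topic_counts in counts.items():
        n_words = sum(topic_counts.values())
        # Term frequency within the topic, scaled down for terms that are
        # frequent across the whole corpus.
        weights = {
            term: (freq / n_words) * math.log(1 + avg_words / total[term])
            for term, freq in topic_counts.items()
        }
        # Sort by weight, breaking ties alphabetically for stable output.
        keywords[topic] = sorted(weights, key=lambda t: (-weights[t], t))[:top_n]
    return keywords


# Hypothetical documents already grouped into two topics.
sample = {
    0: ["loan payment loan debt help", "debt loan interest"],
    1: ["foster care placement help", "foster family care"],
}
print(c_tf_idf(sample))
```

After fitting a real model, `topic_model.get_topic_info()` returns a table of topics with their sizes and keywords, and `topic_model.visualize_topics()` produces an interactive map of the topics.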
To demystify the computational methods, the workshop will be interactive and hands-on, with code examples and exercises for participants to follow along. It will also provide references and resources for participants who want to learn more about Reddit data and NLP methods. The workshop is suitable for researchers who have some familiarity with a programming language and basic statistics; no prior experience with Reddit data or NLP methods is required. It aims to equip participants with the skills and knowledge to apply NLP methods to Reddit and other social media data for social work research and, ultimately, to contribute to evidence-based practice.