Abstract: Digital Pulse of Development: Constructing Poverty Metrics from Social Media Discourse (Society for Social Work and Research 30th Annual Conference Anniversary)

Schedule:
Friday, January 16, 2026
Capitol, ML 4 (Marriott Marquis Washington DC)
* noted as presenting author
Woojin Jung, PhD, Assistant Professor, Rutgers University, New Brunswick, NJ
Background and Purpose: Addressing poverty is a central goal in social work. However, traditional poverty measurement tools often prove insufficient in low-resource environments because they are limited in frequency, scale, and timeliness. This paper addresses those limitations by integrating natural language processing, machine learning, and spatial interpolation techniques to extract development indicators from citizen-generated content on social media in Zambia. It poses three research questions: (1) How can topic model features, trained on social media data, reveal regional and socioeconomic variation? (2) To what extent can these features accurately predict village-level wealth? (3) Which interpolation methods best model missing spatial social media data in developing countries?

Methods: Using data from over 20,000 tweets collected between 2019 and 2021 and geolocated within a 10km radius of nationally representative Demographic and Health Survey village clusters, we trained a contextualized topic model (BERTopic) to extract over 100 latent themes. These were subsequently coded by domain experts to identify seven key development-related topics, including election corruption, food systems, mining, and public health challenges.
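
The geographic matching step described above can be illustrated with a minimal sketch. It assumes tweets and survey clusters are available as latitude/longitude pairs and uses great-circle (haversine) distance to collect the tweets that fall within 10 km of each cluster. The function names, the data layout, and the sample coordinates are hypothetical and are not taken from the study's pipeline.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def assign_to_clusters(tweets, clusters, radius_km=10.0):
    """Map each cluster id to the texts of tweets within radius_km of it.

    tweets:   list of dicts with "lat", "lon", "text" (hypothetical layout)
    clusters: dict of cluster id -> (lat, lon)
    A tweet near several clusters is attached to each of them.
    """
    matched = {cid: [] for cid in clusters}
    for tweet in tweets:
        for cid, (clat, clon) in clusters.items():
            d = haversine_km(tweet["lat"], tweet["lon"], clat, clon)
            if d <= radius_km:
                matched[cid].append(tweet["text"])
    return matched


# Made-up example: one cluster near Lusaka, one nearby tweet, one distant tweet.
clusters = {"c1": (-15.40, 28.30)}
tweets = [
    {"lat": -15.42, "lon": 28.28, "text": "maize prices up again"},
    {"lat": -15.60, "lon": 28.30, "text": "road repairs delayed"},
]
matched = assign_to_clusters(tweets, clusters)
```

The per-cluster text lists produced this way are the kind of input a topic model could then summarize. The brute-force double loop is adequate for tens of thousands of tweets against a few hundred clusters; a spatial index would be the natural replacement at larger scale.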

Results: Our analysis finds clear geographic and temporal patterns in topic salience. Wealthier areas tend to produce more tweets and engage in abstract, macro-level policy discussions, while poorer villages are more likely to focus on tangible, local development issues such as food and government services. Regional topic distributions align with known economic activities, such as mining discourse being prominent in mineral-rich provinces. A change-point detection algorithm identifies significant temporal shifts in discourse, including a spike around March 2020 related to the COVID-19 pandemic and geothermal development projects. Importantly, our models demonstrate that a small set of interpretable linguistic features explains over 60% of the variation in village-level wealth, outperforming or matching conventional proxies like nightlight intensity, building density, and vegetation indices. Explainable AI tools (e.g., Shapley values, partial dependence plots) further clarify how individual discourse themes contribute to model predictions, offering actionable insights into local development priorities. To address the common issue of data sparsity in rural regions, we evaluate several imputation methods, including median-filling, zero imputation, and kriging-based spatial interpolation. Gaussian kriging significantly improves predictive accuracy while offering a principled framework for uncertainty estimation. This enables more adaptive sampling strategies for future data collection in underrepresented areas.
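
To make the kriging-based imputation concrete, the following is a minimal sketch of ordinary kriging with a Gaussian covariance model, written in plain Python. Reading "Gaussian kriging" as this particular model is an assumption, as are the parameter values (`sill`, `rng`, `nugget`), the planar coordinates, and the function names; the study's actual implementation is not described in the abstract. The sketch returns both a prediction and a kriging variance, which is the quantity that supports uncertainty estimation.

```python
import math


def gaussian_cov(h, sill=1.0, rng=1.0):
    """Gaussian covariance model: sill * exp(-(h / rng)^2)."""
    return sill * math.exp(-(h / rng) ** 2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ordinary_krige(points, values, target, sill=1.0, rng=1.0, nugget=1e-6):
    """Predict the value at target and return (prediction, kriging variance).

    points: list of (x, y) in planar units (assumed already projected)
    values: observed feature values at those points
    The small nugget on the diagonal keeps the system well conditioned.
    """
    n = len(points)
    # Covariances among observations, bordered by the unbiasedness constraint.
    a = [[gaussian_cov(math.dist(points[i], points[j]), sill, rng)
          + (nugget if i == j else 0.0) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Covariances between each observation and the target location.
    c0 = [gaussian_cov(math.dist(p, target), sill, rng) for p in points]
    sol = solve(a, c0 + [1.0])
    weights, mu = sol[:n], sol[n]
    pred = sum(w * z for w, z in zip(weights, values))
    var = sill + nugget - sum(w * c for w, c in zip(weights, c0)) - mu
    return pred, max(var, 0.0)


# Made-up example: four observed clusters on a unit square.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [0.2, 0.5, 0.4, 0.9]
pred_center, var_center = ordinary_krige(points, values, (0.5, 0.5))
```

Because the weights are constrained to sum to one, a location far from every observation falls back to a weighted mean of the data with a large variance, whereas median-filling or zero imputation would assign the same value everywhere with no indication of uncertainty. Locations with high kriging variance are natural candidates for additional data collection.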

Conclusions and Implications: This study contributes to the emerging field of digital development metrics by illustrating how social media can act as a participatory sensing system. Our findings not only provide actionable insights for poverty targeting and development planning in Zambia but also offer a scalable framework for real-time, citizen-driven monitoring across the Global South that can inform social work research. Treating “citizens as sensors,” we propose a novel pipeline that builds an informational infrastructure and encourages participatory and inclusive development by leveraging social media discourse.