Methods: Using over 20,000 tweets collected between 2019 and 2021 and geolocated within a 10 km radius of nationally representative Demographic and Health Survey (DHS) village clusters, we trained a contextualized topic model (BERTopic) to extract more than 100 latent themes. Domain experts then coded these themes to identify seven key development-related topics, including election corruption, food systems, mining, and public health challenges.
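The abstract does not say how tweets were matched to clusters within the 10 km radius. One plausible approach is to assign each tweet to its nearest DHS cluster by great-circle (haversine) distance and to discard tweets with no cluster inside the radius. The minimal sketch below takes that approach. The function names, the dictionary layout for clusters, and the mean Earth radius constant are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Optional

# Assumption: mean Earth radius in km; the source does not specify a distance model.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_cluster(tweet: tuple[float, float],
                    clusters: dict[str, tuple[float, float]],
                    radius_km: float = 10.0) -> Optional[str]:
    """Hypothetical helper: return the ID of the closest cluster within radius_km, else None."""
    best_id, best_dist = None, radius_km
    for cid, (clat, clon) in clusters.items():
        d = haversine_km(tweet[0], tweet[1], clat, clon)
        if d <= best_dist:  # keep the closest cluster seen so far inside the radius
            best_id, best_dist = cid, d
    return best_id
```

For example, with made-up cluster coordinates `{"C001": (-15.42, 28.28)}`, a tweet at `(-15.40, 28.30)` lies about 3 km away and is assigned to `"C001"`. A tweet roughly 150 km away returns `None` and would be dropped. Assigning each tweet only to its nearest cluster avoids counting the same tweet for two overlapping 10 km buffers. Whether the original study did this is not stated.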
Results: Our analysis finds clear geographic and temporal patterns in topic salience. Wealthier areas tend to produce more tweets and to engage in abstract, macro-level policy discussions, while poorer villages are more likely to focus on tangible, local development issues such as food and government services. Regional topic distributions align with known economic activities; for example, mining discourse is prominent in mineral-rich provinces. A change-point detection algorithm identifies significant temporal shifts in discourse, including a spike around March 2020 associated with the COVID-19 pandemic and geothermal development projects. Importantly, our models show that a small set of interpretable linguistic features explains over 60% of the variation in village-level wealth, matching or outperforming conventional proxies such as nightlight intensity, building density, and vegetation indices. Explainable AI tools (e.g., Shapley values and partial dependence plots) further clarify how individual discourse themes contribute to model predictions, offering actionable insights into local development priorities. To address the common problem of data sparsity in rural regions, we evaluate several imputation methods: median filling, zero imputation, and kriging-based spatial interpolation. Gaussian kriging significantly improves predictive accuracy and provides a principled framework for uncertainty estimation, which in turn enables more adaptive sampling strategies for future data collection in underrepresented areas.
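To show why kriging yields both an imputed value and an uncertainty estimate, while median or zero filling yields only a constant, here is a minimal simple-kriging sketch. It reads "Gaussian kriging" as kriging with a Gaussian covariance model, C(d) = sill · exp(-(d / length)²), and uses the sample mean as a known constant mean. Both are assumptions. The `sill`, `length`, and `nugget` values are hypothetical defaults that would normally be fitted to a variogram. Coordinates are assumed to be projected, for example in km, not raw latitude and longitude.

```python
import numpy as np


def gaussian_cov(d: np.ndarray, sill: float, length: float) -> np.ndarray:
    """Gaussian covariance model: sill * exp(-(d / length)**2)."""
    return sill * np.exp(-(d / length) ** 2)


def simple_krige(xy_obs, z_obs, xy_new, sill=1.0, length=1.0, nugget=1e-6):
    """Simple kriging with a constant mean taken as the sample mean (an assumption).

    Returns (predictions, kriging variances) for each row of xy_new.
    """
    xy_obs = np.asarray(xy_obs, dtype=float)
    xy_new = np.asarray(xy_new, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    mu = z_obs.mean()

    # Pairwise distances: observed-observed (n_obs, n_obs) and new-observed (n_new, n_obs).
    d_oo = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=-1)
    d_no = np.linalg.norm(xy_new[:, None, :] - xy_obs[None, :, :], axis=-1)

    # A small nugget on the diagonal keeps the Gaussian covariance matrix well conditioned.
    K = gaussian_cov(d_oo, sill, length) + nugget * np.eye(len(z_obs))
    k = gaussian_cov(d_no, sill, length)

    # Kriging weights W = k K^{-1}; K is symmetric, so solve K W^T = k^T.
    weights = np.linalg.solve(K, k.T).T

    pred = mu + weights @ (z_obs - mu)
    # Simple-kriging variance: C(0) - diag(k K^{-1} k^T), clipped against round-off.
    var = sill - np.sum(weights * k, axis=1)
    return pred, np.clip(var, 0.0, None)
```

At an observed village, the prediction reproduces the observed value with near-zero variance. Far from every observed village, the prediction falls back to the mean and the variance approaches the sill. Median or zero filling would assign the same constant in both cases and report no uncertainty. This growth of variance with distance is what could, in principle, flag where additional data collection would be most informative.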
Conclusions and Implications: This study contributes to the emerging field of digital development metrics by illustrating how social media can act as a participatory sensing system. Our findings not only provide actionable insights for poverty targeting and development planning in Zambia but also offer a scalable framework for real-time, citizen-driven monitoring across the Global South that can inform poverty-related social work research. Building on a "citizens as sensors" approach, we propose a novel pipeline that builds informational infrastructure and encourages participatory, inclusive development by leveraging social media discourse.