Methods: We use administrative data from the Lisungi Social Safety Net Database to geolocate households and construct a household-level Multidimensional Poverty Index (MPI) as the ground-truth measure of poverty. We assemble a range of intuitive image, text, and geographic features: infrastructure data (from the Congolese government and OpenStreetMap), vegetation indices, nighttime lights from satellite imagery, tweet counts and topic weights from BERT topic models, and internet connectivity data. These variables serve as predictors in several high-performing ML models (ensemble, Bayesian, neural networks, etc.) that predict MPI values. We assess these machine learning-based targeting (MLT) methods using mean squared error (MSE), targeting error rate (TER), and simulated poverty-reducing effects (P0=headcount; P1=gap; P2=severity).
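As an illustration of the evaluation metrics named above, the following is a minimal sketch and not the study's implementation. It assumes the P0, P1, and P2 measures are the standard Foster-Greer-Thorbecke indices computed on a welfare measure where lower values mean poorer, that TER is the share of the truly neediest households missed when the same number of households is selected by predicted score, and that higher scores mean greater deprivation. The function names, the strict poverty-line comparison, and the uniform transfer are hypothetical choices made for this example.

```python
import numpy as np


def fgt(welfare, poverty_line, alpha):
    """Foster-Greer-Thorbecke index: alpha=0 headcount, 1 gap, 2 severity.

    Assumes lower welfare means poorer; households strictly below the
    line count as poor (an assumption of this sketch).
    """
    welfare = np.asarray(welfare, dtype=float)
    poor = welfare < poverty_line
    if alpha == 0:
        return float(poor.mean())
    # Normalized shortfall from the poverty line, zero for the non-poor.
    gap = np.where(poor, (poverty_line - welfare) / poverty_line, 0.0)
    return float((gap ** alpha).mean())


def targeting_error_rate(true_score, predicted_score, share_targeted):
    """Share of the truly neediest households that the prediction misses.

    Assumes higher scores mean greater deprivation, and that the program
    selects the same number of households under both rankings.
    """
    true_score = np.asarray(true_score, dtype=float)
    predicted_score = np.asarray(predicted_score, dtype=float)
    k = max(1, int(round(share_targeted * true_score.size)))
    truly_needy = set(np.argsort(-true_score, kind="stable")[:k].tolist())
    selected = set(np.argsort(-predicted_score, kind="stable")[:k].tolist())
    return 1.0 - len(truly_needy & selected) / k


def poverty_reduction(welfare, selected_idx, transfer, poverty_line, alpha):
    """Drop in the FGT index after a uniform transfer to selected households."""
    welfare = np.asarray(welfare, dtype=float)
    after = welfare.copy()
    after[list(selected_idx)] += transfer
    return fgt(welfare, poverty_line, alpha) - fgt(after, poverty_line, alpha)


if __name__ == "__main__":
    toy_welfare = [50, 100, 150, 200]  # made-up values for illustration only
    for a in (0, 1, 2):
        print(f"P{a} =", fgt(toy_welfare, poverty_line=100, alpha=a))
    print("TER =", targeting_error_rate([0.9, 0.8, 0.1, 0.2],
                                        [0.7, 0.1, 0.6, 0.2], 0.5))
```

Under these assumptions, a model can score well on MSE while still ranking the most deprived households poorly, which is why the ranking-based and transfer-based metrics are evaluated alongside it.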
Results: Consistent with the literature on targeting in Sub-Saharan Africa, PMT alone has mixed performance, and CBT alone performs poorly in reaching households with low levels of well-being. Incorporating family size information slightly improves the traditional methods in the case of PMT (TER=23.8%; P0=21.51; P1=3.00; P2=0.90) but not CBT (TER=49.2%; P0=13.83; P1=2.08; P2=0.68). By contrast, adding a wide range of new features to PMT substantially improves prediction accuracy at the household level. Our data-augmented MLT outperforms status quo targeting mechanisms in Congo as well as global prediction models in the literature, which are too coarse. The best MLT by ML standards (neural network; R2=0.709, test MSE=0.009) did not identify deep poverty well (P0=22.43; P1=2.95; P2=0.89). Instead, the eXtreme Gradient Boosting algorithm, using all spatial features except daytime imagery (R2=0.703, test MSE=0.010), yielded the lowest targeting error (TER=7.4%) and the largest poverty reduction (P0=27.65; P1=3.21; P2=0.93), closely approximating hypothetical perfect targeting (P0=29.49; P1=3.27; P2=0.94).
Conclusions & Implications: Our study finds that augmenting MLT models with multimodal spatial data can substantially improve micro-level poverty targeting relative to traditional methods. The findings point to the value of building data infrastructure and adopting holistic evaluation metrics to promote more inclusive social welfare programs. The robust performance of our model, even in spatially homogeneous settings, suggests that AI/ML models can scale across large regions with greater spatial variation. Our focus on countries and populations marginalized in global development discourse promotes data justice.