A sizeable body of research has focused on the development and testing of individual-level predictive models of homelessness. Findings from these studies are useful for understanding how to accurately target homelessness prevention interventions to individuals and families who face the highest risk of homelessness. However, while numerous studies have identified community-level housing market, economic, and demographic correlates of homelessness, there has been no attempt to date to assess how accurately this set of factors can predict changes in the extent of homelessness at the community level. This is a notable gap, as the ability to forecast community-level rates of homelessness carries with it the potential for more accurately targeting resources to communities with the greatest anticipated need. To address this gap, this study sought to develop and test a community-level predictive model of homelessness.
Methods
We leverage the U.S. Department of Housing and Urban Development’s Point-in-Time Count data to construct a dataset that captures the total number of people experiencing homelessness in a set of 384 communities in each year from 2007 to 2023. We randomly split this dataset into a training set, which is used to develop our predictive models, and a test set, which is used to assess their performance. Our key outcome in these predictive models is the total rate of homelessness. Because a key goal of our study is to develop models that can be easily reconstructed and updated using publicly available data, we use data from the American Community Survey to construct the following small set of community-level predictor variables: median rent for a 2-bedroom apartment, median household income, poverty rate, rental vacancy rate, unemployment rate, share of all households that are renters, and proportion of renter households who are cost-burdened. We compare the performance of predictive models built with several approaches, including linear regression and several machine learning algorithms (random forest, histogram-gradient boosting, extreme gradient boosting, and neural networks). We also estimate additional models in which the outcome is the rate of homelessness among individuals, among persons in families with children, and among people experiencing sheltered and unsheltered homelessness.
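The random split described above can be illustrated with a minimal sketch. It is not the authors' code: the 80/20 split ratio, the fixed seed, the row-level (community-year) unit of splitting, and the assumption of a complete 384-by-17 panel are all hypothetical choices made for illustration, since the abstract does not state them.

```python
import random


def split_train_test(records, test_fraction=0.2, seed=0):
    """Randomly partition records into (train, test) lists.

    test_fraction and seed are illustrative defaults, not values
    reported in the abstract.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled records form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical community-year identifiers: 384 communities, years 2007 through 2023.
# A complete panel is assumed here purely for illustration.
records = [(community, year) for community in range(384) for year in range(2007, 2024)]
train, test = split_train_test(records)
```

Splitting by community-year row is only one option; holding out whole communities or whole years would test a different kind of generalization, and the abstract does not say which was used.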
Results
With R-squared as the metric of model accuracy, all models showed performance that conventional benchmarks classify as “moderate” (R-squared > 0.5) or “good” (R-squared > 0.7). Machine learning algorithms generally performed better than linear regression for all outcomes.
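For readers unfamiliar with the metric, the sketch below shows how R-squared is computed on held-out data and how the two thresholds quoted above map to labels. The function names, the "below moderate" label, and the sample numbers are hypothetical and are not results from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    if ss_total == 0:
        raise ValueError("R-squared is undefined when observed values are constant")
    return 1 - ss_residual / ss_total


def benchmark_label(r2):
    """Map R-squared to the labels quoted in the abstract.

    "below moderate" is a hypothetical label for values at or under 0.5.
    """
    if r2 > 0.7:
        return "good"
    if r2 > 0.5:
        return "moderate"
    return "below moderate"


# Made-up test-set homelessness rates and model predictions, for illustration only.
observed_rates = [10.0, 20.0, 30.0, 40.0]
predicted_rates = [12.0, 18.0, 33.0, 37.0]
score = r_squared(observed_rates, predicted_rates)
label = benchmark_label(score)
```

Because R-squared is computed on the test set, it can be negative when a model predicts worse than the mean of the observed values.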
Conclusions and Implications
To our knowledge, this is the first study to develop and test predictive models of homelessness at the community level. The moderate to good performance of our models points to their potential utility in accurately identifying communities with economic and housing market conditions that may lead to increases in homelessness. More research is needed to refine these models and to develop practices for using them in an applied context to target resource allocation.