Eviction research is important because eviction displaces low-income families, reduces their future housing options, and can lead to homelessness. Most evictions result from missed rent payments or nonpayment. For example, an estimated 93 percent of filings in Washington, DC, 86.5 percent in Seattle, WA, and 90 percent in Cleveland, OH, are due to nonpayment. However, the causes of missed payments and nonpayment are not well explored, because many states have not formally collected these data into accessible databases, leaving the bulk of the information dormant within individual case records, especially court files. Our research question asks whether there are efficient methods for exploring the causes of nonpayment and, ultimately, eviction. This first-of-its-kind study explores ways to categorize the reasons for and amounts of nonpayment by using natural language processing to mine important information from court records.
Methods: Our primary dataset consists of PDFs of eviction court records from Pierce County, WA, filed in 2017 (N=3,231). A proportion of these files contain defendant answers explaining the circumstances of the eviction, providing a wealth of information about employment, family structure and situations, and life events. We first digitize these handwritten or typed responses using optical character recognition (OCR) provided by Amazon Web Services (AWS) Textract. After converting the PDF files into text data, we apply Sentence-BERT, a state-of-the-art natural language processing technique, to convert the text into vectors for cluster analysis and visualization. We then use spectral clustering to group the answers into a reasonable number of clusters and explore the keywords in each cluster.
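The final step, exploring the keywords in each cluster, can be illustrated with a minimal sketch. It assumes the cluster labels have already been produced upstream (for example, by spectral clustering on the sentence vectors). The function names, the tiny stopword list, and the four sample answers are hypothetical and invented for illustration only; they are not drawn from the court records or from the study's code.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "and", "for", "were", "was", "that", "have", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens longer than two
    characters that are not in the stopword list."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]


def top_keywords(texts, labels, k=3):
    """Return the k most frequent tokens for each cluster label."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        counts[label].update(tokenize(text))
    return {
        label: [word for word, _ in counter.most_common(k)]
        for label, counter in counts.items()
    }


# Invented example answers and cluster labels (not real case data).
answers = [
    "I can pay the rent if given a payment plan",
    "Please allow a payment plan for the rent owed",
    "I lost my job and my benefits were reduced",
    "My job hours were cut after I lost benefits",
]
labels = [0, 0, 1, 1]

keywords = top_keywords(answers, labels, k=3)
for label, words in sorted(keywords.items()):
    print(label, words)
```

Raw frequency counts are the simplest possible choice here; a weighting scheme such as TF-IDF would down-weight words that are common to every cluster, at the cost of an extra dependency.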
Results:
Of all cases, 18.3% (N=591) include answer or response files, meaning that most tenants do not respond to the eviction filing, which results in either a default judgment or a very brief hearing in front of a judge. Structural differences between files, such as the page on which the response is located, reduce our analytic sample to N=453. In about 25% of these answers, tenants state that they are willing to pay the rent and late fees but would like an extension or a payment plan. Around 18% of the answers describe income challenges, such as job loss or changes in social benefits. Close to 15% of the answers describe disputes with the landlord, with tenants withholding payment in protest or seeking other solutions.
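As a quick arithmetic check on the sample sizes reported above, the following sketch recomputes the shares from the three stated counts. The variable and function names are illustrative assumptions; only the counts come from the text.

```python
# Counts as reported in the abstract.
total_cases = 3231        # eviction court records, Pierce County, WA, 2017
cases_with_answer = 591   # cases that include an answer or response file
usable_answers = 453      # answers retained after structural differences


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


response_rate = pct(cases_with_answer, total_cases)   # share of cases with an answer
retained_share = pct(usable_answers, cases_with_answer)  # share of answers kept
analyzed_share = pct(usable_answers, total_cases)     # share of all cases analyzed

print(response_rate, retained_share, analyzed_share)
```

The first figure reproduces the 18.3% response rate stated in the Results; the other two are derived from the same counts and show how much of the full docket the clustered answers represent.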
Conclusions and Implications:
This is a pilot test exploring the causes of nonpayment in eviction cases. We conclude that applying these advanced NLP techniques is an effective way to explore text-heavy data and provides valuable insight into the mechanisms behind different eviction filings. The method is also easy to scale up as more case files are obtained. It can advance research in significant ways: by combining the results with demographic estimation to understand who faces which mechanism most, by revealing the spatial structure of different eviction reasons, and by exploring tenant/landlord relationships and the causes of eviction. By understanding the causes of nonpayment, researchers in social welfare can better allocate resources to assist tenants facing eviction crises and prevent potential homelessness.