Abstract: Analysis of Case Notes to Provide Insights into Racial Inequality and Patient Outcomes in Brazilian Hospitals (Society for Social Work and Research 29th Annual Conference)


Analysis of Case Notes to Provide Insights into Racial Inequality and Patient Outcomes in Brazilian Hospitals

Schedule:
Thursday, January 16, 2025
Jefferson A, Level 4 (Sheraton Grand Seattle)
* noted as presenting author
Tawfiq Ammari, PhD, Assistant Professor, Rutgers University, NJ
Charles Senteio, Associate Professor, Rutgers University, NJ
Priscila Ferreira, Adjunct Professor and International Relations Coordinator, Universidade Federal do Rio de Janeiro, Brazil
Background and Purpose: Over the past two years, there has been intense interest in deploying large language models (LLMs) such as ChatGPT, BERT, and Llama in healthcare settings. Given persistent inequities in the health field and the tendency of machine learning models, including LLMs, to amplify societal biases, it is important to examine clinicians’ case notes and other electronic health record (EHR) data that would be used to train these models. Detecting such biases allows us to account for them and, where necessary, to debias the data. In this study, we use machine learning techniques to detect linguistic signals of bias across racial/ethnic groups, using case note data written by nurses in Brazilian hospitals.

Methods: Using 2,467 nursing notes about 459 patients, labeled for five racial groups (Black, Brown, Asian, White, Indigenous), we treated language as a sensor and applied several models to detect racial bias. We computed the log-likelihood ratio (LLR), the logarithm of the ratio of a word’s probability of occurrence in one class versus the others, to identify class-distinguishing terms while accounting for imbalances in class size. Next, we fine-tuned the Portuguese Clinical and Biomedical BERT on our dataset and used the resulting model to predict racial class from the language of the nursing notes. The classifier (accuracy = 0.86; precision = 0.88; recall = 0.86; F1 = 0.85) was used to better understand biases in nursing notes. We then applied SHapley Additive exPlanations (SHAP), a post-hoc analysis method, to explain the classifier’s output.
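
To make the analytic pipeline concrete, the Python sketch below illustrates the two main steps under stated assumptions; it is not the authors’ code. The first part computes a one-vs-rest log-likelihood ratio per word with add-one smoothing, assuming the notes have already been tokenized and grouped by racial label. The second part applies SHAP to a text-classification pipeline, using the publicly released BioBERTpt checkpoint (pucpr/biobertpt-clin) as a stand-in for the Portuguese Clinical and Biomedical BERT and a hypothetical local path for the fine-tuned five-class model.

import math
from collections import Counter

def log_likelihood_ratios(docs_by_class):
    """One-vs-rest LLR per word: log P(w | class) / P(w | all other classes)."""
    counts = {c: Counter(tok for doc in docs for tok in doc)
              for c, docs in docs_by_class.items()}
    totals = {c: sum(cnt.values()) for c, cnt in counts.items()}
    vocab = set().union(*counts.values())
    llr = {}
    for c in counts:
        rest = Counter()
        for c2, cnt in counts.items():
            if c2 != c:
                rest.update(cnt)
        rest_total = sum(t for c2, t in totals.items() if c2 != c)
        llr[c] = {}
        for w in vocab:
            p_in = (counts[c][w] + 1) / (totals[c] + len(vocab))    # add-one smoothing
            p_out = (rest[w] + 1) / (rest_total + len(vocab))       # helps with class-size imbalance
            llr[c][w] = math.log(p_in / p_out)
    return llr  # high-LLR words are the most predictive of each class

# SHAP over a fine-tuned Portuguese clinical BERT classifier (sketch).
import shap
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("pucpr/biobertpt-clin")   # assumed base checkpoint
model = AutoModelForSequenceClassification.from_pretrained(
    "path/to/finetuned-race-classifier", num_labels=5)              # hypothetical fine-tuned path
clf = pipeline("text-classification", model=model, tokenizer=tokenizer,
               return_all_scores=True)

explainer = shap.Explainer(clf)                  # SHAP infers a text masker from the pipeline
shap_values = explainer(["texto de exemplo de uma nota de enfermagem"])
shap.plots.text(shap_values)                     # token-level attributions per racial class

Words ranked highest by LLR for a class, and tokens with large SHAP attributions toward that class, are the kinds of linguistic signals reported in the Results below.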

Results: We found that phrases like “urological donor,” “hospitalizations,” “presenting months and during periods under nursing care,” and “absent intestinal eliminations; continue nursing care” were more predictive of notes about Brown patients. Because many of these phrases point to long-term health problems (e.g., edema), this may indicate worse outcomes for Brown patients. One of the more notable predictors of notes about Black patients was the use of “non-compliance” to indicate a lack of adherence to medical regimens. Qualitative comparison with other racial groups showed that, when those patients did not follow their health regimens (e.g., reducing salt/sugar intake), “non-compliance” was not used to describe the lapse; instead, nurses contextualized the patients’ circumstances to explain their deviation from healthier living standards. This may indicate underlying bias in nursing notes about Black patients.

Conclusion and Implications: Our results show significant linguistic differences across racial groups in nursing notes from Brazil. This raises the potential for AI-powered assistant technologies introduced into Brazilian medical systems to amplify underlying racial biases. The development of LLM-based AI assistants needs to account for these biases to ensure health equity. Future work should extend these results by debiasing LLMs in collaboration with medical staff in Brazil and by testing prototype LLM assistants in situ to determine whether they can support medical staff without biasing medical service provision toward or against any racial/ethnic group in Brazil.