Methods: Using 2,467 nursing notes about 459 patients, labeled for five racial groups (Black, Brown, Asian, White, Indigenous), we treated language as a sensor, applying several models to detect racial biases. We used the log-likelihood ratio (LLR), the logarithm of the ratio of a word’s probability of occurrence in one class versus another, to identify words that differentiate the classes while accounting for the imbalanced sizes of the categories. Next, we fine-tuned the Portuguese Clinical and Biomedical BERT on our dataset and used the model to predict racial classes from the language in the nursing notes. The resulting classifier (accuracy = 0.86; precision = 0.88; recall = 0.86; F1 = 0.85) was used to better understand biases in nursing notes. We then used SHapley Additive exPlanations (SHAP) as a post-hoc analysis to explain the classifier’s output.
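To make the LLR step concrete, the following is a minimal sketch (not the authors' code) of scoring words by the log-likelihood ratio described above: for each word w and target class c, LLR(w, c) = log(P(w | c) / P(w | not c)), with add-one (Laplace) smoothing. Using smoothed relative frequencies rather than raw counts keeps the scores comparable across classes of different sizes. All function names, variable names, and the toy data are illustrative, not taken from the study.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def class_word_counts(notes: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count word occurrences per class from (label, text) pairs."""
    counts: Dict[str, Counter] = {}
    for label, text in notes:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def llr_scores(counts: Dict[str, Counter], target: str) -> List[Tuple[str, float]]:
    """Rank words by LLR of the target class against all other classes."""
    vocab = set().union(*counts.values())
    in_class = counts[target]
    out_class: Counter = Counter()
    for label, c in counts.items():
        if label != target:
            out_class.update(c)

    # Laplace-smoothed totals so rare words and small classes do not blow up the ratio.
    n_in = sum(in_class.values()) + len(vocab)
    n_out = sum(out_class.values()) + len(vocab)

    scores = []
    for w in vocab:
        p_in = (in_class[w] + 1) / n_in
        p_out = (out_class[w] + 1) / n_out
        scores.append((w, math.log(p_in / p_out)))
    return sorted(scores, key=lambda x: x[1], reverse=True)


if __name__ == "__main__":
    # Toy synthetic (label, note) pairs; the real notes are in Portuguese and far larger.
    toy_notes = [
        ("brown", "paciente apresenta edema internacoes frequentes"),
        ("white", "paciente nega dor segue tratamento conforme orientado"),
    ]
    print(llr_scores(class_word_counts(toy_notes), target="brown")[:5])
```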
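The classification and explanation step could look roughly like the sketch below, which assumes a BioBERTpt-style checkpoint already fine-tuned for the five-way racial-label task and saved locally. The checkpoint path and the example sentence are placeholders, and this is a generic transformers/SHAP workflow rather than the authors' exact pipeline.

```python
import shap
from transformers import pipeline

# Hypothetical local path to a fine-tuned Portuguese clinical BERT classifier.
clf = pipeline(
    "text-classification",
    model="./biobertpt-clin-finetuned",
    tokenizer="./biobertpt-clin-finetuned",
    top_k=None,  # return scores for all classes, not just the argmax
)

# shap.Explainer wraps the text pipeline and attributes the predicted
# class probabilities to individual tokens in each note.
explainer = shap.Explainer(clf)

notes = ["Paciente não aderente ao tratamento, nega dor."]  # toy example
shap_values = explainer(notes)

# Token-level contributions per class (e.g., rendered in a notebook).
shap.plots.text(shap_values)
```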
Results: We found that phrases such as “urological donor,” “hospitalizations,” “presenting months and during periods under nursing care,” and “absent intestinal eliminations; continue nursing care” were more predictive of notes about Brown patients. This might indicate worse outcomes for Brown patients, given that many of the predictive phrases point to worse long-term health outcomes (e.g., edema). One of the more notable predictors of notes about Black patients was the use of “non-compliance” to indicate a lack of adherence to medical regimens. Further qualitative comparisons showed that when patients in other racial groups did not follow their health regimens (e.g., decreasing salt/sugar intake), “non-compliance” was not used to describe the lapse; instead, the notes contextualized these patients’ activities to account for their deviation from healthier living standards. This might indicate underlying bias in nursing notes about Black patients.
Conclusion and Implications: Our results show significant linguistic differences across racial groups in nursing notes from Brazil. This, in turn, raises the potential for AI-powered assistant technologies introduced into Brazilian medical systems to amplify underlying racial biases. The development of LLM-based AI assistants needs to account for these biases to ensure health equity. Future work should extend these results by debiasing LLMs in collaboration with medical staff in Brazil and by testing prototype LLM assistants in situ to determine whether they can support medical staff without biasing the provision of medical services toward or against any racial/ethnic group in Brazil.