Methods. Domain experts with clinical social work backgrounds developed a rubric for evaluating clinical note quality, grounded in the literature on clinical documentation and person-centered care. The rubric comprised three domains: (1) readability, (2) clinical content, and (3) person-centeredness, each defined for both manual and computational assessment. Using the rubric, the experts manually scored clinical notes from one community mental health center (CMHC) and notes generated with the OpenAI GPT-4 model. These manual annotations then served to train and validate the quality evaluation algorithm, which was subsequently refined to improve its performance. In Python, we compared CMHC and AI-generated notes on word count, number of large words (more than 6 characters), and n-gram frequencies, testing group differences with independent-samples t-tests.
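The surface features and comparison named in the Methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the regular-expression tokenizer, the function names, and the toy notes are all hypothetical. The t statistic uses pooled variance (Student's t), which matches what `scipy.stats.ttest_ind` computes by default (`equal_var=True`); that SciPy function also returns the p-value, which the standard library cannot compute directly.

```python
"""Hypothetical sketch of note-level features: word count, large-word
count (> 6 characters), n-gram frequencies, and a pooled-variance
independent-samples t statistic. Not the study's actual code."""
import math
import re
from collections import Counter
from statistics import mean, variance

# Assumption: a simple tokenizer that keeps letters and apostrophes only.
WORD_RE = re.compile(r"[A-Za-z']+")


def tokenize(note: str) -> list[str]:
    """Lowercase alphabetic tokens; digits and punctuation are dropped."""
    return WORD_RE.findall(note.lower())


def word_count(note: str) -> int:
    """Number of tokens in the note."""
    return len(tokenize(note))


def large_word_count(note: str, min_len: int = 7) -> int:
    """Count words with more than 6 characters (length >= 7)."""
    return sum(1 for w in tokenize(note) if len(w) >= min_len)


def ngram_counts(note: str, n: int = 2) -> Counter:
    """Frequency of each contiguous n-token sequence in the note."""
    toks = tokenize(note)
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


def independent_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Student's t for two independent samples with pooled variance.

    Each sample needs at least two values (statistics.variance raises
    otherwise). Returns (t statistic, degrees of freedom).
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2


if __name__ == "__main__":
    # Invented toy notes for illustration only; not data from the study.
    cmhc = ["Client talked about her sister and their weekend plans.",
            "Worker and client reviewed goals; client enjoys painting."]
    ai = ["Client reports persistent hopelessness and feels overwhelmed.",
          "Client demonstrates significant psychological distress symptoms."]
    t, df = independent_t([large_word_count(n) for n in ai],
                          [large_word_count(n) for n in cmhc])
    print(f"large words: t = {t:.2f}, df = {df}")
```

In a real analysis, the per-note feature lists for each group would typically be passed to `scipy.stats.ttest_ind` to obtain both the statistic and the p-value.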
Results. Several key differences emerged between CMHC and AI-generated notes. The difference in note length was marginally significant, with AI-generated notes containing on average 8 more words (p = .08). AI-generated notes also contained on average 12 more large words than CMHC notes (p < .001). They tended to use more technical medical language and more negative emotion words, such as “hopelessness” and “overwhelmed,” than CMHC notes. Further, AI-generated notes less frequently included descriptions of the client’s family (p < .001). Finally, AI-generated notes were less likely than CMHC notes to include individualized information and strengths-based content.
Conclusion and Implications. This study developed and piloted a rubric and an NLP algorithm to assess clinical note quality based on readability, clinical content, and person-centeredness. When the algorithm was applied to both CMHC and AI-generated notes, preliminary findings suggested that it could detect differences in note quality across the three domains. These findings suggest that AI can be harnessed not only to generate clinical notes but also to assess their quality, provided it is grounded in human expertise. Future research should use larger samples and extend these methods to other settings. As AI plays an increasing role in healthcare delivery, quality control rooted in human clinical judgment is essential.