Methods. We fine-tuned two LLMs (ChatGPT-3.5 and LLaMA-7B) on 12 years of data from r/Anxiety, comprising top-level prompt-response pairs that capture supportive interactions among users. Fine-tuning continues the training of a pre-trained model (such as ChatGPT) on task-specific data, updating its weights to improve its relevance and performance on a specific task, in our case responding to anxiety as a peer; decoding settings such as temperature and max tokens are configured separately at inference time. We then compared the performance of the fine-tuned ChatGPT-3.5 and LLaMA-7B against non-fine-tuned baseline models on three quantitative benchmarks: linguistic quality, safety and trustworthiness, and supportiveness, each comprising a range of metrics. Finally, we conducted Levene's test and Welch's ANOVA for each metric to assess whether there were statistically significant differences among the models.
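As a minimal sketch of this analysis step (not the authors' exact pipeline), the per-metric comparison could be run with `scipy.stats.levene` and pingouin's `welch_anova`; the library choice, column names, and scores below are illustrative assumptions.

```python
# Sketch of the per-metric statistical comparison: Levene's test for
# equality of variances, then Welch's ANOVA (robust to unequal variances).
# Model labels, the "empathy" column, and all values are hypothetical.
import pandas as pd
import pingouin as pg
from scipy import stats

# Hypothetical per-response scores for one metric across the four models.
df = pd.DataFrame({
    "model": ["gpt35_ft"] * 3 + ["gpt35_base"] * 3
           + ["llama7b_ft"] * 3 + ["llama7b_base"] * 3,
    "empathy": [0.82, 0.78, 0.85, 0.61, 0.66, 0.58,
                0.74, 0.71, 0.77, 0.52, 0.49, 0.57],
})

# Levene's test: does the variance of the metric differ across models?
groups = [g["empathy"].values for _, g in df.groupby("model")]
lev_stat, lev_p = stats.levene(*groups, center="median")
print(f"Levene W = {lev_stat:.3f}, p = {lev_p:.3f}")

# Welch's ANOVA: compares group means without assuming equal variances.
aov = pg.welch_anova(data=df, dv="empathy", between="model")
print(aov[["ddof1", "ddof2", "F", "p-unc"]])
```

Welch's ANOVA is the natural follow-up when Levene's test suggests unequal variances, since it does not assume homogeneity of variance across the model groups.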
Findings. Readability metrics, such as Flesch-Kincaid and Gunning Fog, indicated that the models generally produced text accessible to a wide audience, though some responses were more complex. Semantic coherence, measured with BLEU, ROUGE, BLEURT, and BERTScore, was highly consistent across models, with only minor discrepancies. Safety and trustworthiness were evaluated with the GenBit score and toxicity scores, which flagged instances of harmful language in some responses. Supportiveness was assessed through empathy metrics, which showed varying levels of empathy across the models.
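As a sketch of how the readability and safety scoring can be computed (the `textstat` and `detoxify` libraries and the sample response are our assumptions, not necessarily the paper's tooling):

```python
# Sketch: readability and toxicity scoring for a single model response.
# Library choice (textstat, detoxify) is an assumption for illustration.
import textstat
from detoxify import Detoxify

response = (
    "It sounds like you're carrying a lot right now. Many people here "
    "have felt the same way, and talking about it is a good first step."
)

# Readability: lower grade levels indicate text accessible to wider audiences.
print("Flesch-Kincaid grade:", textstat.flesch_kincaid_grade(response))
print("Gunning Fog index:  ", textstat.gunning_fog(response))

# Toxicity: scores near 0 indicate the response avoids harmful language.
tox = Detoxify("original").predict(response)
print("Toxicity:", round(tox["toxicity"], 4))
```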
Discussion and Implications. With the rapidly growing use of LLMs in therapeutic interventions, our study demonstrates that evaluating these models is critical to ensuring they provide a safe, supportive, and effective experience for users. Given the existing gaps in mental health services, LLMs have potential for providing immediate support in times of need; however, they should not be seen as a replacement for a human therapist. These tools can serve as an adjunct to therapists, but continuous refinement is necessary to personalize responses, build trust in the system, and ensure user safety. More research is needed to ensure these emerging technologies can be integrated responsibly into mental healthcare and social work practice in ways that uphold the highest standards of care, ethics, and social responsibility.