Large Language Models (LLM)-Powered Diagnostic Co-pilot (“CapyEngine”) for Mental Disorders: Development, Evaluation, and Future Optimization
Time: 01:10 PM - 01:20 PM
Topics: Mental Health, Digital Health
Despite the growing potential of large language models (LLMs) in mental health services, their application to diagnostic processes remains largely unexplored. This study describes the development and evaluation of CapyEngine, an LLM-powered diagnostic tool designed to assist in diagnosing mental disorders.
We developed and evaluated CapyEngine in three phases. In Phase 1, we created a symptom database from the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, Text Revision (DSM-5-TR) and then built CapyEngine's architecture using LLMs, embedding models, and vector search. In Phase 2, we conducted interviews and usability tests with mental health professionals (n = 7) to identify challenges in traditional diagnostic practice and potential applications for CapyEngine. In Phase 3, we compared CapyEngine's diagnostic accuracy against that of ChatGPT-4 and clinicians using 35 standardized case-scenario questions from psychiatry and clinical psychology board exams. Each question was entered into CapyEngine, and its top 10 recommended diagnoses were recorded. ChatGPT-4 was prompted to provide its top 10 potential diagnoses for each question, and clinicians (n = 3) were similarly instructed to generate at least 10 potential diagnoses per question. Responses were then scored for accuracy within the top 10, top 5, and top 1 diagnoses.
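The abstract names the building blocks of CapyEngine: a DSM-5-TR-derived symptom database, embedding models, and vector search. It does not say how they are combined. The following is a minimal sketch of one plausible way to rank candidate diagnoses by embedding similarity. It rests on several labeled assumptions:

- The database entries are placeholders, not DSM-5-TR text.
- `embed` is a hashed bag-of-words stand-in for a real embedding model.
- Each diagnosis is scored by its single best-matching symptom entry.
- The LLM step (for example, extracting symptoms from a case vignette) is omitted.

All names (`SYMPTOM_DB`, `embed`, `rank_diagnoses`) are hypothetical and are not CapyEngine's actual code.

```python
import math
import re
import zlib

# Hypothetical placeholder database: diagnosis label -> symptom descriptions.
# A real system would derive entries from DSM-5-TR criteria; these are NOT DSM text.
SYMPTOM_DB = {
    "Diagnosis A": ["persistent low mood", "loss of interest in activities"],
    "Diagnosis B": ["excessive worry", "restlessness and muscle tension"],
    "Diagnosis C": ["recurrent panic attacks", "fear of future attacks"],
}

DIM = 1024  # dimensionality of the toy vector space (assumption)


def embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z]+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """For L2-normalized vectors, cosine similarity equals the dot product."""
    return sum(x * y for x, y in zip(a, b))


# Toy "vector index": one embedded vector per (diagnosis, symptom) entry.
INDEX = [(label, embed(s)) for label, symptoms in SYMPTOM_DB.items() for s in symptoms]


def rank_diagnoses(case_text: str, top_k: int = 10) -> list[tuple[str, float]]:
    """Score each diagnosis by its best-matching symptom entry and return the top_k."""
    query = embed(case_text)
    best: dict[str, float] = {}
    for label, vec in INDEX:
        best[label] = max(best.get(label, -1.0), cosine(query, vec))
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    case = "Patient reports constant worry, restlessness, and muscle tension."
    for label, score in rank_diagnoses(case, top_k=3):
        print(f"{label}: {score:.3f}")
```

Taking the maximum over a diagnosis's symptom entries is only one aggregation choice. Summing or averaging per-symptom matches would reward diagnoses that cover more of the presenting symptoms, and a production system would replace `embed` with a trained embedding model and an approximate nearest-neighbor index.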
CapyEngine achieved 62.86% accuracy in identifying the correct diagnosis within its top 10 options and 48.57% accuracy for its top diagnosis. ChatGPT-4 achieved 100% accuracy within the top 10 and top 5 options but only 31.43% for its top diagnosis. Clinicians achieved 82.86% accuracy within the top 10 and 57.14% for the top diagnosis, outperforming both AI models on top-diagnosis accuracy.
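Each accuracy figure above is a share of the 35 questions. The sketch below does two things:

- It shows one plausible top-k scoring rule, `top_k_accuracy`: a case counts as correct at k if the reference diagnosis appears among the first k ranked diagnoses. This is an assumption about how "accuracy within the top k" was computed.
- It checks that each reported percentage corresponds to a whole-number count out of 35. The counts in `IMPLIED_COUNTS` are derived arithmetically from the reported percentages; the abstract itself does not report them.

For clinicians, the abstract does not say whether the figures are pooled or averaged across the three raters. The reduced fractions (29/35 and 20/35) are the same either way; for example, 87/105 equals 29/35.

```python
def top_k_accuracy(ranked_lists: list[list[str]], references: list[str], k: int) -> float:
    """Fraction of cases whose reference diagnosis appears among the first k ranked diagnoses."""
    if len(ranked_lists) != len(references):
        raise ValueError("need exactly one ranked list per reference diagnosis")
    hits = sum(ref in ranked[:k] for ranked, ref in zip(ranked_lists, references))
    return hits / len(references)


N_QUESTIONS = 35

# Correct-answer counts implied by the reported percentages (count / 35).
# Derived arithmetically here; the abstract reports only the percentages.
IMPLIED_COUNTS = {
    ("CapyEngine", "top 10"): 22,
    ("CapyEngine", "top 1"): 17,
    ("ChatGPT-4", "top 10"): 35,
    ("ChatGPT-4", "top 5"): 35,
    ("ChatGPT-4", "top 1"): 11,
    ("Clinicians", "top 10"): 29,
    ("Clinicians", "top 1"): 20,
}


def as_percent(hits: int, total: int = N_QUESTIONS) -> str:
    """Format a hit count as a percentage with two decimals, as in the abstract."""
    return f"{100 * hits / total:.2f}%"


if __name__ == "__main__":
    # Toy example with hypothetical labels: reference ranked 2nd in case 1, 1st in case 2.
    ranked = [["Diagnosis B", "Diagnosis A"], ["Diagnosis C", "Diagnosis A"]]
    refs = ["Diagnosis A", "Diagnosis C"]
    print(top_k_accuracy(ranked, refs, k=1), top_k_accuracy(ranked, refs, k=2))  # 0.5 1.0

    for (source, level), hits in IMPLIED_COUNTS.items():
        print(f"{source:<11} {level:<7} {hits}/{N_QUESTIONS} = {as_percent(hits)}")
```

Under this rule, accuracy can only rise as k grows: a case that is correct at k = 1 is also correct at k = 5 and k = 10. That is consistent with every system's top-10 figure being at least as high as its top-1 figure.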
CapyEngine shows promise in augmenting the mental health diagnostic process. Future enhancements will focus on incorporating non-symptom-based diagnostic factors, developing specialized embedding models, and addressing cultural sensitivity. Further research is needed to assess the risks and benefits of integrating AI tools like CapyEngine into clinical workflows and to address barriers to adoption.
Keywords: Informatics, Mental health
Authors and Affiliations
Author: Liying Wang, Florida State University
Co-Author: Yunzhang Jiang, Nexcuria Labs
Category
Scientific > Poster/Paper/Live Research Spotlight