3252 - Large Language Models Applied to Patient Messages for Predicting Acute Care Risk during Radiation Therapy
Presenter(s)
B. Hyams1, J. H. Chang2, M. V. Elia3, R. Benson4, A. Ashraf-Ganjouei5, and J. C. Hong6; 1UCSF School of Medicine, San Francisco, CA, 2Department of Radiation Oncology, Seoul National University College of Medicine, Seoul, Republic of Korea, 3Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, 4Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, 5UCSF, San Francisco, CA, 6Department of Radiation Oncology, University of California San Francisco, San Francisco, CA
Purpose/Objective(s):
Acute care events, including emergency room (ER) visits and unplanned hospitalizations, are a common complication for patients undergoing radiation therapy (RT), resulting in increased healthcare costs and treatment delays. Patients frequently communicate symptoms and acute needs via patient portal messages, which may contain information portending future acute care requirements. At our institution, these messages are routinely triaged by nursing staff for appropriate clinical management. Large language models (LLMs) provide an opportunity to incorporate these data into predictive models. We evaluated an open-source LLM, without additional pre-training, for predicting acute care events from unstructured patient messages.
Materials/Methods:
We evaluated RT courses at an outpatient radiation oncology center between October 2012 and February 2022. Messages sent via the patient portal between 30 days and 1 day prior to the prediction date were aggregated and appended with a task-specific prompt. Text data were processed using the LLAMA-3.0-8b base model. We extracted features by retrieving the vector of the final token from the LLM's last hidden layer, representing the text within a 4096-dimensional feature space. For risk prediction, we trained a weighted logistic regression model with the LLM-derived features as inputs and acute care events (ER visits and/or unplanned hospitalizations) as outcomes. An 80-20 train-test split was used for performance evaluation. Because LLM-derived features lack direct textual correlates, we averaged attention weights across layers to generate attention "importance" scores for tokens; these scores may indicate which tokens most influence the output features.
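For illustration, a minimal sketch of this pipeline using the Hugging Face transformers and scikit-learn libraries follows. The checkpoint identifier, prompt wording, placeholder data, and balanced class-weighting scheme are assumptions made for the sketch, not the study's exact configuration.

# Minimal sketch of the described pipeline. Assumptions: checkpoint name,
# prompt wording, placeholder data, and "balanced" class weighting.
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

MODEL_NAME = "meta-llama/Meta-Llama-3-8B"  # assumed checkpoint identifier
PROMPT = "Assess the following patient messages for acute care risk:\n"  # assumed prompt

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    output_hidden_states=True,
    output_attentions=True,
    attn_implementation="eager",  # eager attention is needed to return attention weights
)
model.eval()

def embed_messages(text: str) -> np.ndarray:
    """Return the final token's vector from the last hidden layer (4096-dim for an 8B Llama model)."""
    inputs = tokenizer(PROMPT + text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    # hidden_states[-1]: (batch, seq_len, 4096); keep the last token of the sequence
    return out.hidden_states[-1][0, -1, :].float().cpu().numpy()

def attention_importance(text: str) -> list[tuple[str, float]]:
    """Average the attention each token receives across all layers, heads, and query positions."""
    inputs = tokenizer(PROMPT + text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    # out.attentions: one (batch, heads, query, key) tensor per layer
    scores = torch.stack(out.attentions).mean(dim=(0, 2, 3))[0]  # per-token importance
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return list(zip(tokens, scores.tolist()))

# Placeholder inputs; in the study these would be the aggregated portal messages
# (days -30 to -1) and the acute care label for each RT course.
message_texts = ["I have had urgent nausea and vomiting since yesterday."] * 50
labels = [1, 0, 0, 0, 0] * 10  # toy labels with a minority positive class

X = np.stack([embed_messages(t) for t in message_texts])
y = np.array(labels)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# Weighted logistic regression on the 4096-dimensional LLM features
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
print("AUROC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))

Because the approach requires only forward passes through a frozen model plus a lightweight classifier, it avoids any gradient-based training of the LLM itself, consistent with the computational argument made in the Conclusion.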
Results:
Our study included 4320 RT courses from unique patients. Of the 30-day intervals assessed, 355 (8%) preceded an acute care event; 81% of events involved ER visits and 66% involved unplanned hospitalizations. English was the preferred language for 93% of patients, with <2% preferring Mandarin, Cantonese, Spanish, or Russian. Our pipeline predicted acute care events with an AUROC of 0.65. Qualitative comparison of attention patterns between 10 true positive and 10 false negative predictions revealed a relative trend toward greater importance assigned to words relating to symptoms (“nausea”) and acuity (“urgent”) in positive predictions than in negative ones.
Conclusion:
We show that patient messages offer modest predictive utility for acute care risk triage during RT. Our pipeline, using an out-of-the-box LLM, extracted relevant features from unstructured text while avoiding the large computational demands of LLM pre-training. While patient messages alone may not provide adequate predictive power for clinical use, combining them with other data sources may improve performance. One limitation is the limited explainability of the predictive features. Future approaches may explore LLM prompting to extract interpretable features.