3757 - A Locally Deployed Large Language Model Framework for Automated Triage of Radiation Oncology Consult Notes: A Pilot Study on Palliative Hemostasis
Presenter(s)
C. Zhang1, L. R. Narra1, M. Patel1, I. Jan1, R. Tang2, S. K. Jabbour1, M. P. Deek1, and K. Nie3; 1Department of Radiation Oncology, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, 2Rutgers University, New Brunswick, NJ, 3Rutgers Cancer Institute of New Jersey, New Brunswick, NJ
Purpose/Objective(s): Retrospectively integrating patient consult notes with treatment plans in radiation oncology is laborious and time-consuming, often requiring manual searches through large databases. Advances in Natural Language Processing (NLP), particularly the use of large language models (LLMs), offer an efficient way to interpret patient consult notes and triage cases based on specific clinical requirements. This study proposes a data mining framework that automatically retrieves patient documents from the clinical database—conditioned on relevant treatment parameters—and applies a locally deployed LLM to classify the consult notes for a specific clinical task. As an illustrative example, our goal was to identify patients who received palliative radiotherapy specifically for hemostasis.
Materials/Methods: We first queried the oncology electronic health record (EHR) database to identify patients treated with palliative intent over the past five years and retrieved their corresponding consult notes. To prepare the data, we extracted 300-character windows around keywords (e.g., “bleed”, “palliat”) and concatenated them into a “context” string for each patient note. A simple negation filter scanned for negating terms within a ±3-word window to remove obvious false positives. Next, a locally and securely deployed LLM (e.g., an open-source 32B-4bit chat model) performed zero-shot binary classification on the “context,” guided by a universal prompt instructing the model to output either “yes” or “no” plus an explanatory statement. The reference standard for classification was determined by a radiation oncologist who reviewed each consult note.
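The preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the keyword list, negation cues, and helper names (`extract_context`, `is_negated`, `build_prompt`) are assumptions, and the LLM call itself is represented only by the prompt string it would receive.

```python
import re

KEYWORDS = ["bleed", "palliat"]                  # keyword stems from the abstract
NEGATIONS = {"no", "not", "without", "denies"}   # assumed negation cues (illustrative)
WINDOW = 300                                     # characters of context per keyword hit

def extract_context(note: str) -> str:
    """Concatenate ~300-character windows around each keyword hit."""
    spans = []
    low = note.lower()
    for kw in KEYWORDS:
        for m in re.finditer(re.escape(kw), low):
            start = max(0, m.start() - WINDOW // 2)
            end = min(len(note), m.end() + WINDOW // 2)
            spans.append(note[start:end])
    return " ... ".join(spans)

def is_negated(note: str, keyword: str) -> bool:
    """Flag a keyword hit when a negation cue appears within +/-3 words."""
    words = re.findall(r"[A-Za-z']+", note.lower())
    for i, w in enumerate(words):
        if keyword in w:
            nearby = words[max(0, i - 3):i + 4]
            if any(neg in nearby for neg in NEGATIONS):
                return True
    return False

def build_prompt(context: str) -> str:
    """Universal zero-shot prompt asking for 'yes'/'no' plus an explanation."""
    return (
        "You are reviewing a radiation oncology consult note excerpt.\n"
        "Did this patient receive palliative radiotherapy for hemostasis?\n"
        "Answer 'yes' or 'no', then give one explanatory sentence.\n\n"
        f"Excerpt: {context}"
    )
```

In a full pipeline, `build_prompt(extract_context(note))` would be sent to the locally deployed chat model, with `is_negated` used beforehand to drop obvious false-positive notes.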
Results: From an initial pool of 3021 patient plans among 1932 unique patients treated with palliative intent, 209 initial consult notes remained after keyword search and 193 after negation filtering, of which 109 were false positives. A pretrained chat model (7B-4Bit) achieved a sensitivity of 0.98 (95% CI: 0.95, 1.00), but its specificity was only 0.27 (95% CI: 0.18, 0.37). With a high-specificity prompt, a pretrained 32B-4Bit chat model improved performance, achieving a sensitivity of 0.95 (95% CI: 0.91, 0.99) and a specificity of 0.89 (95% CI: 0.82, 0.95).
Conclusion: By leveraging an automated data mining pipeline and a locally deployed LLM, this framework demonstrates a hardware-friendly approach to efficiently and accurately classify consult notes for a common clinical task such as identifying palliative radiotherapy for hemostasis. Future work will extend the methodology to broader clinical indications and further refine model prompts, for example via few-shot learning, to improve performance.