3668 - Application of Large Language Models for Condensation of Oncopathological Reports for Clinical Use
Y. Liu1, J. John1, S. Sarkar1, A. Zakkar1, P. D. Kinkopf2, P. T. Teo1, and M. Abazeed1; 1Department of Radiation Oncology, Northwestern University, Feinberg School of Medicine, Chicago, IL, 2Northwestern University Feinberg School of Medicine, Chicago, IL
Purpose/Objective(s): Reviewing multiple pathology reports is a complex, time-intensive task that requires physicians to synthesize disparate findings from several reports across institutions and testing modalities. The process involves interpreting histopathological, immunohistochemical, and molecular data, often under time constraints that increase the risk of error and contribute to cognitive fatigue. Large language models (LLMs) offer a promising solution by generating concise, coherent summaries from complex data streams. We sought to explore the feasibility and limitations of LLMs for summarizing oncopathological reports.
Materials/Methods: Patients who underwent initial consultation with a single thoracic radiation oncologist in the Department of Radiation Oncology at Northwestern University between January 2019 and July 2023 were included in this study. Original pathology reports and the pathology summaries from consultation notes were extracted from the electronic medical record system and anonymized for analysis. LLM-generated summaries were produced from the original pathology reports and, with those reports serving as the ground truth, evaluated using objective metrics (BLEU, ROUGE, METEOR, and a modified BERTScore) and subjective metrics (correctness, completeness, and potential harm). The LLM summaries were additionally compared with the consultation-note pathology summaries for correctness and completeness. The Wilcoxon signed-rank test was used for paired analyses, with Bonferroni correction for multiple comparisons.
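(Illustrative note: the abstract does not publish code. A minimal sketch of the evaluation pipeline described above, assuming standard open-source metric implementations (nltk for BLEU/METEOR, rouge_score for ROUGE-L, bert_score as a stand-in for the unspecified "modified BERTScore", and scipy for the Wilcoxon test), might look as follows; all variable and function names are hypothetical.)

```python
# Hypothetical evaluation sketch; not the authors' code.
# One-time NLTK resources: nltk.download("punkt"); nltk.download("wordnet")

from nltk.tokenize import word_tokenize
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer
from bert_score import score as bertscore
from scipy.stats import wilcoxon

# Toy parallel lists standing in for the anonymized cases.
reports = ["Invasive adenocarcinoma of the lung, 2.1 cm; margins negative."]
llm_summaries = ["Lung invasive adenocarcinoma, 2.1 cm, negative margins."]
note_summaries = ["Adenocarcinoma of lung."]

rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
smooth = SmoothingFunction().method1  # avoids zero BLEU on short texts

def objective_scores(summary, report):
    """Score one summary against its source report (the ground truth)."""
    ref, hyp = word_tokenize(report), word_tokenize(summary)
    return {
        "bleu": sentence_bleu([ref], hyp, smoothing_function=smooth),
        "meteor": meteor_score([ref], hyp),
        "rougeL": rouge.score(report, summary)["rougeL"].fmeasure,
    }

llm_scores = [objective_scores(s, r) for s, r in zip(llm_summaries, reports)]
note_scores = [objective_scores(s, r) for s, r in zip(note_summaries, reports)]

# BERTScore runs batch-wise; the vanilla metric is used here because the
# abstract's modification is not specified.
_, _, llm_f1 = bertscore(llm_summaries, reports, lang="en")
_, _, note_f1 = bertscore(note_summaries, reports, lang="en")

def bonferroni_wilcoxon(llm_vals, note_vals, n_comparisons):
    """Paired Wilcoxon signed-rank test with Bonferroni adjustment.

    Requires more than one (non-tied) paired case in practice.
    """
    _, p = wilcoxon(llm_vals, note_vals)
    return min(p * n_comparisons, 1.0)
```

In such a setup, each case would contribute one paired observation per metric and model, and the Bonferroni factor would equal the total number of paired comparisons performed.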
Results: A total of 94 cases were included in this study. Six open-source LLMs (Llama 3.0, Llama 3.1, Llama 3.2, Mistral, Gemma, and DeepSeek) were used to generate summaries of the pathology reports. With the original pathology reports as the ground truth, the LLM-generated summaries scored higher than the consultation-note summaries across all models and all objective evaluation metrics (p < 0.001). In the subjective evaluation, DeepSeek, Mistral, Llama 3.1, and Llama 3.2 achieved higher ratings for completeness (p = 0.003, p < 0.001, p < 0.001, and p < 0.001, respectively) while maintaining correctness scores comparable to the consultation-note summaries (p = 1.000, p = 0.088, p = 0.064, and p = 0.088, respectively). These results remained consistent in additional subjective analyses involving multiple evaluators for Llama 3.1.
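(Illustrative note: the abstract does not describe the generation setup. One plausible way to run the named open-source model families locally is sketched below with the Ollama Python client; the model tags, prompt, and helper names are assumptions, not the study's pipeline.)

```python
# Hypothetical generation sketch via a local Ollama server; not the
# study's pipeline.

import ollama

# Approximate local tags for the model families named in the abstract;
# the exact DeepSeek variant used is not specified.
MODELS = ["llama3", "llama3.1", "llama3.2", "mistral", "gemma"]

PROMPT = (
    "Condense the following pathology report for a radiation oncology "
    "consultation note. Preserve histology, immunohistochemistry, and "
    "molecular findings.\n\n{report}"
)

def summarize(report, model):
    """Request a condensed summary of one anonymized report from one model."""
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(report=report)}],
    )
    return response["message"]["content"]

# e.g., one summary per model per case:
# summaries = {m: [summarize(r, m) for r in reports] for m in MODELS}
```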
Conclusion: LLM-generated summaries outperformed consultation-note summaries on objective metrics and were rated as more complete in subjective evaluations. These results highlight the potential of LLMs as valuable tools for enhancing clinical documentation and workflow efficiency in oncology practice.