Main Session
Sep 30
PQA 09 - Hematologic Malignancies, Health Services Research, Digital Health Innovation and Informatics

3719 - Large Language Models to Organize and Represent Research Trends from American Society for Radiation Oncology (ASTRO) Annual Conferences from 2019-2023

04:00pm - 05:00pm PT
Hall F
Screen: 2
POSTER

Presenter(s)

Camille Sawa, - University of Washington, Seattle, WA

C. Sawa1, S. Konyakin2, S. Castillo3, R. Shymansky4, K. Zhang1, H. Zhang1, P. Alaparthi1, and J. Kang5; 1University of Washington Paul G. Allen School of Computer Science & Engineering, Seattle, WA, 2University of Washington College of Engineering, Seattle, WA, 3University of Washington Information School, Seattle, WA, 4University of Central Florida College of Medicine, Orlando, FL, 5University of Washington School of Medicine, Fred Hutch Cancer Center, Department of Radiation Oncology, Seattle, WA

Purpose/Objective(s): Every year, approximately 2000 abstracts are presented at the ASTRO Annual Meeting in various formats. Conference organizers seek to balance several competing factors to make an engaging conference. Having a better understanding of recently presented abstracts can benefit organizers, presenters, and attendees. We hypothesize that large language models can both organize and name research themes presented at ASTRO.

Materials/Methods: We analyzed a dataset of 9,770 abstracts presented at the ASTRO Annual Meeting from 2019 to 2023. Using the BERTopic Python package, we converted abstracts into PubMedBERT embeddings, then clustered the embeddings into 100 topics using HDBScan. To generate topic names, we input representative documents into a GPT-3.5 model, applying role prompting and directive commanding strategies across three reinforcement phases of prompt tuning. Manual validation of GPT-generated names compared to ASTRO-generated names was performed through surveys of quantitative agreement and comments. Abstracts and topics were interactively visualized in 2D using the Altair Python package.

Results: The largest, middle, and smallest topics along with GPT-generated names are presented in Table 1. Visualization uncovered meta-topics such as Education and Translational Science which were in close proximity to each other and on opposite ends of 2D axes. GPT-generated names obtained using 20 representative documents and three prompt tuning stages were preferred in 77.5% of manual validation responses, while categories generated by ASTRO organizers were preferred in 22.5% of the time.

Conclusion: Combining representative BERT models and generative GPT models allowed us to extract and represent topics from nearly 10,000 ASTRO abstracts that were more preferred than human-generated categories. This work could assist conference organizers in the formidable task of planning large meetings.

Abstract 3719 - Table 1

Largest 5 Topics(no. abstracts)

Middle 5 Topics(no. abstracts)

Smallest 5 Topics(no. abstracts)

Management Strategies in Thoracic Malignancies (609)

Management of Palliative RT for Adv. Cancer (58)

Liver Cancer Treatment Optimization Using Adv. Imaging and Prediction Models (19)

Head and Neck Cancer RT Outcomes (555)

Treatment Outcomes and Prognostic Factors in Cutaneous Malignancies (53)

Treatment Outcomes of Ocular Tumors with Brachytherapy (18)

RT for Prostate Cancer: Efficacy and Quality of Life (546)

DIBH Techniques in Radiation Oncology (45)

Radiomics and Imaging Biomarkers in Pancreatic Cancer (18)

Adaptive Radiotherapy for Cancer Treatment (404)

RT Optimization in Cancer Treatment (44)

Radiation Oncology Training and Education (18)

Radiation Therapy for Brain Metastases (348)

Treatment Patterns and Outcomes in Elderly Patients with Adv. Cancer (43)

Enhancing Treatment Efficacy in Brain Tumors (15)