Main Session
Sep 30
PQA 09 - Hematologic Malignancies, Health Services Research, Digital Health Innovation and Informatics

3691 - Improving Clinical Trial Screening for High-Risk Bone Metastasis Using a Large Language Model and Synthetic Data

04:00pm - 05:00pm PT
Hall F
Screen: 9
POSTER

Presenter(s)

Mark Nguyen, MD, BA - University of Washington School of Medicine, Seattle, WA

M. Nguyen1, K. Fukunaga2, L. Narra1, J. Leu1, N. Cross3, E. F. Gillespie1, and J. Kang1; 1University of Washington, Department of Radiation Oncology, Seattle, WA, 2University of Washington, Seattle, WA, 3University of Washington, Department of Radiology, Seattle, WA

Purpose/Objective(s): Screening participants for clinical trials is a critical yet resource-intensive process. This study investigates whether GPT3.5-generated synthetic data can improve large language model (LLM) classifiers that identify patients with high-risk bone metastases.

Materials/Methods: A dataset of 423 oncology patients with bone metastases was used to train an LLM-based classifier to identify patients whose CT imaging reports showed at least one of five high-risk clinical criteria: (1) junctional spine involvement, (2) long bone involvement, (3) hip/shoulder joint involvement, (4) sacroiliac joint involvement, and (5) bulky disease. We trained BlueBERT, a biomedical language model, to recognize these five criteria under two training pipelines: a baseline pipeline, trained exclusively on real patient data, and a synthetic pipeline, which also incorporated GPT3.5-generated synthetic oncology notes. Synthetic data were generated by prompting GPT3.5 to write CT or PET descriptions of bone metastases, without exposing GPT3.5 to real patient data. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), sensitivity, positive predictive value (PPV), and F1 score, the harmonic mean of sensitivity and PPV.
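As a minimal sketch of the definitions above, the following Python shows how a report could be flagged as high-risk when at least one of the five criteria is present, and how sensitivity, PPV, and F1 follow from binary labels and predictions. The function names, criterion keys, and toy labels are hypothetical and are not taken from the study. AUC is omitted because it requires continuous model scores rather than binary predictions.

```python
# Illustrative sketch only: names and toy data are assumptions, not the study's code.
CRITERIA = (
    "junctional_spine",
    "long_bone",
    "hip_shoulder_joint",
    "sacroiliac_joint",
    "bulky_disease",
)


def is_high_risk(flags):
    """Return True if at least one of the five high-risk criteria is present."""
    return any(flags.get(name, False) for name in CRITERIA)


def binary_metrics(y_true, y_pred):
    """Compute sensitivity, PPV, and F1 for binary labels (1 = high-risk)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of sensitivity and PPV.
    denom = sensitivity + ppv
    f1 = 2 * sensitivity * ppv / denom if denom else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "f1": f1}


if __name__ == "__main__":
    # Toy example: six reports, hypothetical reference labels and predictions.
    y_true = [1, 1, 1, 0, 0, 1]
    y_pred = [1, 1, 0, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
    print(is_high_risk({"long_bone": True}))
```

Note that metrics reported as averages over several criteria or data splits need not satisfy the harmonic-mean identity exactly when F1 is recomputed from the averaged sensitivity and PPV.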

Results: The inclusion of synthetic data improved model performance on all four metrics. The baseline BlueBERT model achieved an AUC of 0.876, while models trained with 10k, 20k, and 40k synthetic samples achieved AUCs of 0.907, 0.913, and 0.905, respectively. Sensitivity improved from 0.759 (baseline) to a maximum of 0.876 (Synthetic-10k), and PPV increased from 0.634 (baseline) to a maximum of 0.685 (Synthetic-20k). The F1 score improved from 0.676 (baseline) to a maximum of 0.751 (Synthetic-20k).

Conclusion: This study demonstrates that GPT3.5-generated synthetic data improves the performance of the trained LLM classifier in identifying patients with high-risk bone metastases. We plan to use synthetic data to augment real-time screening to help detect high-risk bone metastases for clinical trial enrollment.

Abstract 3691 - Table 1

Model name       AUC     Sensitivity   PPV     F1
Baseline         0.876   0.759         0.634   0.676
Synthetic-10k    0.907   0.876         0.663   0.748
Synthetic-20k    0.913   0.841         0.685   0.751
Synthetic-40k    0.905   0.863         0.663   0.743
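
The gains over baseline can be read off Table 1 with simple arithmetic. The sketch below, which uses only the values reported in the table, computes the absolute change in each metric relative to the baseline and finds the best-performing model per metric. The helper names are hypothetical.

```python
# Values transcribed from Table 1; helper names are illustrative assumptions.
REPORTED = {
    "Baseline":      {"AUC": 0.876, "Sensitivity": 0.759, "PPV": 0.634, "F1": 0.676},
    "Synthetic-10k": {"AUC": 0.907, "Sensitivity": 0.876, "PPV": 0.663, "F1": 0.748},
    "Synthetic-20k": {"AUC": 0.913, "Sensitivity": 0.841, "PPV": 0.685, "F1": 0.751},
    "Synthetic-40k": {"AUC": 0.905, "Sensitivity": 0.863, "PPV": 0.663, "F1": 0.743},
}


def gains_over_baseline(results, baseline="Baseline"):
    """Absolute change in each metric relative to the baseline model."""
    base = results[baseline]
    return {
        model: {metric: round(value - base[metric], 3) for metric, value in metrics.items()}
        for model, metrics in results.items()
        if model != baseline
    }


def best_model(results, metric):
    """Name of the model with the highest value for the given metric."""
    return max(results, key=lambda name: results[name][metric])


if __name__ == "__main__":
    for model, deltas in gains_over_baseline(REPORTED).items():
        print(model, deltas)
    for metric in ("AUC", "Sensitivity", "PPV", "F1"):
        print(metric, "->", best_model(REPORTED, metric))
```

On these values, Synthetic-20k has the highest AUC, PPV, and F1 (absolute gains of 0.037, 0.051, and 0.075 over baseline), while Synthetic-10k has the highest sensitivity (a gain of 0.117).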