Main Session
Sep 30
PQA 09 - Hematologic Malignancies, Health Services Research, Digital Health Innovation and Informatics

3624 - Enhancing Oncology Trial Recruitment with AI: A Prospective Study of GPT-4o and Retrieval-Augmented Generation

04:00pm - 05:00pm PT
Hall F
Screen: 5
POSTER

Presenter(s)

Luiza Giuliani Schmitt, MD - University of Texas Southwestern Medical Center, Dallas, TX

L. Giuliani Schmitt1, K. Taing2, S. Neufeld1, K. Esselink1, H. Gonzalez1, C. Chukwuma1, E. Pina1, E. Salcedo1, J. Van Pelt1, L. Robles1, L. Apgar1, A. M. Navar1, N. B. Desai3, S. B. Jiang3, and M. Dohopolski1; 1University of Texas Southwestern Medical Center, Dallas, TX, 2School of Medicine, University of Texas Southwestern Medical Center, Dallas, TX, 3Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX

Purpose/Objective(s): Manual clinical trial screening remains a key barrier to efficient patient recruitment in oncology. While artificial intelligence (AI)-assisted approaches, particularly large language models (LLMs), show promise in automating eligibility assessment, prospective validation across multiple trials remains limited. We hypothesized that an AI-assisted clinical trial matching system could accurately and efficiently screen patients for clinical trial eligibility across multiple disease sites.

Materials/Methods: We prospectively implemented an AI-assisted screening pipeline across three accruing oncology trials (head & neck, breast, and prostate cancer). The pipeline was deployed from 11/2024 to 2/2025 within a health care software database, extracting structured and unstructured patient data that were converted into a vector database. Retrieval-augmented generation (RAG) facilitated targeted retrieval, and eligibility assessments were generated using GPT-4o with Chain-of-Thought (CoT) prompting. For each trial, 5–6 key eligibility criteria were selected for AI screening. The system ranked patients by likelihood of meeting eligibility requirements, generating a prioritized list for clinical research staff. A physician adjudicated ground-truth eligibility for all ranked patients. We conducted weekly review meetings during the first month, analyzing false positives (FP) and false negatives (FN) to improve prompt engineering and eligibility rule interpretations. Sensitivity, specificity, and F1-score were calculated for each eligibility criterion.
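The abstract does not include implementation details, but the described pipeline (vector database, RAG retrieval, GPT-4o with CoT prompting, per-criterion assessment) can be sketched roughly as below. This is a minimal illustrative sketch, assuming an OpenAI-style API and an in-memory cosine-similarity store; the embedding model, prompt wording, and answer format are hypothetical choices, not the authors' actual system.

# Minimal sketch of a RAG + GPT-4o eligibility screen as described above.
# The OpenAI client, the embedding model name, and the prompt wording are
# illustrative assumptions; the abstract does not specify the exact stack.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts):
    """Embed a list of strings into unit-normalized vectors."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vecs = np.array([d.embedding for d in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def retrieve(criterion, chunk_texts, chunk_vecs, k=5):
    """Return the k chart chunks most similar to an eligibility criterion."""
    query = embed([criterion])[0]
    scores = chunk_vecs @ query  # cosine similarity (vectors are unit-normalized)
    top = np.argsort(scores)[::-1][:k]
    return [chunk_texts[i] for i in top]

def assess(criterion, evidence):
    """Ask GPT-4o, with chain-of-thought prompting, whether a criterion is met."""
    prompt = (
        f"Eligibility criterion: {criterion}\n\n"
        "Relevant chart excerpts:\n" + "\n---\n".join(evidence) + "\n\n"
        "Think step by step, then answer on the last line with exactly one of: "
        "MET, NOT MET, UNCLEAR."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

# Usage: embed each patient's chart chunks once (the "vector database"), then
# run retrieve() + assess() per criterion and rank patients by the fraction of
# criteria judged MET. Chart text below is fabricated for illustration only.
chunks = ["Pathology: p16+ squamous cell carcinoma of the oropharynx ...",
          "ECOG performance status 1 as of last clinic visit ..."]
vecs = embed(chunks)
criterion = "ECOG performance status 0-1"
print(assess(criterion, retrieve(criterion, chunks, vecs)))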

Results: Across the three prospective trials, the AI system evaluated 17 total eligibility criteria. Median (Q1–Q3) patients evaluated per criterion and the corresponding sensitivity, specificity, and F1-score were as follows:

Head & Neck Trial (5 criteria): patients evaluated 27.5 (25.25–29.25); sensitivity 98.2% (96.1–100.0%), specificity 69.8% (50.0–92.1%), F1-score 95.8% (94.1–97.3%).

Breast Cancer Trial (6 criteria): patients evaluated 45.0 (39.5–47.5); sensitivity 100.0% (89.0–100.0%), specificity 100.0% (100.0–100.0%), F1-score 96.2% (92.2–100.0%).

Prostate Cancer Trial (6 criteria): patients evaluated 33.0 (29.0–43.0); sensitivity 100.0% (85.7–100.0%), specificity 100.0% (100.0–100.0%), F1-score 100.0% (88.9–100.0%).

Regular review meetings evaluating outcomes and refining prompts reduced FN rates for complex eligibility criteria (e.g., sensitivity/specificity improved from 75%/85% to 86%/100% within 1 month).
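For reference, the reported per-criterion metrics follow the standard definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), and F1 = 2TP/(2TP+FP+FN). A minimal sketch computing them from physician-adjudicated ground truth is below; the variable names and the example data are hypothetical, not taken from the study.

# Per-criterion screening metrics from adjudicated labels.
# ai_flags / truth are parallel booleans: did the AI flag the patient as
# meeting the criterion, and did the physician adjudicate it as met.
def screening_metrics(ai_flags, truth):
    tp = sum(a and t for a, t in zip(ai_flags, truth))
    tn = sum(not a and not t for a, t in zip(ai_flags, truth))
    fp = sum(a and not t for a, t in zip(ai_flags, truth))
    fn = sum(not a and t for a, t in zip(ai_flags, truth))
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return sensitivity, specificity, f1

# Example: 10 patients screened against one criterion (made-up labels).
ai = [True, True, True, False, True, False, False, True, True, False]
gt = [True, True, False, False, True, False, True, True, True, False]
print(screening_metrics(ai, gt))  # (0.833..., 0.75, 0.833...)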

Conclusion: Accurate AI-assisted screening has the potential to enhance clinical trial recruitment by efficiently identifying eligible patients and reducing screening burden. The AI-generated ranked patient lists enabled targeted recruitment efforts, and the structured pipeline refinement process demonstrated the feasibility of integrating AI into clinical workflows. More work is needed to determine how AI-assisted screening impacts actual trial accrual and patient enrollment rates.