Main Session
Sep 29
SS 13 - DHI 1: The Digital Revolution in Radiation Oncology: AI Models for Enhanced Patient Care

180 - Multi-Institutional Validation of the SHIELD-RT Machine Learning Model to Prevent Acute Care Events during Radiotherapy

08:30am - 08:40am PT
Room 20/21

Presenter(s)

Marianna Elia, MS, BS - University of California San Francisco, San Francisco, CA

M. V. Elia1,2, R. Benson3, N. Bhargava4, J. Levey4, N. Eclov5, I. Friesner6, S. C. D. Hampson6, A. Wiztum7, M. Palta5, J. Feng8, D. Spiegel4, and J. C. Hong9; 1Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, 2UCSF + UC Berkeley Joint Program in Computational Precision Health, San Francisco, CA, 3Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, 4Department of Radiation Oncology, Beth Israel Deaconess Medical Center, Boston, MA, 5Duke University Medical Center, Department of Radiation Oncology, Durham, NC, 6University of California, San Francisco, Bakar Computational Health Sciences Institute, San Francisco, CA, 7Department of Radiation Oncology, University of California, San Francisco (UCSF), San Francisco, CA, 8UCSF, San Francisco, CA, 9University of California San Francisco, Department of Radiation Oncology, San Francisco, CA

Purpose/Objective(s): Approximately 10-20% of patients undergoing radiation therapy (RT) will require acute care in the form of emergency department visits or hospitalizations. We previously reported the results of the System for High-Intensity Evaluation During Radiotherapy (SHIELD-RT), one of the first machine learning (ML)-guided randomized controlled trials in healthcare, in which ML applied to electronic health record (EHR) data identified patients at high risk for acute care events and directed increased clinical evaluations, reducing acute care visits by 45% and overall costs by 48%. Because external validation of prospectively tested healthcare ML models remains limited, we sought to test the hypothesis that the model would perform well at multiple external institutions with distinct patient populations, clinical practices, and EHR systems.

Materials/Methods: This IRB-approved study evaluated the SHIELD-RT model on 14,735 RT courses delivered to 12,095 patients at one institution (Site1) from January 2013 to March 2022 and on 8,112 RT courses delivered to 6,807 patients at a second institution (Site2) from January 2013 to February 2021. For each course, the previously tested gradient boosted tree model was applied to structured EHR data including patient characteristics, cancer treatment, vital signs, laboratory results, medications, and prior acute care utilization. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), Brier score, and calibration plots. As in SHIELD-RT, RT courses with a predicted acute care risk >10% were classified as high-risk.
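
The following minimal Python sketch (not the authors' code) illustrates the validation steps described above under the assumption of a fitted scikit-learn-style gradient boosted tree classifier; model, X_external, and y_external are illustrative placeholders for the trained model, the per-course structured EHR feature matrix, and the observed acute care outcomes at one external site.

import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss
from sklearn.calibration import calibration_curve

def evaluate_external_site(model, X_external, y_external, threshold=0.10):
    """Evaluate a fitted classifier on one external site's structured EHR features.

    X_external: per-course features (demographics, treatment, vitals, labs,
                medications, prior acute care use), built to match the training schema.
    y_external: 1 if the course was followed by an ED visit or hospitalization, else 0.
    """
    # Predicted probability of an acute care event for each RT course
    risk = model.predict_proba(X_external)[:, 1]

    metrics = {
        "auroc": roc_auc_score(y_external, risk),
        "brier": brier_score_loss(y_external, risk),
    }

    # Calibration plot data: observed event rate within bins of predicted risk
    obs_rate, mean_pred = calibration_curve(y_external, risk, n_bins=10, strategy="quantile")

    # Courses exceeding the SHIELD-RT 10% risk threshold are flagged as high-risk
    high_risk = risk > threshold
    metrics["pct_high_risk"] = high_risk.mean()

    return metrics, (mean_pred, obs_rate), high_risk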

Results: The model demonstrated good performance, with an AUROC of 0.756 (95% CI: 0.737, 0.775) at Site1 and 0.770 (95% CI: 0.752, 0.789) at Site2, compared with 0.851 in non-intervention courses during SHIELD-RT. Sensitivity at the 10% threshold was 54.6% at Site1 and 58.0% at Site2 (69.8% during SHIELD-RT), while specificity remained stable across cohorts at approximately 80%. Brier scores were <0.06 for all sites and were corroborated by calibration plots demonstrating strong calibration at both external institutions, particularly at predicted probabilities <50%. The model classified 20% of courses as high-risk at Site1 and 23% at Site2. True event rates in the high- and low-risk groups were 13.9% and 2.8%, respectively, at Site1 and 16.5% and 3.7% at Site2, demonstrating good discriminatory power.
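
For illustration only, a sketch of how the threshold-based operating characteristics reported above (sensitivity, specificity, and observed event rates in the high- and low-risk groups) could be computed from the predicted risks; risk and y_external are the hypothetical arrays from the previous sketch, not the study data.

import numpy as np

def threshold_summary(y_external, risk, threshold=0.10):
    y = np.asarray(y_external, dtype=bool)
    high = np.asarray(risk) > threshold

    sensitivity = (high & y).sum() / y.sum()        # flagged events / all events
    specificity = (~high & ~y).sum() / (~y).sum()   # unflagged non-events / all non-events

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "high_risk_event_rate": y[high].mean(),   # reported as 13.9% (Site1) and 16.5% (Site2)
        "low_risk_event_rate": y[~high].mean(),   # reported as 2.8% (Site1) and 3.7% (Site2)
    }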

Conclusion: In this study, we externally validated a clinically tested model that predicts the risk of acute care utilization among patients undergoing outpatient radiotherapy. As the SHIELD-RT study previously demonstrated the model’s ability to direct care and reduce acute care event rates, these early-phase external validation results show promise for generalizable clinical impact. We are currently evaluating the model at additional partner sites, including community and hybrid settings.