Main Session
Sep 30
PQA 09 - Hematologic Malignancies, Health Services Research, Digital Health Innovation and Informatics

3718 - Explainable Federated Learning for Plan Quality Assurance (QA) Prediction across Anatomical Sites

04:00pm - 05:00pm PT
Hall F
Screen: 11
POSTER

Presenter(s)

Sagnik Sarkar, MS - Northwestern University Feinberg School of Medicine, Chicago, IL

S. Sarkar, M. Abazeed, and P. T. T. Teo; Department of Radiation Oncology, Northwestern University, Feinberg School of Medicine, Chicago, IL

Purpose/Objective(s): Quality assurance (QA) of radiation treatment plans requires extensive testing and simulation, making the process time-consuming and resource-intensive. This study develops a Federated Learning (FL) framework to predict treatment plan QA passing rates from treatment plan parameters. By leveraging FL across multiple anatomical sites, we enable knowledge sharing while preserving site-specific characteristics, enhancing predictive capability despite inherent data limitations. Additionally, we integrate Explainable AI (XAI) techniques to interpret feature importance, providing insight into the key parameters that influence plan QA outcomes.

Materials/Methods: The dataset comprises 114, 69, 112, and 150 QA plans for the Brain, Lung, H&N, and Pelvis sites, respectively. An average of 24 derived features per QA plan was analyzed, including beam-on time, meanMLCgap, MLCSpeedMod, GantrySpeedMod, and PulseMod. Each anatomical site acts as a distinct client in the FL framework, preserving local learning while contributing to a global model through iterative aggregation. The FL setup followed a 3-client, 1-server strategy: the model was trained on three anatomical sites, with the fourth reserved for testing the global model as a cold start. Training consisted of 25 rounds with 2 local epochs per round and a batch size of 16. Mean Squared Error (MSE) was the loss function, while Mean Absolute Error (MAE) and Binary Cross-Entropy (BCE) served as evaluation metrics: MSE and MAE estimated the Gamma Passing Rate (GPR%) under the 2%/2mm criterion, while BCE classified plan QA outcome (pass/fail). To enhance interpretability, Integrated Gradients (IG) was applied post-training to quantify each feature's importance in predicting QA success, offering insight into site-specific contributions and key predictors of QA passing.
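The server-side aggregation described above follows the standard federated averaging pattern. The sketch below illustrates that pattern with the stated hyperparameters (25 rounds, 2 local epochs, batch size 16, MSE loss) on synthetic stand-in data; the linear model, learning rate, and data shapes are illustrative assumptions, not the study's actual architecture or dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for three training-site clients (counts are
# illustrative; the real study uses plan-derived features per site).
N_FEATURES = 24
clients = []
for n_plans in (114, 69, 112):
    X = rng.normal(size=(n_plans, N_FEATURES))
    w_true = rng.normal(size=N_FEATURES)
    # GPR%-like targets clustered near a high passing rate
    y = 0.01 * (X @ w_true) + 0.95 + rng.normal(scale=0.01, size=n_plans)
    clients.append((X, y))

def local_update(w, b, X, y, epochs=2, batch=16, lr=0.01):
    """Run the client's local SGD epochs on an MSE loss."""
    n = len(y)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch):
            sl = idx[start:start + batch]
            err = X[sl] @ w + b - y[sl]          # residuals on this mini-batch
            w = w - lr * 2 * X[sl].T @ err / len(sl)
            b = b - lr * 2 * err.mean()
    return w, b

# Server loop: 25 rounds of FedAvg, weighting clients by sample count.
w, b = np.zeros(N_FEATURES), 0.0
sizes = np.array([len(y) for _, y in clients], dtype=float)
for _ in range(25):
    updates = [local_update(w.copy(), b, X, y) for X, y in clients]
    w = sum(s * uw for s, (uw, _) in zip(sizes, updates)) / sizes.sum()
    b = sum(s * ub for s, (_, ub) in zip(sizes, updates)) / sizes.sum()

X0, y0 = clients[0]
mse = np.mean((X0 @ w + b - y0) ** 2)
print(f"training MSE (client 1): {mse:.4f}")
```

Only model parameters cross the client-server boundary in this loop; the per-site plan data never leave the client, which is the privacy property the FL setup relies on.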

Results: The FL model demonstrated consistent performance across all sites, with training MSE between 0.0041 and 0.0045 and test MSE ranging from 0.0071 to 0.0121. Test BCE values ranged from 0.2766 (Brain) to 0.3395 (H&N), indicating high prediction accuracy. The higher test loss for H&N suggests that its plans depend more strongly on site-specific features. IG analysis identified site-specific variations in the features influencing plan QA passing: beam-on time and meanTGi were key for H&N, GantrySpeedMod was significant for Pelvis and Lung, and machine-related parameters such as MLCSpeedMod and MachineType played a crucial role overall, underscoring the impact of beam delivery characteristics on QA outcomes.
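The feature importances above come from Integrated Gradients, which attributes a prediction to each input feature by averaging gradients along a straight-line path from a baseline to the input. The sketch below shows the standard Riemann-sum approximation on a toy differentiable score; the model, weights, and feature values are hypothetical placeholders, not the trained FL network.

```python
import numpy as np

def integrated_gradients(f, grad_f, x, baseline, steps=64):
    """IG_i = (x_i - baseline_i) * average of df/dx_i along the
    straight path from baseline to x (midpoint Riemann sum)."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.mean(
        [grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0
    )
    return (x - baseline) * grads

# Toy QA-score model: a squashed linear score over three scaled
# plan-derived features (weights chosen for illustration only).
w = np.array([0.8, -0.5, 0.3])
f = lambda x: np.tanh(w @ x)
grad_f = lambda x: (1 - np.tanh(w @ x) ** 2) * w

x = np.array([1.2, 0.4, -0.7])     # e.g. beam-on time, meanMLCgap, MLCSpeedMod
baseline = np.zeros_like(x)        # all-zero (mean-centered) reference plan

attr = integrated_gradients(f, grad_f, x, baseline)
gap = f(x) - f(baseline)
print("attributions:", attr)
```

A useful sanity check is the completeness axiom: the attributions sum to `f(x) - f(baseline)`, so each feature's share of the predicted QA score is accounted for, which is what makes per-site rankings such as those reported above comparable.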

Conclusion: FL improved the prediction of treatment plan QA passing rates, addressing challenges associated with limited data availability while preserving site-specific learning.

Table 1