Main Session
Sep 29
SS 13 - DHI 1: The Digital Revolution in Radiation Oncology: AI Models for Enhanced Patient Care

179 - Automated Clinical Target Volume Contour Quality Assurance for the TROG 08.08 TOPGEAR Trial

08:20am - 08:30am PT
Room 20/21

Presenter(s)

Phillip Chlap, MS - Radformation, New York, NY

P. Chlap (1,2), M. T. Lee (2,3), T. Leong (4), M. Field (1,2), J. Dowling (5), H. Min (3,5), J. Chu (4,6), J. Tan (4), P. K. Tran (4), T. Kron (6,7), A. Haworth (8), L. E. Court (9), M. A. Ebert (10,11), S. Vinod (1,2), and L. Holloway (1,2); (1) UNSW and Ingham Institute for Applied Medical Research, Sydney, Australia; (2) Liverpool and Macarthur Cancer Therapy Centre, Sydney, Australia; (3) Faculty of Medicine, South Western Sydney Clinical School, UNSW, Sydney, Australia; (4) Peter MacCallum Cancer Centre, Melbourne, Australia; (5) Australian e-Health Research Centre, CSIRO, Brisbane, Australia; (6) Sir Peter MacCallum Department of Oncology, the University of Melbourne, Melbourne, Australia; (7) Peter MacCallum Cancer Centre, Melbourne, VIC, Australia; (8) School of Physics, University of Sydney, Sydney, Australia; (9) Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, TX; (10) School of Physics and Astrophysics, University of Western Australia, Perth, Australia; (11) Radiation Oncology, Sir Charles Gairdner Hospital, Perth, Australia

Purpose/Objective(s): Quality Assurance (QA) is crucial in radiotherapy (RT) clinical trials to ensure protocol adherence, especially for target volumes, as violations can impact patient outcomes. However, manual QA is resource-intensive, limiting reviews to a subset of patients. The objective was to test an automated contour QA approach that flags violations for manual review using the TROG 08.08 TOPGEAR trial dataset, which evaluated preoperative chemoradiotherapy alongside perioperative chemotherapy for resectable gastric cancer. TOPGEAR’s Clinical Target Volume (CTV) is anatomically defined and complex to contour. Designed for prospective trials with limited initial data, our approach used a small training set.

Materials/Methods: To evaluate our approach, 93 cases from the TOPGEAR dataset were selected: 10 for training, 33 for validation and parameter tuning, and 50 for holdout testing. Five radiation oncologists contoured the CTV on the training set with a consensus workshop held to ensure protocol adherence. The clinical CTV definitions, including both passing and violating cases, were used for validation and testing (Table 1).

A 3D nnUNet was trained on the STAPLE-combined volume of 5 observers. To improve CTV segmentation accuracy, an anatomical label map generated using TotalSegmentator, including the duodenum, pancreas, and stomach, was added as an input channel. A probabilistic UNet model was then trained to capture inter-observer variability, predicting the acceptable CTV range with an uncertainty band using the image and nnUNet segmentation as inputs.

The models were applied to the validation and test sets. Two metrics, contour fit and distance-to-band, were evaluated on the validation set. Each compares the clinical CTV to the uncertainty band and was assessed for detecting under-contouring, over-contouring, and both combined. The metric with the highest area under the receiver operating characteristic curve (AUC-ROC) was selected, and its threshold was set to detect at least 90% of violations. The selected metric and threshold were then applied to the test set for final evaluation.
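The selection step described above can be illustrated with a minimal sketch. It assumes each CTV has already been reduced to a single score where a larger value means the contour lies further outside the uncertainty band; the scores below are hypothetical toy values, not trial data, and the function names (`auc_roc`, `threshold_for_tpr`, `flag_rates`) are illustrative rather than taken from the authors' code.

```python
import numpy as np

def auc_roc(scores, is_violation):
    """AUC as the probability that a violation outscores a pass (ties count half)."""
    pos, neg = scores[is_violation], scores[~is_violation]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def threshold_for_tpr(scores, is_violation, target_tpr=0.9):
    """Largest threshold t such that flagging score >= t catches
    at least target_tpr of the violations."""
    pos = np.sort(scores[is_violation])[::-1]  # violation scores, descending
    # Number of violations that must be flagged; the small epsilon guards
    # against floating-point overshoot in target_tpr * len(pos).
    k = int(np.ceil(target_tpr * len(pos) - 1e-9))
    return pos[k - 1]

def flag_rates(scores, is_violation, threshold):
    """True and false positive rates when flagging score >= threshold."""
    flagged = scores >= threshold
    return flagged[is_violation].mean(), flagged[~is_violation].mean()

# Hypothetical validation scores (NOT trial data): 10 violations, 10 passes.
violation_scores = np.array([9, 8, 7, 6, 5, 5, 4, 4, 3, 1], dtype=float)
pass_scores = np.array([6, 4, 3, 2, 2, 1, 1, 0, 0, 0], dtype=float)
scores = np.concatenate([violation_scores, pass_scores])
is_violation = np.concatenate([np.ones(10, dtype=bool), np.zeros(10, dtype=bool)])

auc = auc_roc(scores, is_violation)
threshold = threshold_for_tpr(scores, is_violation, target_tpr=0.9)
tpr, fpr = flag_rates(scores, is_violation, threshold)
print(f"AUC={auc:.2f} threshold={threshold} TPR={tpr:.2f} FPR={fpr:.2f}")
```

In this sketch the threshold is fixed on validation scores only; applying the same threshold unchanged to held-out scores mirrors the validation-then-test design described above.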

Results: The best-performing metric for detecting CTV violations in the validation set was the distance-to-band for under-contouring, with an AUC of 0.84. A threshold chosen to reach a true positive rate (TPR) of at least 0.9 gave a TPR of 0.92 (22/24) and a false positive rate (FPR) of 0.44 (14/32). When applied to the test set, the same metric and threshold achieved an AUC of 0.88, with a TPR of 0.91 (31/34) and an FPR of 0.39 (19/49).

Conclusion: Our automated contour QA approach for TOPGEAR showed potential to identify over 90% of violating CTVs while correctly clearing more than 50% of passing CTVs, reducing the number of contours that need manual QA.

Abstract 179 - Table 1: Dataset breakdown with results of manual and automated QA

| | Training | Validation | Testing |
|---|---|---|---|
| Cases | 10 | 33 | 50 |
| Total CTVs | 50 | 56 | 83 |
| Manual Trial QA | | | |
| - Pass | 50 | 32 | 49 |
| - Violation | - | 24 | 34 |
| Automated QA | | | |
| - Correct Pass (TN) | - | 16 | 30 |
| - Correct Violation (TP) | - | 22 | 31 |
| - Missed Pass (FP) | - | 14 | 19 |
| - Missed Violation (FN) | - | 2 | 3 |
| - Accuracy | - | 0.7 | 0.73 |
| - Sensitivity | - | 0.92 | 0.91 |
| - Specificity | - | 0.53 | 0.61 |
| - F-Score | - | 0.73 | 0.74 |
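The summary rows of Table 1 can be derived from the four confusion-matrix counts. The sketch below does this for the Testing column, treating a violation as the positive class. It assumes the reported F-Score is the F1 score (the harmonic mean of precision and sensitivity), which the abstract does not state explicitly; the function name `confusion_metrics` is illustrative.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Summary metrics with 'violation' as the positive class."""
    sensitivity = tp / (tp + fn)              # violations correctly flagged
    specificity = tn / (tn + fp)              # passes correctly cleared
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)                # flagged CTVs that truly violate
    # Assumption: "F-Score" in Table 1 means F1.
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
    }

# Testing column of Table 1: TP=31, FN=3, FP=19, TN=30 (83 CTVs in total).
test_metrics = confusion_metrics(tp=31, fn=3, fp=19, tn=30)
for name, value in test_metrics.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, these reproduce the Testing column: accuracy 0.73, sensitivity 0.91, specificity 0.61, and F-Score 0.74. Sensitivity and 1 minus specificity correspond to the TPR and FPR quoted in the Results.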