178 - Crowdsourcing Tumor Auto-Contouring Solutions for MR-Guided Radiotherapy: The HNTS-MRG 2024 Challenge
Presenter(s)

K. A. Wahid¹, C. Dede¹, M. Naser¹, and C. D. Fuller²; ¹The University of Texas MD Anderson Cancer Center, Houston, TX, ²Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX
Purpose/Objective(s): Magnetic resonance imaging (MRI) guidance is increasingly used for head and neck cancer (HNC) radiotherapy (RT) due to its excellent soft tissue contrast, functional imaging capabilities, and potential for daily adaptive treatment. However, manual tumor contouring remains a significant challenge, as it is complex and time-intensive. Artificial intelligence (AI) approaches have shown promise in automating tumor contouring, but progress has been limited by the lack of large, publicly available datasets for MRI-guided RT. To address this gap, the Head and Neck Tumor Segmentation for MR-Guided Applications (HNTS-MRG) 2024 Challenge was launched to engage the research community in advancing AI-driven auto-contouring models for HNC.
Materials/Methods: The HNTS-MRG 2024 dataset included 202 HNC patients with pre-RT and mid-RT T2-weighted MRI scans. Contours of the primary gross tumor volume (GTVp) and metastatic lymph nodes (GTVn) were provided by 13 physician annotators. Final consensus contours were generated from the contours of 3–4 annotators per image using the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm. The dataset was split into 150 cases for training, 2 cases for debugging, and 50 cases for testing. Participants tackled two contouring tasks: Task 1, pre-RT contouring, where only pre-RT images were provided, and Task 2, mid-RT contouring, where pre-RT images, pre-RT contours, and mid-RT images were available to simulate adaptive RT scenarios. Participant AI models were submitted as containerized algorithms on grand-challenge.org and ranked by the aggregated Dice Similarity Coefficient (DSCagg), averaged over GTVp and GTVn. Participant models were compared against baseline AI models developed using nnU-Net and against interobserver variability (IOV).
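The evaluation metric can be illustrated with a minimal sketch. It assumes the usual definition of the aggregated Dice Similarity Coefficient, in which intersections and volumes are summed over all test cases before the ratio is taken, and it assumes label maps in which 1 marks GTVp and 2 marks GTVn. The function names and the label convention are illustrative assumptions, not the challenge's official evaluation code.

```python
import numpy as np

# Assumed label convention for this sketch: 1 = GTVp, 2 = GTVn.
GTVP_LABEL = 1
GTVN_LABEL = 2


def aggregated_dsc(preds, refs, label):
    """Aggregated DSC for one structure: sum intersections and volumes
    over all cases first, then take a single ratio."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must contain the same number of cases")
    intersection = 0
    total = 0
    for pred, ref in zip(preds, refs):
        p = np.asarray(pred) == label
        r = np.asarray(ref) == label
        intersection += int(np.logical_and(p, r).sum())
        total += int(p.sum()) + int(r.sum())
    if total == 0:
        # Structure absent from every prediction and reference: undefined.
        return float("nan")
    return 2.0 * intersection / total


def mean_dsc_agg(preds, refs):
    """Mean of the per-structure aggregated DSC over GTVp and GTVn."""
    scores = [aggregated_dsc(preds, refs, lab) for lab in (GTVP_LABEL, GTVN_LABEL)]
    return float(np.mean(scores))
```

Because the sums run over the whole test set, a case in which a structure is absent from both the reference and the prediction adds nothing to either sum, whereas a per-case Dice score would be undefined for that case.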
Results: Nineteen teams participated in HNTS-MRG 2024, with 18 submissions for Task 1 and 15 for Task 2. In Task 1, DSCagg ranged from 0.571 to 0.825 (mean = 0.783); the top three teams exceeded the nnU-Net baseline (0.817) and the top nine teams surpassed IOV (0.806). In Task 2, DSCagg ranged from 0.562 to 0.733 (mean = 0.688); 14 of 15 teams outperformed the nnU-Net baseline (0.633) and the top four teams exceeded IOV (0.714).
Conclusion: The HNTS-MRG 2024 Challenge established a benchmark for AI-driven tumor contouring in MRI-guided adaptive RT, demonstrating that deep learning-based models can achieve expert-level performance for pre-RT contouring. However, mid-RT contouring remains a more complex task, requiring further AI advancements to effectively capture treatment-related tumor changes.