3726 - When Big Data Plateaus: Investigating the Limits of Data and Model Scaling for Cervical Cancer Segmentation
Presenter(s)
H. Sun, W. Huang, H. Xiao, X. Deng, A. Qu, J. Wang, and P. Jiang; Department of Radiation Oncology, Peking University Third Hospital, Beijing, China
Purpose/Objective(s): Precise segmentation of cervical cancer remains challenging due to inherent image variability and often subtle tumor features. The prevailing "big data + big model" hypothesis suggests that continuously increasing training dataset size and model complexity leads to steady performance improvements. This study assesses whether such scaling of data and model size yields meaningful gains for cervical cancer segmentation, or whether alternative approaches are needed.
Materials/Methods: We tested this hypothesis on a dataset of 500 cervical cancer patients (50/20 training/test split; expert annotations with TiGRT MC TPS v2.0). Model scaling was assessed by evaluating five deep learning models, classified as "big" or "small" according to parameter count and pre-training data scale: (BM1) SAM (largest, general pre-training); (BM2) Swin-Unet (50K medical pre-training); (BM3) Swin-Unet (5K medical pre-training); (SM1) 3D Unet (1K medical pre-training); and (SM2) 3D Unet (smallest, no pre-training). Dataset scaling was then evaluated for the two best-performing big models (BM2 and BM3) by training them on increasing dataset sizes (50 to 500 patients). Segmentation performance was assessed using 3D Dice scores.
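For reference, the 3D Dice score used for evaluation follows the standard Dice similarity coefficient over binary segmentation volumes. A minimal NumPy sketch (function name and epsilon smoothing are illustrative, not taken from the study):

```python
import numpy as np

def dice_3d(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient between two binary 3D volumes.

    Dice = 2 * |pred ∩ gt| / (|pred| + |gt|); eps avoids division by
    zero when both masks are empty.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps))
```

Identical masks score 1.0, disjoint masks score 0.0, and a mask overlapping half of an equally sized reference scores 0.5.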
Results: At the largest dataset size (500 patients), Dice scores were: BM1 0.60 ± 0.05, BM2 0.75 ± 0.03, BM3 0.80 ± 0.02, SM1 0.65 ± 0.04, SM2 0.70 ± 0.04. Regarding model scaling, surprisingly, BM3, the least complex of the big models, achieved the highest Dice, outperforming both BM2 and BM1. Similarly, among the smaller models, the less complex SM2 outperformed the more complex SM1. In cross-group comparison, the general trend of big models outperforming smaller ones held in some cases (e.g., BM2 > SM1), but exceptions were observed: SM2, a small model, unexpectedly outperformed BM1, a big model. Regarding dataset scaling, BM3 consistently yielded higher Dice than BM2 across all dataset sizes. However, BM3 reached peak performance (0.81 ± 0.02) at 200 patients, then plateaued and decreased slightly to 0.80 ± 0.02 at 500 patients; BM2 similarly plateaued around 0.75 ± 0.03 beyond 300 patients. Segmentation performance for each model is summarized in Table 1.
Conclusion: This study challenges the simplistic "big data + big model" hypothesis for cervical cancer segmentation. While big models can offer advantages, especially with targeted pre-training, no general outperformance of big models was observed. Instead, optimal performance was achieved by a moderately sized, medically pre-trained model, and simply scaling dataset size provided limited benefit. Alternative approaches, such as integrating multi-modal or clinical data and reducing data heterogeneity, warrant further investigation.
Abstract 3726 - Table 1
Model | Description | Dice |
BM1 | Largest, general pre-training | 0.60 ± 0.05 |
BM2 | Big, 50K medical pre-training | 0.75 ± 0.03 |
BM3 | Big, 5K medical pre-training | 0.80 ± 0.02 |
SM1 | Small, 1K medical pre-training | 0.65 ± 0.04 |
SM2 | Smallest, no pre-training | 0.70 ± 0.04 |