3762 - Assessing the Risks of Synthetic Data in Deep Learning-Based PET Tumor Segmentation
Presenter(s)
Y. Zheng1, T. Zeng2, H. Huang3, C. Zhang1, T. Li4, J. Lu5, C. Chu6, M. Liu1, C. Wang7, K. Huang1, F. F. Yin1, and Z. Yang8; 1Duke Kunshan University, Kunshan, Jiangsu, China, 2The Hong Kong Polytechnic University, Hong Kong, Hong Kong, China, 3North China University of Technology, Beijing, Beijing, China, 4Duke Kunshan University, Kunshan, Jiangsu, China, 5Australian National University, Canberra, ACT, Australia, 6Nanyang Technological University, Singapore, Singapore, Singapore, 7Duke University, Durham, NC, 8Duke Kunshan University, Kunshan, Jiangsu, China
Purpose/Objective(s): PET is the clinical gold standard for malignant tumor identification due to its functional imaging capabilities. Deep learning-based auto-segmentation of PET, however, remains constrained by the limited availability of training data. A proposed solution involves generating synthetic PET images from widely accessible CT scans using generative models. However, incorporating synthetic data into model training raises concerns about "data poisoning," where artificially generated images may fail to accurately capture tumor-related features, potentially compromising model reliability. This study systematically investigates the impact of synthetic data incorporation on nnU-Net-based whole-body tumor segmentation performance using FDG-PET.
Materials/Methods: A 3D nnU-Net was trained for whole-body tumor segmentation on 180 FDG-PET cases. Synthetic PET images were generated from paired CT scans using a GAN with a shared encoder-decoder architecture and shortest-path regularization to enhance anatomical fidelity. The training dataset was progressively poisoned by replacing real PET images with synthetic PET images at varying proportions (0% to 100%). Two model configurations were evaluated: (1) a baseline nnU-Net trained on real PET data only, and (2) a set of poisoned models trained on mixed real/synthetic data. Model performance was assessed on an independent set of 30 real PET cases using Dice, Jaccard, sensitivity, and specificity.
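To illustrate the poisoning procedure described above, the following is a minimal sketch (not taken from the study) of how a training split could be assembled with a given fraction of real PET volumes replaced by their CT-derived synthetic counterparts; the directory layout, file naming, and the build_poisoned_split helper are assumptions for illustration only.

import random
from pathlib import Path

def build_poisoned_split(real_dir: Path, synthetic_dir: Path, poison_rate: float, seed: int = 42):
    """Replace a fraction `poison_rate` of the real PET volumes with synthetic ones.

    Assumes real and synthetic volumes share identical file names (case IDs).
    """
    names = sorted(p.name for p in real_dir.glob("*.nii.gz"))
    rng = random.Random(seed)
    poisoned = set(rng.sample(names, round(poison_rate * len(names))))
    # Keep the real volume unless the case was selected for replacement.
    return [(synthetic_dir if n in poisoned else real_dir) / n for n in names]

# Example (hypothetical paths): the 60% poisoning condition on a 180-case training set
# train_files = build_poisoned_split(Path("pet_real"), Path("pet_synthetic"), poison_rate=0.60)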
Results: The baseline nnU-Net achieved a Dice score of 0.69±0.21, consistent with reported whole-body PET segmentation performance. The GAN successfully generated synthetic PET images with high visual similarity to real PET scans. As the proportion of synthetic data increased, segmentation performance declined significantly, with the Dice score dropping to 0.27±0.17 when training used synthetic data only (Table 1). Similar trends were observed for Jaccard (from 0.56±0.22 to 0.17±0.12) and sensitivity (from 0.74±0.24 to 0.38±0.25). Specificity remained consistently high (>0.99), indicating that synthetic data poisoning primarily impaired identification of tumor volumes rather than causing misclassification of non-tumor regions.
Conclusion: This study systematically quantifies the impact of synthetic data contamination on PET-based tumor segmentation. The findings suggest that synthetic PET images may fail to preserve critical tumor-specific radiotracer distributions, leading to a substantial decline in segmentation accuracy. To our knowledge, this is the first comprehensive evaluation of the risks associated with synthetic data incorporation in medical image segmentation. As generative AI models gain traction in clinical applications, rigorous data selection remains essential to ensure reliable and clinically applicable models.
Table 1. Segmentation performance at increasing synthetic-data poisoning rates
Poisoning Rate | Dice | Jaccard | Sensitivity | Specificity
0% | 0.69±0.21 | 0.56±0.22 | 0.74±0.24 | 0.99±0.01
20% | 0.65±0.21 | 0.51±0.22 | 0.70±0.22 | 0.99±0.01
40% | 0.62±0.22 | 0.48±0.22 | 0.64±0.27 | 0.99±0.01
60% | 0.50±0.25 | 0.38±0.23 | 0.58±0.29 | 0.99±0.01
80% | 0.42±0.24 | 0.29±0.20 | 0.45±0.30 | 1.00±0.00
100% | 0.27±0.17 | 0.17±0.12 | 0.38±0.25 | 0.98±0.01
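For reference, the per-case metrics reported in Table 1 can be computed from binary tumor masks as in the following minimal NumPy sketch; this is an illustrative implementation, not the evaluation code used in the study.

import numpy as np

def segmentation_metrics(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8):
    """Voxel-wise Dice, Jaccard, sensitivity, and specificity for one case,
    given binary predicted and ground-truth tumor masks of equal shape."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    tn = np.logical_and(~pred, ~truth).sum()
    return {
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "jaccard": tp / (tp + fp + fn + eps),
        "sensitivity": tp / (tp + fn + eps),   # fraction of tumor voxels recovered
        "specificity": tn / (tn + fp + eps),   # fraction of background correctly excluded
    }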