Main Session
Sep 30
PQA 09 - Hematologic Malignancies, Health Services Research, Digital Health Innovation and Informatics

3640 - Continual Whole-Body Organ Segmentation in 3D CT Scans Using Vision Transformer and Low-Rank Adaptation

04:00pm - 05:00pm PT
Hall F
Screen: 6
POSTER

Presenter(s)

Dakai Jin, PhD - Alibaba Group (US) Inc., New York, NY

Z. Ji¹, V. Zhu², D. Guo¹, P. Wang³, L. Lu¹, W. Zhu², D. Jin¹, and X. Ye⁴; ¹Alibaba Group (US) Inc., Washington, DC, ²Stony Brook University, New York, NY, ³Alibaba DAMO Academy, Hangzhou, Zhejiang, China, ⁴Department of Radiation Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China

Purpose/Objective(s): Deep segmentation networks achieve high performance when trained on specific datasets. In clinical practice, however, patient privacy and data storage constraints often require that pretrained models adapt dynamically, segmenting newly introduced organs without access to the previous training datasets. This clinically preferred process can be viewed as a continual semantic segmentation (CSS) problem, a non-trivial task that remains understudied in medical imaging. Previous CSS methods either suffer catastrophic forgetting or incur unaffordable memory costs as the model expands. In this study, we introduce a novel continual whole-body organ segmentation model based on a lightweight low-rank adaptation (LoRA) mechanism, which effectively segments 121 organs without catastrophic forgetting while maintaining a low parameter increase rate (PIR, %).
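For readability, the PIR reported in Table 1 can be read as the fraction of trainable parameters added for the new tasks relative to the frozen base model; this formalization is our interpretation of the metric, not a definition stated in the abstract:

```latex
\mathrm{PIR} = \frac{|\theta_{\text{new}}|}{|\theta_{\text{base}}|} \times 100\%
```

where θ_new denotes the trainable parameters introduced in the continual steps and θ_base the pretrained backbone parameters (under this reading, training a separate full model per task corresponds to a PIR of 100%).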

Materials/Methods: For model development and validation, we first train a 3D Pyramid Vision Transformer (PVT)-based segmentation model on the public TotalSegmentator dataset (1,204 CTs with 103 labeled organs). For each subsequent learning task, we freeze the already-trained network parameters and introduce additional lightweight trainable LoRA parameters to continually segment new head-and-neck organs (9 organs, 244 CTs) and new chest organs (9 organs, 153 CTs) from two in-house datasets. In each continual step, the trainable LoRA parameters are incorporated into the patch-embedding, multi-head attention, and feed-forward layers. For validation, 20% of each dataset is reserved as the test set. We report the Dice score (DSC, %) and the 95% Hausdorff distance (HD95, mm) as evaluation metrics.
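The continual-learning step above rests on freezing the pretrained PVT weights and optimizing only small low-rank adapters for each new task. The PyTorch sketch below illustrates that general mechanism for a linear projection; it is a minimal illustration under our own assumptions (class names, rank, and scaling factor are hypothetical), not the authors' implementation.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen nn.Linear plus a trainable low-rank residual update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weights
        # Low-rank factors: only these are trained in a continual step.
        self.lora_A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank residual; only lora_A and lora_B
        # receive gradients.
        return self.base(x) + (x @ self.lora_A @ self.lora_B) * self.scaling


def add_lora(module: nn.Module, rank: int = 8) -> nn.Module:
    """Recursively wrap every nn.Linear with a LoRA adapter. In practice this
    would be restricted to the patch-embedding, multi-head attention, and
    feed-forward projections described in the abstract."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, LoRALinear(child, rank=rank))
        else:
            add_lora(child, rank=rank)
    return module
```

In each continual step only the lora_A/lora_B matrices (plus any new output head) would be optimized, which is what keeps the parameter increase rate far below that of training a separate model per task.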

Results: Our proposed model achieves segmentation performance comparable to PVT models separately trained on each dataset (average DSC of 89.8% and HD95 of 4.1 mm), while requiring only about 37% of their parameters. Other CSS models either suffer catastrophic forgetting (yielding very low performance, e.g., <25% DSC for MiB and PLOP) or an exploding PIR (32% for SUN). In comparison, our model avoids knowledge forgetting (segmentation performance very close to the upper bound) while keeping the PIR at only 5.6%.

Conclusion: We develop a lightweight LoRA-based continual whole-body organ segmentation model that accurately and efficiently segments 121 organs across different body parts. The method prevents catastrophic forgetting while maintaining high segmentation performance with minimal parameter growth. Our model may reduce the time cost and inter-user variation in clinicians' daily practice, where delineation or measurement of organs and lesions is required.

Table 1. Per-task and overall segmentation performance (DSC, %; HD95, mm) and parameter increase rate (PIR, %); the number of organs in each task group is given in parentheses.

Methods             TotalSeg (103)   HeadNeck (9)     Chest (9)        All (121)        PIR
                    DSC    HD95      DSC    HD95      DSC    HD95      DSC    HD95
MiB                 12.1   145.8     9.9    24.8      83.5   7.2       17.2   126.5     0
PLOP                32.7   63.8      25.5   11.1      82.8   7.1       35.8   55.6      0
SUN                 91.9   3.4       84.8   2.7       84.3   5.2       90.8   3.5       32
Ours                91.1   4.1       82.9   3.1       81.5   5.8       89.8   4.1       5.6
PVT (upper bound)   91.1   4.1       84.7   2.7       83.6   5.3       90.0   4.1       100