Main Session
Sep 30
PQA 09 - Hematologic Malignancies, Health Services Research, Digital Health Innovation and Informatics

3737 - In What Way do We Select the Most Appropriate Machine Learning Methods in Clinical Research? A Case Study Examining the Efficacy Prediction in Locally Advanced Rectal Cancer

04:00pm - 05:00pm PT
Hall F
Screen: 20
POSTER

Presenter(s)

Lili Wang - The First Affiliated Hospital of Soochow University, Soochow, Jiangsu

Y. Xu1, Z. Xing1, W. Gong2, L. Ji3, Z. Zhu4, Z. Jin4, J. Zhang5, X. Wang6, S. Qin3, Y. Jiao7, and L. Wang3; 1the First Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China, 2Department of Radiation Oncology, The First Affiliated Hospital of Soochow University, Soochow, China, 3Department of Radiation Oncology, The First Affiliated Hospital of Soochow University, Suzhou, China, 4University of Technology Sydney, Sydney, ACT, Australia, 5Suzhou Yierqi, Suzhou, China, 6University of Malaya, Suzhou, China, 7Medical College of Soochow University, Suzhou, Jiangsu, China

Purpose/Objective(s): To establish a pipeline for selecting machine learning methods in clinical research that yields models whose predictions can be explained.

Materials/Methods: The study included 128 patients with locally advanced rectal cancer (LARC) who received neoadjuvant chemoradiotherapy (nCRT) followed by surgery. Patients were grouped in two ways: by whether they achieved a pathological complete response (pCR), and by whether they achieved a tumor regression grade (TRG) of 0-1. We delineated four principal categories of clinical indicators: general indicators, blood routine indicators, tumor markers, and treatment-related indicators. Multiple scenarios were identified, encompassing combinations of one, two, or three categories of features. We established a pipeline with six stages: data selection and pre-processing, algorithm pool construction, model training, model evaluation, model explanation, and model validation. Treatment efficacy in LARC was predicted using the pipeline, which was then verified on the LARC dataset.
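The feature scenarios described above can be sketched as a simple enumeration. This is a minimal illustration, not the authors' code: the category identifiers and the function name `enumerate_scenarios` are hypothetical, and it assumes that "combinations of one, two, or three categories" means every unordered subset of that size drawn from the four categories. Under that reading the enumeration yields 14 combinations; the abstract does not state its own count, so this figure should not be taken as the authors'.

```python
from itertools import combinations

# Category names follow the abstract; the identifiers themselves are illustrative.
CATEGORIES = ["general", "blood_routine", "tumor_marker", "treatment_related"]


def enumerate_scenarios(categories, max_size=3):
    """Return every unordered combination of 1..max_size categories as tuples."""
    scenarios = []
    for size in range(1, max_size + 1):
        scenarios.extend(combinations(categories, size))
    return scenarios


scenarios = enumerate_scenarios(CATEGORIES)

# Each scenario would define the feature set handed to the algorithm pool.
for scenario in scenarios:
    print(" + ".join(scenario))
```

Each tuple names the indicator categories whose columns would be used to train and evaluate every candidate model in that scenario.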

Results: Using all clinical indicators, the models predicting pCR failed the model evaluation stage, whereas the decision tree (DT) model was the most effective for predicting TRG, with an accuracy of 0.68 and an F1-score of 0.63. With incomplete clinical indicators, the DT model showed superior performance in predicting both pCR and TRG. DT models based exclusively on tumor markers still performed satisfactorily. When tumor markers were included, the DT model achieved an accuracy of 0.82 and an F1-score of 0.71 for predicting pCR, and an accuracy of 0.76 and an F1-score of 0.72 for predicting TRG. All three DT models used CEA = 3.48 ng/ml as the root node, indicating that CEA at this cutoff was the most important indicator for predicting efficacy. These findings are clinically logical and explainable.
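To illustrate why a root node is read as the most important indicator, the sketch below picks the single feature and threshold that best separate two outcome classes. It assumes a CART-style tree with the Gini impurity criterion, which the abstract does not specify. The function names are hypothetical, and the data are made-up toy values, not patient data.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_impurity) with the lowest weighted Gini for one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # baseline: no split at all
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        threshold = (lo + hi) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


def root_feature(features, labels):
    """Pick the feature whose best split gives the lowest weighted impurity."""
    results = {name: best_split(vals, labels) for name, vals in features.items()}
    name = min(results, key=lambda k: results[k][1])
    return name, results[name][0]


# Toy values only (assumption): 1 = responder, 0 = non-responder.
features = {
    "cea": [1.0, 2.0, 3.0, 5.0, 6.0, 8.0],
    "age": [50, 61, 47, 58, 66, 52],
}
labels = [1, 1, 1, 0, 0, 0]

name, threshold = root_feature(features, labels)
print(name, threshold)
```

In this toy set the marker separates the classes perfectly and the second feature does not, so the marker becomes the root split. A real tree repeats the same search recursively on each branch.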

Conclusion: Our machine learning method selection pipeline is capable of providing explanations in LARC clinical research. Extensive experiments showed that the decision tree model offered both high accuracy and explainability in prediction. Furthermore, the findings indicate that LARC patients with low CEA and CA19-9 levels before treatment are more sensitive to nCRT. The pipeline can assist doctors in selecting appropriate ML models according to clinical needs, thereby facilitating clinical decision-making and personalized medicine.