Title
Segment Like A Doctor: Learning reliable clinical thinking and experience for pancreas and pancreatic cancer segmentation
Introduction
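Pancreatic cancer is one of the deadliest malignancies and ranks seventh among the causes of cancer-related death worldwide (Rawla et al., 2019). It is characterized by delayed diagnosis, difficult treatment and high mortality (Kamisawa et al., 2016), and the overall five-year survival rate is below 8% (Chhoda et al., 2019). As the disease progresses, pancreatic cancer often shows increasing invasive potential, which directly affects clinical decisions such as the assessment of tumor resectability. The Dutch Pancreatic Cancer Group (DPCG) classifies the resectability of pancreatic cancer according to the degree of contact between the tumor and the vessels (Versteijne et al., 2016). Fig. 1 shows a visual comparison of pancreatic cancers with different resectability on contrast-enhanced computed tomography (CT) images. Accurate and reliable segmentation of pancreatic cancer is therefore vital throughout diagnosis and treatment.

In recent years, certain deep learning-based methods have been tentatively applied to pancreatic cancer segmentation. Zhao et al. (2021) proposed a holistic segmentation-mesh-classification network that uses geometric and location information for pancreatic mass segmentation. Chen et al. (2021) proposed a model-driven deep learning method based on a spiral transformation for pancreatic cancer segmentation. Li et al. (2023b) proposed a 3D fully convolutional neural network with three temperature-guided modules for joint segmentation of the pancreas and tumor. Qu et al. (2023) proposed a Transformer-guided progressive fusion network that exploits global representations for 3D pancreas and pancreatic mass segmentation. Although these methods explore a variety of strategies, they share two limitations. First, because pancreatic cancer is tiny, irregular in shape and has an extremely uncertain boundary, existing work struggles with the complex cases found in real clinical data (Wang et al., 2021; Cao et al., 2023). Second, almost all previous studies rely on black-box models that only learn the annotation distribution and therefore lack credibility and interpretability (Yao et al., 2023; Zhao et al., 2024).

This study therefore proposes a novel Segment-Like-A-Doctor (SLAD) framework to learn the reliable clinical thinking and experience for pancreas and pancreatic cancer segmentation on CT images. Fig. 2 illustrates the main idea: SLAD aims to simulate the essential logical thinking of doctors in the progressive diagnostic stages of pancreatic cancer (organ, lesion and boundary stage), each of which carries rich and trustworthy medical experience in tumor analysis. Specifically, when segmenting pancreatic cancer, high-level medical experts usually first assess the abdominal organs on the CT image as a whole to gain an overall cognition of the anatomical distribution and thereby roughly locate the pancreas (Zhou et al., 2017). Next, within the pancreas volume, they make a global judgment on the core region of the cancer by looking for essential feature differences such as intensity-based heterogeneity and shape-based abnormality (Li et al., 2023a). Finally, starting from the trustworthy lesion core, doctors usually calibrate the tumor boundary by exploring additional clinically relevant imaging information such as tumor-vascular involvement or perineural invasion (Mahmoudi et al., 2022). We therefore learn this reliable clinical thinking and experience with artificial neural networks and build the SLAD framework to achieve more accurate and trustworthy pancreatic cancer segmentation.

The main contributions of this study are as follows:
- A novel SLAD framework is proposed to learn the reliable clinical thinking and experience of the progressive diagnostic stages, achieving more accurate and trustworthy pancreatic cancer segmentation.
- An Anatomy-aware Masked AutoEncoder (AMAE) is introduced to model the doctors' overall cognition of the anatomical distribution of abdominal organs on CT images by self-supervised pretraining.
- A Causality-driven Graph Reasoning Module (CGRM) is designed to simulate the global judgment of doctors for lesion detection by exploring the topological feature difference between the causal lesion and the non-causal organ.
- A Diffusion-based Discrepancy Calibration Module (DDCM) is developed to fit the refined understanding of doctors for the uncertain boundary of pancreatic cancer by inferring the ambiguous segmentation discrepancy based on the trustworthy lesion core.
- Experimental results on three independent datasets show that the approach improves pancreatic cancer segmentation accuracy by 4%–9% compared with state-of-the-art methods, and a tumor-vascular involvement analysis verifies its superiority in clinical applications.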
Abstract
Pancreatic cancer is a lethal invasive tumor with one of the worst prognoses. Accurate and reliable segmentation of the pancreas and pancreatic cancer on computerized tomography (CT) images is vital in clinical diagnosis and treatment. Although certain deep learning-based techniques have been tentatively applied to this task, the current performance of pancreatic cancer segmentation is far from meeting clinical needs due to the tiny size, irregular shape and extremely uncertain boundary of the cancer. Besides, most existing studies are built on black-box models that only learn the annotation distribution instead of the logical thinking and diagnostic experience of high-level medical experts, although the latter is more credible and interpretable. To alleviate these issues, we propose a novel Segment-Like-A-Doctor (SLAD) framework to learn the reliable clinical thinking and experience for pancreas and pancreatic cancer segmentation on CT images. Specifically, SLAD aims to simulate the essential logical thinking and experience of doctors in the progressive diagnostic stages of pancreatic cancer: organ, lesion and boundary stage. Firstly, in the organ stage, an Anatomy-aware Masked AutoEncoder (AMAE) is introduced to model the doctors' overall cognition of the anatomical distribution of abdominal organs on CT images by self-supervised pretraining. Secondly, in the lesion stage, a Causality-driven Graph Reasoning Module (CGRM) is designed to learn the global judgment of doctors for lesion detection by exploring the topological feature difference between the causal lesion and the non-causal organ. Finally, in the boundary stage, a Diffusion-based Discrepancy Calibration Module (DDCM) is developed to fit the refined understanding of doctors for the uncertain boundary of pancreatic cancer by inferring the ambiguous segmentation discrepancy based on the trustworthy lesion core.
Experimental results on three independent datasets demonstrate that our approach boosts pancreatic cancer segmentation accuracy by 4%–9% compared with the state-of-the-art methods. Additionally, a tumor-vascular involvement analysis is conducted to verify the superiority of our method in clinical applications.
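The organ stage relies on masked-autoencoder pretraining. As a rough illustration of the generic masking step only (not the authors' AMAE, whose anatomy-aware details are not given in this digest), the sketch below hides a random subset of patches and hands only the visible ones to an encoder. The function names, the patch labels and the 0.75 mask ratio are assumptions made for this example.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Return one boolean per patch; True means the patch is hidden."""
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    masked_ids = set(rng.sample(range(num_patches), num_masked))
    return [i in masked_ids for i in range(num_patches)]


def split_patches(patches, mask):
    """Separate patches into those the encoder sees and those to reconstruct."""
    visible = [p for p, m in zip(patches, mask) if not m]
    hidden = [p for p, m in zip(patches, mask) if m]
    return visible, hidden


# Hypothetical patch identifiers standing in for CT sub-volumes.
patches = [f"patch_{i}" for i in range(8)]
mask = random_patch_mask(len(patches), mask_ratio=0.75, seed=0)  # ratio is an assumption
visible, hidden = split_patches(patches, mask)
```

In a real pretraining loop the encoder would process `visible`, and a decoder would be trained to reconstruct the content of `hidden`; which patches end up masked depends on the seed.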
Method
As shown in Fig. 3, we propose a novel Segment-Like-A-Doctor (SLAD) framework to learn the reliable clinical thinking and experience for pancreas and pancreatic cancer segmentation on CT images. Our SLAD aims to simulate the essential logical thinking and experience of doctors in the progressive diagnostic stages of pancreatic cancer.
Conclusion
In this paper, we propose a novel Segment-Like-A-Doctor (SLAD) framework to learn the reliable clinical thinking and experience for pancreas and pancreatic cancer segmentation on CT images. Specifically, our proposed method aims to simulate the essential logical thinking and experience of doctors in the progressive diagnostic stages of pancreatic cancer: organ, lesion and boundary stage. Firstly, in the organ stage, an Anatomy-aware Masked AutoEncoder (AMAE) is introduced to model the doctors' overall cognition of the anatomical distribution of abdominal organs on CT images by self-supervised pretraining. Secondly, in the lesion stage, a Causality-driven Graph Reasoning Module (CGRM) is designed to learn the global judgment of doctors for lesion detection by exploring the topological feature difference between the causal lesion and the non-causal organ. Finally, in the boundary stage, a Diffusion-based Discrepancy Calibration Module (DDCM) is developed to fit the refined understanding of doctors for the uncertain boundary of pancreatic cancer by inferring the ambiguous segmentation discrepancy based on the trustworthy lesion core. Experimental results on three independent datasets demonstrate that our approach boosts pancreatic cancer segmentation accuracy by 4%–9% compared with the state-of-the-art methods. Additionally, a tumor-vascular involvement analysis is conducted to verify the superiority of the proposed method in clinical applications.
Figure

Fig. 1. Visual comparison of pancreatic cancers with different resectability on CT images. (a) Resectable pancreatic cancer. (b) Borderline resectable pancreatic cancer. (c) Locally advanced pancreatic cancer. Pancreas and tumor are marked in red and green, respectively.

Fig. 2. Illustration of the main idea of the proposed method, which aims to simulate the essential logical thinking and experience of doctors in the progressive diagnostic stages of pancreatic cancer: organ, lesion and boundary stage. The blue, orange and green rectangles denote the visual attention of the doctors in the organ, lesion and boundary stage, respectively. Pancreas, tumor and the ambiguous region around the tumor boundary are marked in red, green and blue, respectively.

Fig. 3. Pipeline of our proposed Segment-Like-A-Doctor (SLAD) framework to learn the reliable clinical thinking and experience for pancreas and pancreatic cancer segmentation. SLAD aims to simulate the essential logical thinking of doctors in the progressive diagnostic stages of pancreatic cancer: organ, lesion and boundary stage. For the organ stage, an Anatomy-aware Masked AutoEncoder (AMAE) is introduced to model the doctors' overall cognition of the anatomical distribution of abdominal organs on CT images by self-supervised pretraining. For the lesion stage, a Causality-driven Graph Reasoning Module (CGRM) is designed to learn the global judgment of doctors for lesion detection by exploring the topological feature difference between the causal lesion and the non-causal organ. For the final boundary stage, a Diffusion-based Discrepancy Calibration Module (DDCM) is developed to fit the refined understanding of doctors for the uncertain boundary of pancreatic cancer by inferring the ambiguous segmentation discrepancy based on the trustworthy lesion core. The pipeline traces the full chain of clinical reasoning from anatomical cognition, through localization of the lesion core, to refined boundary calibration, with each module simulating the doctors' diagnostic logic to improve segmentation accuracy and reliability step by step.

Fig. 4. Specific structure and pretraining process of the Anatomy-aware Masked AutoEncoder (AMAE).

Fig. 5. Specific structure of the Causality-driven Graph Reasoning Module (CGRM).

Fig. 6. Specific structure of the Diffusion-based Discrepancy Calibration Module (DDCM).

Fig. 7. Performance distributions of different methods on three testing datasets.

Fig. 8. Qualitative comparison of segmentation of pancreases (marked in red) and tumors (marked in green). (a) and (b) are taken from the internal testing dataset, (c) and (d) from external testing dataset I, and (e) and (f) from external testing dataset II. Six cases are shown as 2D slices in axial view. For each case, the Dice scores of the pancreas and tumor are presented in the red and green boxes below, respectively.

Fig. 9. Feature visualization of the pretrained AMAE with different training loss weights. (a) $w_{AP} = 0$, $w_{Rec} = 0$; (b) $w_{AP} = 0$, $w_{Rec} = 1$; (c) $w_{AP} = 1$, $w_{Rec} = 0$; (d) $w_{AP} = 1$, $w_{Rec} = 0.5$; (e) $w_{AP} = 1$, $w_{Rec} = 1$; (f) $w_{AP} = 1$, $w_{Rec} = 1.5$.

Fig. 10. t-SNE visualization of the vertex features obtained by different graph reasoning mechanisms on the internal testing dataset. (a) XOR-based; (b) Contour-guided; (c) Causality-driven (Ours). Blue and orange dots represent the causal vertex features and the non-causal vertex features, respectively.

Fig. 11. Visual comparison of the ablation study of DDCM. (a) Original CT images; (b) Previous tumor segmentation (marked in red); (c) True discrepancy; (d) Predicted discrepancy without DDPM and SDF loss; (e) Predicted discrepancy without DDPM; (f) Predicted discrepancy without SDF loss; (g) Predicted discrepancy by our DDCM. False positive and false negative discrepancies are shown in green and red in (c)-(g), respectively.

Fig. 12. Performance comparison in assessing the degree of vascular involvement for five main peri-pancreatic vessels. CA: celiac artery; SMA: superior mesenteric artery; CHA: common hepatic artery; SMV: superior mesenteric vein; PV: portal vein.
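The "true discrepancy" panels of Fig. 11 separate the errors of a previous tumor segmentation into false positives and false negatives. A minimal sketch of that bookkeeping on small 2D binary masks is given below; it only illustrates the notion of a segmentation discrepancy and is not the DDCM itself. The signed encoding (+1 for false positive, -1 for false negative) and the function names are assumptions made for this example.

```python
def discrepancy_map(pred, truth):
    """Per pixel: 0 = agreement, 1 = false positive, -1 = false negative."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    return [
        [1 if p and not t else -1 if t and not p else 0 for p, t in zip(prow, trow)]
        for prow, trow in zip(pred, truth)
    ]


def apply_discrepancy(pred, disc):
    """Remove false positives and add false negatives to the previous mask."""
    return [[p - d for p, d in zip(prow, drow)] for prow, drow in zip(pred, disc)]


# Toy masks: 1 = tumor, 0 = background.
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
truth = [[0, 0, 1, 1],
         [0, 1, 1, 1],
         [0, 0, 1, 0]]

disc = discrepancy_map(pred, truth)
corrected = apply_discrepancy(pred, disc)
```

With the true discrepancy, the corrected mask equals the ground truth by construction; a learned module can only approximate this map, which is why the predicted panels in Fig. 11 differ between ablation settings.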
Table

Table 1. Detailed information on the datasets used in this work. The numbers in ( ) represent the number of CT scans in each dataset. The numbers in [ ] stand for the volume of the pancreatic tumor [mean ± std, cm³] in the downstream segmentation datasets.

Table 2. Quantitative comparison of segmentation and detection performance on three independent testing datasets. PS, TD and TS denote pancreas segmentation, tumor detection and tumor segmentation, respectively. Dice scores (mean ± std, %) and 10% detection rates (%) are presented as segmentation and detection metrics, respectively. The best and second-best performance are in red and blue, respectively.

Table 3. Ablation study of the proposed SLAD method for pancreatic cancer segmentation on three independent testing datasets. Dice similarity coefficient (mean ± std, %), 10% detection rate (DR, %) and 95% Hausdorff Distance (HD, mm) are presented. The best and second-best performance are in red and blue, respectively.

Table 4. Ablation study of the AMAE for pancreatic cancer segmentation on three independent testing datasets. Dice similarity coefficient (mean ± std, %), 10% detection rate (DR, %) and 95% Hausdorff Distance (HD, mm) are presented. The best and second-best performance are in red and blue, respectively. $w_{AP} = 0$ and $w_{Rec} = 0$ denote the supervised baseline without self-supervised pretraining.

Table 5. Performance comparison of different reasoning mechanisms for pancreatic cancer segmentation on three independent testing datasets. Dice similarity coefficient (mean ± std, %), 10% detection rate (DR, %) and 95% Hausdorff Distance (HD, mm) are presented. The best and second-best performance are in red and blue, respectively. GR stands for graph reasoning.

Table 6. Ablation study of the components in DDCM for pancreatic cancer segmentation on three independent testing datasets. Dice similarity coefficient (mean ± std, %), 10% detection rate (DR, %) and 95% Hausdorff Distance (HD, mm) are presented. The best and second-best performance are in red and blue, respectively.

Table 7. Performance comparison in evaluating pancreatic tumor-vascular involvement on three independent testing datasets. The accuracy metrics for resectability prediction of resectable, borderline resectable or locally advanced pancreatic cancer are presented. The best performance is shown in bold.
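The tables above report Dice, a 10% detection rate and the 95% Hausdorff distance. The sketch below shows one common way to compute these on sets of voxel coordinates, in plain Python. It rests on several assumptions that the captions do not confirm: a tumor counts as detected when its Dice exceeds 0.10, the Hausdorff variant takes the larger of the two directed 95th percentiles, voxel spacing is isotropic, and the caller passes surface points to `hd95`. All function names are hypothetical.

```python
import math


def dice_score(pred, truth):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two sets of voxel coordinates."""
    if not pred and not truth:
        return 1.0  # convention chosen here: two empty masks agree
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


def detection_rate(dice_scores, threshold=0.10):
    """Fraction of cases whose Dice exceeds the threshold (assumed definition)."""
    return sum(d > threshold for d in dice_scores) / len(dice_scores)


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def directed_distances(a, b):
    """Distance from each point in a to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def hd95(a, b, spacing=1.0):
    """Symmetric 95th-percentile Hausdorff distance of two non-empty point sets."""
    forward = percentile(directed_distances(a, b), 95)
    backward = percentile(directed_distances(b, a), 95)
    return max(forward, backward) * spacing  # isotropic spacing assumed


# Toy 2D example: two 2x2 squares shifted by one pixel.
pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 1), (0, 2), (1, 2)}

dice = dice_score(pred, truth)
rate = detection_rate([0.5, 0.05, 0.3, 0.0])
distance = hd95(pred, truth)
```

The brute-force nearest-neighbor search is quadratic in the number of points, which is fine for a toy example but too slow for full 3D volumes, where a distance transform is the usual choice.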