Title
Cross-site prognosis prediction for nasopharyngeal carcinoma from incomplete multi-modal data
01
Introduction
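Nasopharyngeal carcinoma (NPC) is a head and neck cancer with a markedly uneven geographic distribution, occurring predominantly in Asia and Africa (Sung et al., 2021; Mody et al., 2021; Zhang et al., 2019). For patients with locally advanced NPC, induction chemotherapy (ICT) followed by concurrent chemoradiotherapy (CCRT) is the standard treatment (Chen et al., 2021a, 2019; Ng et al., 2018). Under this strategy, CCRT is accompanied by a high incidence of acute toxicities such as mucositis and hematological toxicity. To keep treatment both safe and effective, the intensity of CCRT must be adjusted carefully (Chen et al., 2021b; Setton et al., 2016; Cho et al., 2020). As shown in Fig. 1(a), by accurately predicting a patient's prognosis before CCRT, clinicians can tailor treatment intensity to the predicted risk of relapse or death. For example, patients at high risk of relapse or death may need more intensive treatment, whereas low-risk patients may be suited to less intensive treatment that avoids unnecessary side effects. Timely and accurate prognosis prediction can therefore guide treatment intensity and strike a balance between toxicity and risk.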
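Because it visualizes tumors and lymph nodes in the nasopharynx and neck well, medical imaging has become a routine clinical procedure in NPC diagnosis (Amin et al., 2017). Several studies have developed image-driven methods for better computer-aided disease analysis (Tao et al., 2022; Li et al., 2022a; Jing et al., 2020; Qiang et al., 2021; Zhang et al., 2021). For example, Tao et al. (2022) proposed a sequential method for accurate NPC segmentation in magnetic resonance (MR) images, which plays a key role in tumor staging and radiotherapy planning. More recently, a deep learning based method was proposed that segments the primary tumor and metastatic lymph nodes simultaneously (Li et al., 2022a). Dong et al. (2019) extracted radiomic features from pre-treatment MR images and attempted to identify new personalized biomarkers. Bai et al. (2021) proposed a novel localization-to-segmentation framework for NPC segmentation that outperformed the widely used U-Net and ranked 9th on a challenge leaderboard. In addition, three-dimensional convolutional neural networks have been used to learn MR image features that reveal the variability among patients with locally advanced NPC (Qiang et al., 2021).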
Abstract
Accurate prognosis prediction for nasopharyngeal carcinoma based on magnetic resonance (MR) images assists in guiding treatment intensity, thus reducing the risk of recurrence and death. To reduce repeated labor and sufficiently explore domain knowledge, aggregating labeled/annotated data from external sites enables us to train an intelligent model for a clinical site with unlabeled data. However, this task suffers from two challenges: fusing incomplete multi-modal examination data and handling image data heterogeneity among sites. This paper proposes a cross-site survival analysis method for prognosis prediction of nasopharyngeal carcinoma from a domain adaptation viewpoint. Using a Cox model as the basic framework, our method equips it with a cross-attention based multi-modal fusion regularization. This regularization model effectively fuses the multi-modal information from multi-parametric MR images and clinical features onto a domain-adaptive space, even when some modalities are absent. To enhance feature discrimination, we also extend the contrastive learning technique to censored data. Compared with conventional approaches that directly deploy a trained survival model in a new site, our method achieves superior prognosis prediction performance in cross-site validation experiments. These results highlight the key role of the cross-site adaptability of our method and support its value in clinical practice.
Method
The proposed SiteAda method is a site-adaptive prognosis prediction model with two regularization terms, built under the unsupervised domain adaptation framework. We briefly introduce the Cox proportional hazards model and its deep learning based variants, and then present the two novel regularizers.
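The objective that Cox-type models minimize, the negative partial log-likelihood, can be sketched in a few lines. This is a minimal plain-Python illustration of the standard Cox objective, not the SiteAda implementation: the function name and the arguments `risk_scores`, `times`, and `events` are assumptions made for illustration, tied event times get no special treatment, and the two regularizers are omitted.

```python
import math


def cox_neg_partial_log_likelihood(risk_scores, times, events):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk_scores: predicted log-hazard per patient (e.g. a network output)
    times:       observed follow-up time per patient
    events:      1 if relapse/death was observed, 0 if the patient is censored
    """
    total = 0.0
    n_events = 0
    for i in range(len(times)):
        if events[i] != 1:
            # Censored patients contribute only through other patients' risk sets.
            continue
        # Risk set: every patient still under observation at time times[i].
        risk_set = [risk_scores[j] for j in range(len(times)) if times[j] >= times[i]]
        # Numerically stable log-sum-exp over the risk set.
        m = max(risk_set)
        log_sum_exp = m + math.log(sum(math.exp(r - m) for r in risk_set))
        total += risk_scores[i] - log_sum_exp
        n_events += 1
    return -total / max(n_events, 1)
```

The loss is lower when patients with earlier events receive higher risk scores, which is the ordering a survival model is trained to produce.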
Conclusion
In this paper, we present a cross-site prognosis prediction model, SiteAda, for NPC that simultaneously achieves incomplete multi-modal learning and effectively relieves distribution mismatch through a cross-attention based multi-modal fusion regularization. The proposed regularization model fuses multiple modalities from the information decoding aspect and thus remains applicable even if some modalities are missing. It also builds a cross-domain attention mechanism for learning domain-transformed features and facilitates subsequent semantic knowledge migration. Moreover, a cross-domain partial contrastive learning module is proposed to enhance the discrimination ability.
We conduct extensive experiments among three clinical sites, and the SiteAda method presents significant improvements over the conventional approaches that directly test a trained prognosis prediction model in a new site. Thus, this study will be of interest to clinicians and researchers who want to build a survival model for a new site without a heavy labeling workload or considerable labeling time. Incorporating the time-varying information from follow-up examinations and integrating an automatic segmentation algorithm for tumors and lymph nodes into the flowchart will be our future work.
Figure
Fig. 1. Challenges in cross-site prognosis prediction for NPC from incomplete multi-modal data. (a) Incomplete multi-modal data collected from a clinical site. Each patient is treated through a two-stage strategy, namely induction chemotherapy (ICT) followed by concurrent chemoradiotherapy (CCRT). The multi-modal data are collected at the time points before/after ICT. Due to equipment availability and individualized treatment, not all NPC patients undergo the same and complete examinations. This requires an approach that can not only fuse the heterogeneous information but also tackle the incomplete multi-modal problem. (b) Examples of NPC magnetic resonance images and intensity histograms from different clinical sites. Top to bottom: two sites. Left to right: different risks of relapse or death and the gray-level histograms of the images. The data from different sites show heterogeneity in appearance and contrast, leading to the domain shift problem for cross-site modeling.
Fig. 2. The SiteAda method and the details of key components for cross-site modeling. (a) Flowchart of our SiteAda method. It uses a cross-attention based multi-modal fusion module to learn the domain-invariant fused features from the collected incomplete multi-modal data. Additionally, a cross-domain partial contrastive learning module is designed to enhance the discrimination. The ultimate task, prognosis prediction, is learned by minimizing a negative partial log-likelihood function. Since SiteAda learns fused and domain-invariant features across sites, the prognosis prediction model can be transferred from a labeled site (source domain) to an unlabeled site (target domain). (b) Cross-attention based multi-modal fusion module. It learns fused features in a decoding manner by ensuring that each single-modal input can be reconstructed from the fused feature, and builds a domain-level attention mechanism for cross-site knowledge transfer. (c) Cross-domain partial contrastive learning module. It tries to improve the discriminative ability by ordering the embeddings according to the risk stratification.
Fig. 3. Performance in survival curve estimation and risk stratification. (a) Comparisons of different mean survival curves with the Kaplan–Meier curve for observed patients in the target domain on transfer task Site-1→Site-2. (b) Survival probability on the target samples stratified into low and high risk by different methods. We also show the log-rank test p-value in each chart.
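For readers unfamiliar with the Kaplan–Meier curve used as the reference in Fig. 3(a), the estimator can be sketched as below. This is a generic textbook product-limit estimator in plain Python, not code from the paper; the function name `kaplan_meier` and its arguments are assumptions for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct observed event time.

    times:  follow-up time per patient
    events: 1 if relapse/death was observed, 0 if the patient is censored
    Returns a list of (time, survival_probability) pairs.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Events observed exactly at time t.
        n_events = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - n_events / at_risk
        curve.append((t, survival))
    return curve
```

Censored patients never trigger a drop in the curve, but they stay in the at-risk count until their censoring time, which is how the estimator uses partial follow-up information.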
Fig. 4. Heat maps of the feature similarities. Left: original features. Right: discriminative features learned by our SiteAda method. Top to bottom: similarities within the source domain, within the target domain, and across domains. Compared with the original features, the similarity of learned features with comparable risks (the region between the two dotted lines) becomes larger, whereas that of features with wide disparities in risk (the remaining region) becomes smaller.
Fig. 5. The t-SNE visualization of feature representations learned by the SiteAda method and further analysis of some subjects. (a) The t-SNE plot. The features from two sites are indistinguishable, implying that SiteAda successfully finds a domain-invariant space. To give an intuitive understanding of what semantic information the embedding space possesses, we take three subjects (Subject-A, Subject-B, and Subject-C) for further analysis. Note that Subject-A and Subject-B come from different sites but their features are nearby, whereas Subject-B and Subject-C come from the same site but their features are far away from each other. (b) The survival curves. Subject-A and Subject-B show similar survival curves, and Subject-C presents a curve distinct from the former two, illustrating that the learned features are hierarchically organized in the space. (c) Multi-modal examination data. The yellow dashed lines delineate the ROI boundaries before induction chemotherapy (ICT) and the red-shaded regions represent the ROI. Subject-A and Subject-B share a similar trend in which the primary tumor still grows after ICT, whereas Subject-C displays the opposite trend in which both the primary tumor and the metastatic lymph node become smaller after ICT.
Fig. 6. Convergence and hyper-parameter robustness. (a) Convergence. The objective function value converges quickly and the performance metrics become stable after 200 epochs. (b) Hyper-parameter robustness. When the two hyper-parameters have relatively large values, the C-index fluctuates only slightly, validating the hyper-parameter robustness of the SiteAda method.
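The C-index reported in Fig. 6 measures how well predicted risks rank patients by survival time. A minimal sketch of Harrell's concordance index follows. It is a generic plain-Python illustration, not the paper's evaluation code; the function name and arguments are assumptions.

```python
def concordance_index(risk_scores, times, events):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i has an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when the patient with the earlier event has the higher risk score.
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of the comparable pairs.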
Fig. 7. AUC performance versus time from diagnosis of the SiteAda method using different types of features. “Modal-1” to “Modal-4” are MR images of the head scan before ICT, neck scan before ICT, head scan after ICT, and neck scan after ICT, respectively. “Modal-1′” and “Modal-2′” are the differences of radiomic features on head and neck scans, respectively. “Modal-5” to “Modal-7” are clinical features. “Modal-1′,2′” (both differences of radiomic features), “Modal-3,4” (both scans after ICT), “Modal-1′,3” (both head scans), and “Modal-2′,4” (both neck scans) represent the pairwise combinations of MR images. “Modal-1′,2′,3,4” and “Modal-5,6,7” are two major groups: the MR image group and the clinical feature group. “Modal-1,2,5,6,7” includes both scans before ICT and the clinical features. “All-modal fusion” means that all modalities are used.
Table
Table 1. Detailed information of datasets.
Table 2. Comparisons against competing methods on cross-site NPC prognosis prediction. “Site-∗ → Site-⋇” represents a transfer task from “Site-∗” (source domain) to “Site-⋇” (target domain). Three strategies (zero-filling followed by concatenation-based fusion, mean-filling followed by concatenation-based fusion, and a reconstruction-based fusion algorithm) are adopted to tackle the incomplete multi-modal fusion problem. Under each strategy, the pure DeepSurv (Qiang et al., 2021; Mobadersany et al., 2018) and its variants with various site adaptation regularizers (MMD (Long et al., 2015), DA (Ganin et al., 2016), CDA (Long et al., 2018), CKB (Luo and Ren, 2021), and ATM (Li et al., 2021)) are evaluated. We report the average evaluation metrics with standard errors in the superscript. Our SiteAda method outperforms all three groups of approaches in all performance metrics.
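The two filling-based baselines named in the Table 2 caption (zero-filling and mean-filling, each followed by concatenation) can be sketched as follows. This is a plain-Python illustration under assumed data structures, not the authors' preprocessing code: the function name, the dictionary layout, and the modality names in the usage example are hypothetical.

```python
def fill_and_concatenate(patients, modality_dims, strategy="zero"):
    """Fuse incomplete multi-modal features by filling, then concatenating.

    patients:      list of dicts mapping modality name -> feature list,
                   with a missing modality given as None or left out
    modality_dims: dict mapping modality name -> feature dimension
    strategy:      "zero" fills missing modalities with zeros,
                   "mean" fills them with the per-feature mean over the
                   patients for whom that modality was observed
    """
    if strategy not in ("zero", "mean"):
        raise ValueError("strategy must be 'zero' or 'mean'")
    fill = {}
    for name, dim in modality_dims.items():
        observed = [p[name] for p in patients if p.get(name) is not None]
        if strategy == "mean" and observed:
            fill[name] = [sum(col) / len(observed) for col in zip(*observed)]
        else:
            fill[name] = [0.0] * dim
    fused = []
    for p in patients:
        row = []
        for name in modality_dims:  # fixed modality order for every patient
            value = p.get(name)
            row.extend(value if value is not None else fill[name])
        fused.append(row)
    return fused


# Hypothetical example: two modalities, some of them missing.
patients = [
    {"mr_head": [1.0, 3.0], "clinical": [5.0]},
    {"mr_head": None, "clinical": [7.0]},
    {"mr_head": [3.0, 5.0]},
]
dims = {"mr_head": 2, "clinical": 1}
zero_fused = fill_and_concatenate(patients, dims, "zero")
mean_fused = fill_and_concatenate(patients, dims, "mean")
```

Both baselines give every patient a vector of the same length, but the filled values carry no patient-specific information, which is the limitation a reconstruction-based fusion is meant to address.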
Table 3. Ablation study of the SiteAda method on the transfer task Site-1→Site-2.
Table A.1. Recognition rates (%) on Image-CLEF-DA. We report the average accuracy with standard error in the superscript.
Table A.2. Recognition rates (%) on Office-Home. We report the average accuracy with standard error in the superscript.
Table A.3. P-values of statistical comparisons between the results of SiteAda and those of the reference methods.