Ensemble transformer-based multiple instance learning to predict pathological subtypes and tumor mutational burden from histopathological whole slide images of endometrial and colorectal cancer
In endometrial cancer (EC) and colorectal cancer (CRC), tumor mutational burden (TMB) has, in addition to microsatellite instability, gradually gained attention as a genomic biomarker that can be used clinically to determine which patients may benefit from immune checkpoint inhibitors. High TMB is characterized by a large number of mutated genes, which encode aberrant tumor neoantigens, and implies a better response to immunotherapy. Hence, EC and CRC patients with high TMB may have a higher chance of receiving immunotherapy. TMB is mainly measured by whole-exome sequencing or next-generation sequencing, which is costly and difficult to apply widely in routine clinical practice. Therefore, an effective, efficient, low-cost, and easily accessible tool is urgently needed to determine the TMB status of EC and CRC patients. In this study, we present a deep learning framework, namely Ensemble Transformer-based Multiple Instance Learning with Self-Supervised Learning Vision Transformer feature encoder (ETMIL-SSLViT), to predict pathological subtype and TMB status directly from H&E-stained whole slide images (WSIs) of EC and CRC patients, which supports both pathological classification and cancer treatment planning. Our framework was evaluated on two cancer cohorts from The Cancer Genome Atlas (TCGA): an EC cohort with 918 histopathology WSIs from 529 patients and a CRC cohort with 1495 WSIs from 594 patients. The experimental results show that the proposed methods achieved excellent performance, outperforming seven state-of-the-art (SOTA) methods in cancer subtype classification and TMB prediction on both cancer datasets. Fisher's exact test further showed that the associations between the predictions of the proposed models and the actual cancer subtype or TMB status are both statistically significant (𝑝 < 0.001).
These promising findings show the potential of our proposed methods to guide personalized treatment decisions by accurately predicting EC and CRC subtypes and TMB status, supporting effective immunotherapy planning for EC and CRC patients.
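Fisher's exact test, used above to assess the association between predicted and actual labels, operates on a 2×2 contingency table (rows: predicted class, columns: actual class). The sketch below is a minimal pure-Python version of the two-sided test, written for illustration only; the function names and the example counts are hypothetical and are not results from this study. In practice, `scipy.stats.fisher_exact` implements the same test.

```python
from math import comb
from typing import List


def table_probability(a: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability that the top-left cell equals `a`,
    given the fixed row total `row1`, column total `col1`, and grand total `n`."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(table: List[List[int]]) -> float:
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Rows are predicted classes and columns are actual classes. The p-value is
    the total probability of all tables with the same margins that are at most
    as likely as the observed table.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = table_probability(a, row1, col1, n)
    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p = table_probability(x, row1, col1, n)
        if p <= p_observed * (1 + 1e-7):  # relative tolerance for floating-point ties
            p_value += p
    return min(1.0, p_value)


# Hypothetical counts: 8 of 9 actual positives predicted positive,
# 5 of 7 actual negatives predicted negative.
print(fisher_exact_two_sided([[8, 2], [1, 5]]))  # ≈ 0.035 (= 280/8008)
```

A small p-value indicates that predictions and true labels are unlikely to be independent; it measures significance rather than the size of the effect.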
This study introduces a DL method, namely Ensemble Transformer-based Multiple Instance Learning with Self-Supervised Learning Vision Transformer feature encoder (ETMIL-SSLViT), to predict pathological subtype and TMB status directly from H&E-stained WSIs of EC and CRC patients. All slides were downloaded directly from the TCGA platform. Regarding data pre-processing, Faryna et al. (2024) compared four SOTA automatic augmentation methods from general computer vision, investigated their capacity to improve domain generalization in histopathology, and showed that RandAugment (Cubuk et al., 2020) is a simple way to obtain state-of-the-art performance in histopathology; we therefore tested the proposed methods with and without RandAugment data augmentation as a pre-processing step. Firstly, we build a Vision Patch Segmentation Module (VPSM) (Section 3.1) to rapidly extract non-overlapping foreground patches, which improves the efficiency and accuracy of WSI analysis (Fig. 1(a)). Secondly, we present a Self-Supervised Learning Vision Transformer Feature Encoder Module (SSLViT-FEM) (Section 3.2) that integrates a pre-trained ViT-S/16 with SSL techniques to extract features from WSIs (Fig. 1(b)). SSLViT-FEM captures globally salient image features, models long-range dependencies between image regions, and fully exploits the attention mechanism to incorporate global context into the image features, improving the accuracy of the extracted features without significantly increasing computational cost. Thirdly, we present a Transformer-based Multiple Instance Learning model (TMIL) (Section 3.3) to address a limitation of traditional MIL methods: they often assume that instances are independent and identically distributed (i.i.d.), which neglects the correlations among instances. In the proposed TMIL, each WSI is treated as a bag, and the patches extracted from it are treated as instances (Fig. 1(d)).
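The non-overlapping foreground patch extraction performed by the VPSM can be illustrated with a minimal sketch. Section 3.1 is not reproduced here, so the details below are assumptions: a small grayscale array stands in for a downsampled slide thumbnail (a real pipeline would read the WSI with a dedicated slide-reading library), a pixel counts as tissue when it is darker than a hypothetical threshold of 220 because slide background is near-white, and a tile is kept when at least half of it is tissue. The function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]


def tissue_mask(gray: Grid, threshold: int = 220) -> List[List[bool]]:
    """Mark pixels darker than `threshold` as tissue (assumes a near-white background)."""
    return [[value < threshold for value in row] for row in gray]


def foreground_tiles(mask: List[List[bool]], patch: int,
                     min_fraction: float = 0.5) -> List[Tuple[int, int]]:
    """Return (y, x) top-left corners of non-overlapping patch x patch tiles
    whose tissue fraction is at least `min_fraction`.

    The stride equals the patch size, so tiles never overlap; incomplete
    tiles at the right and bottom borders are skipped.
    """
    height, width = len(mask), len(mask[0])
    corners = []
    for y in range(0, height - patch + 1, patch):
        for x in range(0, width - patch + 1, patch):
            tissue = sum(mask[y + i][x + j] for i in range(patch) for j in range(patch))
            if tissue / (patch * patch) >= min_fraction:
                corners.append((y, x))
    return corners


# Toy 4x4 "thumbnail": tissue (dark) on the left half, background (white) on the right.
thumbnail = [[100, 100, 255, 255] for _ in range(4)]
print(foreground_tiles(tissue_mask(thumbnail), patch=2))  # [(0, 0), (2, 0)]
```

Discarding background tiles before feature extraction is what makes this step save computation: only the kept corners would be passed on to the feature encoder.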
Unlike traditional MIL methods, our proposed TMIL uses the self-attention mechanism of Transformers to model the relationships between instances. Self-attention allows the model to assign a different attention weight to every instance, effectively capturing the dependencies and interactions between instances: each instance can be influenced by all the others, so the correlations among instances are fully reflected. This enhances the representation by integrating both morphological and spatial information from the instances. The results of this study show that our proposed method outperforms all the latest SOTA MIL methods (Lu et al., 2021a; Xiang et al., 2023; Campanella et al., 2019; Lu et al., 2021b) (see Tables 1 and 2). Fourthly, we build an Early Stop Mechanism (ESM) (Section 3.4) based on the cross-entropy loss to help prevent overfitting and save computational resources and time (Fig. 1(c.ii)). Cross-entropy is a suitable stopping criterion because it measures the discrepancy between the probability distribution predicted by the model and the actual label distribution, and thus directly reflects how well the model predicts categories. Fifthly, we devise an Ensemble Framework (EF) using the bagging strategy together with a Two-stage Optimal Model Finder method (T-OMF), described in Sections 3.5 and 3.6, respectively. The proposed ensemble reduces variance and overfitting and improves predictive performance and model robustness.
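The instance-level self-attention described for TMIL can be sketched in pure Python for a single bag. This is a simplified illustration, not the TMIL implementation of Section 3.3: it uses one attention head, takes the projection matrices as given (in a trained model they are learned), omits any positional or spatial encoding, and mean-pools the contextualized instances into a slide-level vector (the actual aggregation, for example a class token, may differ). All names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def instance_self_attention(x: Matrix, wq: Matrix, wk: Matrix,
                            wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention over the instances of one bag.

    x holds one feature vector per patch (instance). weights[i][j] is the
    attention instance i pays to instance j; every row sums to 1, so each
    output row is a weighted mixture of all instances' value vectors.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(k[0]))
    scores = [[sum(qi * ki for qi, ki in zip(q_row, k_row)) / scale for k_row in k]
              for q_row in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def bag_embedding(contextualized: Matrix) -> List[float]:
    """Mean-pool contextualized instances into one slide-level vector."""
    n = len(contextualized)
    return [sum(column) / n for column in zip(*contextualized)]


# Toy bag of three 2-D patch features with identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = instance_self_attention(bag, identity, identity, identity)
print(bag_embedding(outputs))
```

Because this sketch has no positional term, permuting the patches only permutes the outputs and leaves the bag embedding unchanged; a model that uses spatial information needs an explicit encoding of patch positions on top of this.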
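The Early Stop Mechanism and the bagging-based ensemble can also be sketched. The following is a minimal illustration under stated assumptions, not the ESM or EF of Sections 3.4 to 3.6: the `patience` and `min_delta` parameters, the bootstrap sampler, and soft voting (averaging member probabilities) as the aggregation rule are hypothetical choices, and the T-OMF model-selection stages are not reproduced.

```python
import math
import random
from typing import List, Sequence


def cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int],
                  eps: float = 1e-12) -> float:
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


class EarlyStopper:
    """Hypothetical helper: stop when validation cross-entropy has not improved
    by more than `min_delta` for `patience` consecutive epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best_loss, self.best_epoch, self.bad_epochs = math.inf, -1, 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record this epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss, self.best_epoch, self.bad_epochs = val_loss, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def bootstrap_indices(n: int, rng: random.Random) -> List[int]:
    """Bagging: draw n training indices with replacement for one ensemble member."""
    return [rng.randrange(n) for _ in range(n)]


def soft_vote(member_probs: List[List[List[float]]]) -> List[List[float]]:
    """Average class probabilities over members; input is [member][slide][class]."""
    m = len(member_probs)
    return [[sum(c) / m for c in zip(*slide)] for slide in zip(*member_probs)]


# Hypothetical validation losses: best at epoch 1, then two epochs without improvement.
stopper = EarlyStopper(patience=2)
for epoch, loss in enumerate([1.0, 0.8, 0.85, 0.9]):
    if stopper.step(epoch, loss):
        break
print(stopper.best_epoch)  # 1
```

Each ensemble member would be trained on its own bootstrap sample and checkpointed at its `best_epoch`; averaging the members' probabilities is what reduces variance relative to a single model.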
Our framework was evaluated on two cancer cohorts from TCGA, comprising 918 histopathology WSIs of 529 EC patients and 1495 WSIs of 594 CRC patients, for the prediction of both cancer subtypes and TMB status (see Section 4.1 and Fig. 4). The evaluation was conducted in three parts. Firstly, we compared the proposed methods for cancer subtyping and TMB prediction in the EC and CRC cohorts with seven SOTA DL methods that have achieved remarkable success in computational pathology: ClassicMIL (Campanella et al., 2019), Wang et al. (2023d), Improved_InceptionV3_MS (Wang et al., 2023e), CLAM (Lu et al., 2021b), TOAD (Lu et al., 2021a), TransMIL (Shao et al., 2021), and MRAN (Lu et al., 2021a). The proposed methods achieved excellent performance and outperformed all seven SOTA methods in cancer subtype classification and TMB prediction on both cancer datasets (see Section 4.2, Tables 1 and 2). Secondly, Section 4.2.7 demonstrates the interpretability of the proposed method for TMB prediction in EC and CRC samples. Our proposed models classify slides by focusing on the WSI regions that are predictive of whether the tumor has a high mutational burden (high attention scores) and disregarding regions with low relevance to TMB prediction, as shown for CRC and EC sample slides in Fig. 5(a) and (b), respectively.
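The attention heatmaps in Fig. 5 map each patch's attention score back to its position on the slide. A minimal sketch of that mapping follows; it assumes one scalar attention score per patch is already available (how the score is read out of the model is not specified here), min-max normalizes the scores, and leaves cells without a tissue patch empty. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def attention_grid(corners: List[Tuple[int, int]], scores: List[float],
                   grid_h: int, grid_w: int, patch: int) -> List[List[Optional[float]]]:
    """Place min-max normalized attention scores on a grid with one cell per patch.

    corners are (y, x) top-left pixel coordinates of the patches; cells with no
    patch (background) stay None so they can be drawn as transparent.
    """
    low, high = min(scores), max(scores)
    span = (high - low) or 1.0  # avoid division by zero when all scores are equal
    grid: List[List[Optional[float]]] = [[None] * grid_w for _ in range(grid_h)]
    for (y, x), score in zip(corners, scores):
        grid[y // patch][x // patch] = (score - low) / span
    return grid


def most_attended(corners: List[Tuple[int, int]], scores: List[float],
                  k: int) -> List[Tuple[int, int]]:
    """Return the k patch positions with the highest attention scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [corners[i] for i in order[:k]]


corners = [(0, 0), (0, 256), (256, 0)]
scores = [0.1, 0.5, 0.3]
print(attention_grid(corners, scores, grid_h=2, grid_w=2, patch=256))
print(most_attended(corners, scores, k=2))  # [(0, 256), (256, 0)]
```

Rendering the grid with a color map over the slide thumbnail gives a heatmap of the kind shown in Fig. 5; `most_attended` lists the regions a pathologist might inspect first.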
Importantly, our proposed models are able to differentiate TMB traits through weakly supervised learning with slide-level labels only, without any pixel- or patch-level annotations during training.
Thirdly, in Section 4.3, six ablation studies were performed to examine the efficacy of two components of the proposed ETMIL framework, comparing (1) different model assessment metrics in the proposed T-OMF module (see Table 3), (2) different feature encoders for building the Self-Supervised Learning Vision Transformer Feature Encoder Module (SSLViT-FEM) (see Table 4), (3) different SSL-based backbones (see Table 5), (4) different optimizers for model training (see Table 7), (5) different loss functions for model training (see Table 8), and (6) the generalization capacity of the proposed method on five different datasets (see Table 9).
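Figs. 2 and 3 report AUROC, and the first ablation compares model assessment metrics within T-OMF. As a concrete reference for one such metric, AUROC equals the probability that a randomly chosen positive slide receives a higher score than a randomly chosen negative slide, with ties counting one half. The pure-Python sketch below computes it that way; `pick_best` is a hypothetical helper showing metric-based model selection and is not the T-OMF procedure itself. For real use, `sklearn.metrics.roc_auc_score` computes the same quantity.

```python
from typing import Dict, List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney statistic).

    labels are 1 (positive, e.g. high TMB) or 0 (negative); scores are the
    predicted probabilities of the positive class.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def pick_best(candidates: Dict[str, Tuple[List[int], List[float]]]) -> str:
    """Hypothetical helper: return the candidate model with the highest validation AUROC."""
    return max(candidates, key=lambda name: auroc(*candidates[name]))


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise form is quadratic in the number of slides, which is acceptable for cohorts of a few hundred slides; a rank-based formula gives the same value in O(n log n).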
Fig. 1. Overview of the proposed Ensemble Transformer-based Multiple Instance Learning with Self-Supervised Learning Vision Transformer feature encoder (ETMIL-SSLViT): (a) Vision Patch Segmentation Module (VPSM). (b) Self-Supervised Learning Vision Transformer Feature Encoder Module (SSLViT-FEM). (c) Ensemble Framework (EF) with Two-stage Optimal Model Finder (T-OMF). (c.i) Stage 1 OMF. (c.ii) Early Stop Mechanism (ESM). (c.iii) Stage 2 OMF. (d) Transformer-based Multiple Instance Learning (TMIL).
Fig. 2. Receiver operating characteristic (ROC) curves and the areas under them (AUROC) for assessment of (a) EC subtype classification (aggressive vs. non-aggressive), (b) TMB prediction in the aggressive EC subtype, and (c) TMB prediction in the non-aggressive EC subtype.
Fig. 3. Receiver operating characteristic (ROC) curves and the areas under them (AUROC) for assessment of (a) CRC subtype classification (mucinous vs. non-mucinous), (b) TMB prediction in the non-mucinous CRC subtype, and (c) TMB prediction in the mucinous CRC subtype.
Fig. 4. Data information for the two cancer datasets. (a) TCGA EC and CRC cohorts, (b) image diversity, (c) subtype distribution, (d) length distribution in pixels, (e) race distribution, and (f) age distribution.
Fig. 5. Model attention heatmaps in prediction of (a) CRC TMB and (b) EC TMB.
Table 1. Evaluation of cancer subtyping and TMB prediction in EC.
Table 2. Evaluation of cancer subtyping and TMB prediction in CRC.
Table 3. Quantitative evaluation comparing model selection mechanisms in the classification of EC subtypes.
Table 4. Comparison of the performance of the proposed methods using different feature extraction methods on EC samples.
Table 5. Comparison of the proposed framework with various SSL-based backbones in the classification of EC subtypes.
Table 6. Run-time analysis of the proposed framework using different SSL-based backbones.
Table 7. Comparison of the proposed method with different optimizers in the classification of EC subtypes.
Table 8. Comparison of the proposed method with different loss functions in the classification of EC subtypes.
Table 9. Evaluation of the proposed methods on five independent source sites in the classification of EC subtypes.