Title
Deep learning for lung cancer prognostication: A retrospective multi-cohort radiomics study
01
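Introduction
Cancer's continuous evolution and its interaction with its surroundings keep challenging patients, clinicians, and researchers. One of its deadliest forms arises in the lung, which accounts for the most cancer-related deaths worldwide. Lung cancer is the second most commonly diagnosed cancer in both men and women, and non-small-cell lung cancer (NSCLC) accounts for 85% of cases. Accurately stratifying NSCLC patients into groups built around clinical factors is a key step in cancer care. Such stratification makes it possible to assess tumor progression, establish prognosis, provide standard terminology for effective clinical communication, and, most importantly, identify an appropriate treatment plan, from chemotherapy and surgery to radiotherapy and targeted therapy. Beyond clinical factors (including performance status and, to a lesser extent, age and sex), tumor stage, as assessed through the main tumor node metastasis (TNM) staging manuals, is commonly regarded as the universal benchmark for this stratification.
The main TNM staging manuals represent a body of knowledge that combines evidence-based findings from clinical studies with the empirical knowledge of site-specific experts. However, patients within the same stage can show wide variation in clinical course and outcome. This is partly due to the unavoidable gap between yesterday's statistics and today's more advanced treatment options, and partly due to the practical challenge of grouping patients in a way that fits historical data while balancing clinicians' ability to identify the stratifying characteristics and apply the stratification algorithm at the point of care. The limitations of the clinical gold standard, combined with our improved understanding of intra-tumor heterogeneity, point to the need for personalized biomarkers that operate at the level of the individual patient rather than the population, ultimately leading to more robust patient stratification and laying the groundwork for precision oncology.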
Abstract
Non-small-cell lung cancer (NSCLC) patients often demonstrate varying clinical courses and outcomes, even within the same tumor stage. This study explores deep learning applications in medical imaging allowing for the automated quantification of radiographic characteristics and potentially improving patient stratification.
Method
We performed an integrative analysis on 7 independent datasets across 5 institutions totaling 1,194 NSCLC patients (age median = 68.3 years [range 32.5–93.3], survival median = 1.7 years [range 0.0–11.7]). Using external validation in computed tomography (CT) data, we identified prognostic signatures using a 3D convolutional neural network (CNN) for patients treated with radiotherapy (n = 771, age median = 68.0 years, survival median = 1.3 years). We then employed a transfer learning approach to achieve the same for surgery patients (n = 391, age median = 69.1 years, survival median = 3.1 years [range 0.0–8.8]). We found that the CNN predictions were significantly associated with 2-year overall survival from the start of respective treatment for radiotherapy (area under the receiver operating characteristic curve [AUC] = 0.70 [95% CI 0.63–0.78], p < 0.001) and surgery (AUC = 0.71 [95% CI 0.60–0.82], p < 0.001) patients. The CNN was also able to significantly stratify patients into low and high mortality risk groups in both the radiotherapy (p < 0.001) and surgery (p = 0.03) datasets. Additionally, the CNN was found to significantly outperform random forest models built on clinical parameters (including age, sex, and tumor node metastasis stage) as well as demonstrate high robustness against test–retest (intraclass correlation coefficient = 0.91) and inter-reader (Spearman's rank-order correlation = 0.88) variations. To gain a better understanding of the characteristics captured by the CNN, we identified regions with the most contribution towards predictions and highlighted the importance of tumor-surrounding tissue in patient stratification. We also present preliminary findings on the biological basis of the captured phenotypes as being linked to cell cycle and transcriptional processes. Limitations include the retrospective nature of this study as well as the opaque black box nature of deep learning networks.
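To make the reported AUC values concrete, the following is a minimal sketch of how an AUC for 2-year survival and a percentile bootstrap interval could be computed. The function names `roc_auc` and `bootstrap_ci`, the toy labels and scores, and the choice of a percentile bootstrap are all assumptions for illustration; the text above does not say how the study derived its confidence intervals.

```python
import random

def roc_auc(labels, scores):
    """AUC as the probability that a random positive case (label 1)
    receives a higher score than a random negative case (label 0).
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, seed=0):
    """Percentile bootstrap 95% interval for the AUC (illustrative choice)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class only
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    return aucs[int(0.025 * len(aucs))], aucs[int(0.975 * len(aucs)) - 1]

# Hypothetical toy data: 1 = alive at 2 years, score = predicted survival probability.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1, 0.8, 0.3]
print(roc_auc(toy_labels, toy_scores), bootstrap_ci(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to chance-level ranking, so the reported values of about 0.70 indicate moderate discrimination.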
Results
Tumor characterization using 3D deep learning networks
In assessing the ability of deep learning networks to quantify radiographic characteristics of tumors, we performed an integrative analysis on 7 independent datasets totaling 1,194 patients (Fig 1; S1 Table). We identified and independently validated prognostic signatures using a CNN for patients treated with radiotherapy (n = 771, including 608 with 2-year survival follow-up). We then employed a transfer learning approach to achieve the same for surgery patients (n = 391, including 368 with 2-year survival follow-up). The architecture of the network (Fig 2) was designed to receive 3D input cubes surrounding the center of the primary tumor, based on clinician-located seed points. The network was trained to predict overall survival likelihood 2 years after the start of the respective treatment. Starting with the radiotherapy patients, the analysis was split into a discovery phase and an independent test phase (Fig 1; S1 Table).
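As a rough illustration of the 3D input cubes centered on clinician-located seed points, here is a minimal sketch that assumes the CT scan is already available as a NumPy array indexed (z, y, x). The function name `extract_cube`, the cube size in voxels, and the zero padding at scan borders are hypothetical choices, not details taken from the study.

```python
import numpy as np

def extract_cube(volume, seed, size, pad_value=0.0):
    """Return a size x size x size cube of `volume` centered on `seed`.

    `seed` is a (z, y, x) voxel index. Parts of the cube that fall outside
    the scan are filled with `pad_value` (an assumption of this sketch).
    """
    half = size // 2
    # Pad every axis by `size` voxels so any in-scan seed yields a full cube.
    padded = np.pad(volume, size, mode="constant", constant_values=pad_value)
    z, y, x = (int(c) + size - half for c in seed)
    return padded[z:z + size, y:y + size, x:x + size]

# Hypothetical toy volume; real CT data would first be resampled to a common spacing.
toy_volume = np.arange(1, 4 * 4 * 4 + 1, dtype=float).reshape(4, 4, 4)
cube = extract_cube(toy_volume, seed=(0, 0, 0), size=3)
print(cube.shape)
```

Padding before slicing keeps the output shape fixed, which matters because a CNN with fully connected layers expects inputs of one size.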
Conclusion
Our results provide evidence that deep learning networks may be used for mortality risk stratification based on standard-of-care CT images from NSCLC patients. This evidence motivates future research into better deciphering the clinical and biological basis of deep learning networks as well as validation in prospective data.
Figure
Fig 1. General design of the analytical setup. A 3D convolutional neural network is trained end-to-end on the radiotherapy dataset group. This is followed by a transfer learning approach, where the same network is fine-tuned on the surgery dataset group. The training, tuning, and testing of these networks are all carried out on independent datasets as illustrated. Four further experiments are carried out on the networks in order to benchmark their performance against random forest models, assess their stability, identify regions in images responsible for predictions, and finally, explore their biological basis. Numbers outside parentheses refer to the number of patients with survival follow-up per dataset. Numbers within parentheses refer to the number of patients with 2-year overall survival follow-up only.
Fig 2. Illustration of the convolutional neural network. This network was used to predict overall 2-year survival of patients with non-small-cell lung cancer. The final classifier layer outputs normalized probabilities for both classes (0 = deceased and 1 = alive). Only the weights of the final fully connected layer were fine-tuned during transfer learning. The final convolutional layer (conv4) was used for activation mapping.
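The idea of fine-tuning only the final fully connected layer, with a classifier that outputs normalized probabilities for two classes, can be sketched in plain NumPy. Everything here is an assumption for illustration: the frozen layers are replaced by a fixed random projection with a ReLU, the data are synthetic, and the function names, learning rate, and step count are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise normalized class probabilities."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(features, labels, W, b):
    """Mean negative log-likelihood of the true class."""
    probs = softmax(features @ W + b)
    return float(-np.log(probs[np.arange(len(labels)), labels]).mean())

def fine_tune_final_layer(features, labels, W, b, lr=0.01, steps=500):
    """Gradient descent on the final layer (W, b) only.

    `features` are outputs of the frozen earlier layers, so those layers
    receive no updates by construction.
    """
    W, b = W.copy(), b.copy()
    onehot = np.eye(W.shape[1])[labels]
    for _ in range(steps):
        grad_logits = (softmax(features @ W + b) - onehot) / len(labels)
        W -= lr * features.T @ grad_logits
        b -= lr * grad_logits.sum(axis=0)
    return W, b

rng = np.random.default_rng(0)
frozen_weights = rng.normal(size=(8, 4))        # stand-in for pretrained layers
inputs = rng.normal(size=(40, 8))               # synthetic "surgery" samples
labels = (inputs[:, 0] > 0).astype(int)         # 0 = deceased, 1 = alive (toy rule)
features = np.maximum(inputs @ frozen_weights, 0.0)

W0, b0 = np.zeros((4, 2)), np.zeros(2)
loss_before = cross_entropy(features, labels, W0, b0)
W1, b1 = fine_tune_final_layer(features, labels, W0, b0)
loss_after = cross_entropy(features, labels, W1, b1)
print(loss_before, loss_after)
```

Updating so few parameters is a common way to limit overfitting when the new cohort is small, which fits the smaller surgery cohort described here.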
Fig 3. Prognostic power (AUC) and Kaplan–Meier (KM) curves of deep learning features for both the radiotherapy and surgical networks. (A) AUC plot for the radiotherapy test dataset Maastro (n = 211). (B) KM plot for the Maastro dataset (n = 307). Patients who have been previously excluded for lack of 2-year survival follow-up have been reincluded (S1 Table). To ensure an independent evaluation, the median split is calculated on the radiotherapy tuning dataset Radboud (n = 147) and locked for evaluation on the radiotherapy test dataset Maastro. (C) AUC plot for the surgery test dataset M-SPORE (n = 97). (D) KM plot for the M-SPORE dataset (n = 101). The median split is calculated on the surgery tuning dataset MUMC (n = 90) and locked for evaluation on the surgery test dataset M-SPORE. AUC or ROC-AUC, area under the receiver operating characteristic curve.
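The locked median split and the Kaplan–Meier estimate described in this caption can be sketched as follows. The scores and follow-up times are invented, the function names are hypothetical, and the sketch assumes the network score is a predicted survival probability, so scores below the threshold are labeled high risk.

```python
from statistics import median

def split_by_locked_threshold(scores, threshold):
    """Assign risk groups using a threshold fixed on a separate tuning set."""
    return ["high" if s < threshold else "low" for s in scores]

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    `events` holds 1 for an observed death and 0 for censoring.
    Returns (time, survival probability) pairs at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored cases both leave the risk set
    return curve

# Threshold comes from the tuning scores only, then is reused unchanged on test data.
tuning_scores = [0.2, 0.4, 0.6, 0.8]
threshold = median(tuning_scores)
test_groups = split_by_locked_threshold([0.3, 0.7], threshold)
print(threshold, test_groups)
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Fixing the threshold before touching the test set keeps the test data from influencing the grouping, which is the purpose of the locked split described in the caption.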
Fig 4. Activation mapping. Visual highlights of the most "important" regions within the input image, that is, those with the most contribution to maximizing the outputs of the final prediction layer. The rows represent 4 randomly selected samples. From the left, the first column represents the central axial slice of the network input (150 × 150 mm) with tumor annotations. In the second column, a 50 × 50 mm patch is cropped around the tumor. In the third column, activation contours are overlaid, with blue and red showing the lowest and highest contributions (gradients), respectively. Column 4 represents the activation heatmaps for a better visual reference. While the heatmaps are 3D, only the central axial slice is shown. Therefore, the entire color spectrum might not be fully visualized.
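One common way to build gradient-based activation maps of this kind is a Grad-CAM-style weighting of the final convolutional feature maps. The sketch below assumes that approach, which the caption does not name explicitly, and it assumes the activations and gradients of that layer have already been extracted as arrays of shape (channels, depth, height, width). The function name is hypothetical.

```python
import numpy as np

def gradient_weighted_activation_map(activations, gradients):
    """Grad-CAM-style 3D heatmap.

    Each channel is weighted by the mean gradient of the target output with
    respect to that channel, the weighted channels are summed, negative
    values are clipped, and the result is scaled to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2, 3))            # one weight per channel
    heatmap = np.tensordot(weights, activations, axes=1)  # shape (d, h, w)
    heatmap = np.maximum(heatmap, 0.0)
    peak = heatmap.max()
    return heatmap / peak if peak > 0 else heatmap

# Hypothetical toy tensors with 2 channels and a 1 x 2 x 2 spatial grid.
acts = np.array([[[[1.0, 0.0], [0.0, 0.0]]],
                 [[[0.0, 2.0], [0.0, 0.0]]]])
grads = np.stack([np.ones((1, 2, 2)), -np.ones((1, 2, 2))])
heat = gradient_weighted_activation_map(acts, grads)
print(heat[0])
```

For display, a map like this is typically upsampled to the input resolution and overlaid on the central axial slice.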
Fig 5. Global gene set expression patterns: Moffitt dataset. The deep learning network predictions on the surgery training dataset Moffitt were linked to global gene expression patterns using a pre-ranked gene set enrichment analysis (GSEA). Negative and positive enrichments are shown in red and blue, respectively. The top 10 enrichments in each category are highlighted.
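For readers unfamiliar with pre-ranked GSEA, the core quantity is a running-sum enrichment score over a ranked gene list. The sketch below implements the standard weighted running sum on toy data. The gene names, the ranking scores, and the function name are hypothetical, and how the study ranked genes against the network predictions is not specified in this caption.

```python
import numpy as np

def enrichment_score(ranked_genes, ranked_scores, gene_set, p=1.0):
    """Weighted Kolmogorov-Smirnov-like enrichment score.

    `ranked_genes` must already be sorted by `ranked_scores` (descending).
    The running sum rises at genes in the set, in proportion to |score|**p,
    and falls by a constant at every other gene. The score returned is the
    running-sum value with the largest absolute deviation from zero.
    """
    in_set = np.array([g in gene_set for g in ranked_genes])
    if not in_set.any() or in_set.all():
        raise ValueError("gene set must overlap the ranked list only partially")
    weights = np.abs(np.asarray(ranked_scores, dtype=float)) ** p
    hits = np.where(in_set, weights, 0.0)
    if hits.sum() == 0:
        raise ValueError("genes in the set all have zero ranking score")
    hits = hits / hits.sum()
    misses = np.where(in_set, 0.0, 1.0 / (~in_set).sum())
    running = np.cumsum(hits - misses)
    return float(running[np.argmax(np.abs(running))])

# Toy ranked list: positive scores at the top, negative at the bottom.
genes = ["a", "b", "c", "d"]
scores = [2.0, 1.0, -1.0, -2.0]
print(enrichment_score(genes, scores, {"a", "b"}))
print(enrichment_score(genes, scores, {"c", "d"}))
```

A gene set concentrated at the top of the ranking gets a positive score and one concentrated at the bottom gets a negative score, matching the positive and negative enrichments shown in the figure. Significance testing by permutation is omitted from this sketch.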