Title
Prostate cancer therapy personalization via multi-modal deep learning on randomized phase III clinical trials
01
Introduction
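In 2020, there were 1,414,259 new cases of prostate cancer and 375,304 deaths from it worldwide. Although prostate cancer is often slow-growing and treatment can be curative, the negative effects of over- and under-treatment make it a leading global cause of cancer-associated disability, and it is a leading cause of cancer death in men. Determining the optimal course of therapy for an individual patient is difficult: it requires weighing the patient's overall health, the characteristics of the cancer, the side-effect profiles of the many possible treatments, outcomes data from clinical trials of patients with similar diagnoses, and a prediction of the patient's expected future outcomes. The challenge is compounded by the lack of readily accessible prognostic tools for better risk stratification.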
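One of the most widely used systems for risk-stratifying patients is the National Comprehensive Cancer Network (NCCN), or D'Amico, risk groups, developed in the late 1990s. The system is based on a digital rectal examination of the prostate, the serum prostate-specific antigen (PSA) level, and the grade of a tumor biopsy assessed by histopathology. This three-tier system underlies treatment recommendations for localized prostate cancer around the world, yet it has repeatedly been shown to have poor prognostic and discriminatory performance. This is due in part to the subjective and non-specific nature of its core variables. For example, Gleason grading was developed in the 1960s and has poor inter-observer reproducibility, even among expert urologic pathologists. Although newer clinicopathologic risk-stratification systems have been created, they still rely on the same three core variables: Gleason grade, T-stage, and PSA.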
Method
Dataset preparation
In collaboration with NRG Oncology, we obtained access to full patient-level baseline clinical data, digitized histopathology slides of pretreatment and posttreatment prostate tissue, and longitudinal outcomes from five landmark, large-scale, prospective, randomized, multinational clinical trials containing 5,654 patients, 16,204 histopathology slides, and >10 years of median follow-up: NRG/RTOG-9202, 9408, 9413, 9910, and 0126 (Table 1).
Patients in these trials were randomized across various combinations of external radiotherapy (RT) with or without different durations of androgen deprivation therapy (ADT). The slides were digitized over a period of 1 year by NRG Oncology using a Leica Biosystems Aperio AT2 digital pathology scanner at a resolution of 20x. The histopathology images were manually reviewed for quality and clarity. Six baseline clinical variables that were collected across all trials (combined Gleason score, Gleason primary, Gleason secondary, T-stage, baseline PSA, age), along with the digital histopathology images, were used for model training and validation. The patients from the five trials were split into training (80%) and validation (20%) datasets, with no patient overlap between the splits. To ensure that the test set captured a clinically relevant and representative subset of patients, the final test set was selected such that the NCCN risk groups' 5-year distant metastasis AUC was between 0.7 and 0.75, as observed in the literature (refs. 33, 34). Institutional Review Board approval was obtained from NRG Oncology (IRB00000781), and informed consent was waived because this study was performed with anonymized data.
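As a rough illustration of a patient-level 80/20 split with no overlap, the sketch below shuffles patient identifiers and assigns all of a patient's slides to the same side. The function name `split_by_patient`, the seed, and the toy data are assumptions made for illustration; the source does not describe the authors' splitting code, and the additional constraint on the NCCN 5-year distant metastasis AUC (0.7 to 0.75) is not implemented here.

```python
import random


def split_by_patient(slides_by_patient, val_fraction=0.2, seed=0):
    """Split at the patient level so every slide of a patient lands on one side.

    `slides_by_patient` maps a patient id to a list of that patient's slides.
    Hypothetical helper: not the authors' implementation.
    """
    patient_ids = sorted(slides_by_patient)  # fixed order before shuffling
    rng = random.Random(seed)                # local RNG for reproducibility
    rng.shuffle(patient_ids)

    n_val = round(len(patient_ids) * val_fraction)
    val_ids = patient_ids[:n_val]
    train_ids = patient_ids[n_val:]

    # Slides follow their patient, so no patient contributes to both sets.
    train_slides = [s for p in train_ids for s in slides_by_patient[p]]
    val_slides = [s for p in val_ids for s in slides_by_patient[p]]
    return train_ids, val_ids, train_slides, val_slides


# Toy data (made up): 10 patients with 1 to 3 slides each.
toy = {
    f"patient_{i}": [f"patient_{i}_slide_{j}" for j in range(1 + i % 3)]
    for i in range(10)
}
train_ids, val_ids, train_slides, val_slides = split_by_patient(toy)
print(len(train_ids), "training patients,", len(val_ids), "validation patients")
```

Splitting on patient identifiers rather than on slides matters because one patient can contribute several slides; a slide-level split could place tissue from the same patient in both sets and inflate validation performance.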
Results
We created a unique MMAI architecture that ingests both tabular clinical data and image data, and trains with self-supervised learning to leverage the substantial amount of data available. We trained and validated six distinct models on a dataset of 16,204 histopathology slides (~16 TB of image data) and clinical data from 5,654 patients to predict six binary outcomes varying by endpoint and timeframe (5- and 10-year distant metastasis, 5- and 10-year biochemical failure, 10-year prostate cancer-specific survival, and 10-year overall survival). Notably, accurate prediction of distant metastasis at 5 and 10 years is particularly important for identifying patients who may have more aggressive disease and require additional treatment. We measured the performance of these models with the area under the time-dependent receiver operating characteristic curve (AUC) of sensitivity and specificity, based on censored events and accounting for competing risks, and the NCCN risk groups served as our baseline comparator. Prior to model development, data from all five clinical trials were split into training (80%) and validation (20%) sets. On the validation set, the MMAI model consistently outperformed the NCCN risk groups across all tested outcomes.
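To make the idea of a binary outcome at a fixed horizon more concrete, the sketch below labels each patient at a chosen horizon and computes a plain rank-based (Mann-Whitney) AUC. This is a deliberate simplification and not the metric used in the study: the study reports a time-dependent AUC based on censored events and accounting for competing risks, whereas this sketch simply drops patients censored before the horizon and ignores competing risks. The function names and the toy records are assumptions made for illustration.

```python
def binary_label(time_years, event, horizon_years):
    """Label a patient at a fixed horizon.

    Returns 1 if the event occurred by the horizon, 0 if the patient was
    known to be event-free at the horizon, and None if the patient was
    censored before the horizon (outcome unknown).
    """
    if event and time_years <= horizon_years:
        return 1
    if time_years >= horizon_years:
        return 0
    return None


def auc(scores, labels):
    """Mann-Whitney AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy records (made up): (follow-up time in years, event observed?, model score)
records = [
    (2.0, True, 0.9),
    (4.5, True, 0.6),
    (3.0, False, 0.8),   # censored before 5 years, so excluded below
    (8.0, False, 0.2),
    (6.0, True, 0.7),    # event after 5 years, so event-free at the horizon
    (10.0, False, 0.4),
]

labeled = [(score, binary_label(t, e, 5.0)) for t, e, score in records]
kept = [(s, y) for s, y in labeled if y is not None]
auc_5yr = auc([s for s, _ in kept], [y for _, y in kept])
print(f"5-year AUC on toy data: {auc_5yr:.3f}")
```

Dropping early-censored patients is generally biased, which is one reason censoring-aware, time-dependent estimators are preferred in survival settings.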
Figure
Fig. 1 Multimodal deep learning system and dataset. a The multimodal architecture is composed of two parts: a tower stack to parse a variable number of digital histopathology slides and another tower stack to merge the resultant features and predict binary outcomes. b The training of the self-supervised model of the image tower stack.
Fig. 2 Pathologist interpretation of self-supervised model tissue clusters. The self-supervised model in the multimodal model was trained to identify whether or not augmented versions of small patches of tissue came from the same original patch, without ever seeing clinical data labels. After training, each image patch in the dataset of 10.05 M image patches was fed through this model to extract a 128-dimensional feature vector, and the UMAP algorithm (ref. 27) was used to cluster and visualize the resultant vectors. A pathologist was then asked to interpret the 20 image patches closest to each of the 25 cluster centroids; the descriptions are shown next to the insets. For clarity, we only highlight 6 clusters (colored), and show the remaining clusters in gray. See Supplementary Fig. 2 for full pathologist annotation.
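The step of picking the patches closest to a cluster centroid can be sketched as follows. The caption does not say in which space distances were measured or how clusters were assigned, so this sketch assumes Euclidean distance and takes cluster membership as given; the helper names and the 2-dimensional toy vectors (standing in for 128-dimensional features) are illustrative assumptions.

```python
import math


def centroid_of(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def nearest_to_centroid(vectors, centroid, k):
    """Indices of the k vectors closest to the centroid (Euclidean distance)."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], centroid))
    return order[:k]


# Toy cluster (made up): four nearby feature vectors and one outlier.
cluster = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
center = centroid_of(cluster)
closest = nearest_to_centroid(cluster, center, k=3)
print("patch indices to show a reviewer:", closest)
```

In the setting described by the caption, `k` would be 20 and the procedure would be repeated for each of the 25 clusters, with the selected patches passed to a pathologist for description.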
Fig. 3 Comparison of the multimodal deep learning system to NCCN risk groups across trials and outcomes. a Performance results reporting the area under the curve (AUC) of the time-dependent receiver operating characteristic of the MMAI (blue bars) vs. NCCN (gray bars) models, including 95% confidence intervals and two-sided p-values. Comparisons were made across 5-year and 10-year time points on the following binary outcomes: distant metastasis (DM), biochemical failure (BF), prostate cancer-specific survival (PCSS), and overall survival (OS). b Summary table of the relative improvement of the MMAI model over the NCCN model across the various outcomes, broken down by performance on the data from each trial in the validation set. Relative improvement is given by (AUC_MMAI − AUC_NCCN) / AUC_NCCN. c Ablation study showing model performance when trained on a sequentially decreasing set of data inputs, including the pathology images only (path), pathology images + NCCN variables (path + NCCN), and pathology images + NCCN variables + age + Gleason primary + Gleason secondary (path + NCCN + 3). d–h Performance comparison on the individual clinical trial subsets of the validation set; together, these five comprise the entire validation set shown in (a).
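The relative-improvement formula in panel b is simple enough to check by hand. The short sketch below computes it for a pair of AUC values; the function name and the example values (0.80 and 0.70) are made up for illustration and are not results from the study.

```python
def relative_improvement(auc_mmai, auc_nccn):
    """Relative improvement of the MMAI AUC over the NCCN AUC.

    Implements (AUC_MMAI - AUC_NCCN) / AUC_NCCN as stated in the caption.
    """
    if auc_nccn <= 0:
        raise ValueError("baseline AUC must be positive")
    return (auc_mmai - auc_nccn) / auc_nccn


# Illustrative values only (not taken from the paper).
improvement = relative_improvement(0.80, 0.70)
print(f"relative improvement: {improvement:.1%}")
```

Because the difference is divided by the baseline AUC, the same absolute gain looks larger when the baseline is weaker, which is worth keeping in mind when comparing rows across outcomes or trials.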
Table
Table 1. Clinicopathologic and trial characteristics.
Table 2. Validation results for the subset of the 20% validation set comprising patients with pretreatment slides only (n = 931).