Title
Multistain deep learning for prediction of prognosis and therapy response in colorectal cancer
01
Literature Digest: Introduction
Although it has long been known that the immune cell composition has a strong prognostic and predictive value in colorectal cancer (CRC), scoring systems such as the immunoscore (IS) or quantification of intraepithelial lymphocytes are only slowly being adopted into clinical routine use and have their limitations. To address this, we established and evaluated a multistain deep learning model (MSDLM) utilizing artificial intelligence (AI) to determine the AImmunoscore (AIS) in more than 1,000 patients with CRC. Our model had high prognostic capabilities and outperformed other clinical, molecular and immune cell-based parameters. It could also be used to predict the response to neoadjuvant therapy in patients with rectal cancer. Using an explainable AI approach, we confirmed that the MSDLM’s decisions were based on established cellular patterns of anti-tumor immunity. Hence, the AIS could provide clinicians with a valuable decision-making tool based on the tumor immune microenvironment.
Results
Clinicopathological features of the cohorts
Clinicopathological features and CONSORT-type diagrams for the cohorts used in our study are given in Figs. 1 and 2 as well as in the Methods section. For prognostic prediction, a total of 312,771 image tiles were generated from 991 patients to train, validate and test the
Fig. 1 | Clinical characteristics and CONSORT diagrams for the prognostic cohorts. CCC-EMN, Comprehensive Cancer Centre Erlangen – Europäische Metropolregion Nürnberg; Mut, mutation; dMMR, MMR deficiency; pMMR, MMR proficiency; TUM, Technical University Munich; wt, wild type.
Fig. 2 | Clinical characteristics and CONSORT diagrams for the neoadjuvant cohort. yp, post-therapy cancer stage on pathology.
Fig. 3 | Training and cross-validation of the MSDLM. a, Overview of the experimental set-up. Mφ, macrophages. b, Tumor morphology using H&E staining and examples of CD3-positive, CD4-positive, CD8-positive, CD20-positive and CD68-positive immune cell infiltrates at the invasive margin (inv. marg.). These are the main stainings used in this study. Scale bar, 250 µm. c, Distribution of each model’s accuracy, AUPRC, AUROC and F1 score. n = 11 models were trained during 11-fold cross-validation per group. One-way ANOVA with Dunnett’s test was used to correct for multiple testing. n.s., P > 0.05; *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001 and ****P ≤ 0.0001 compared with the MSDLM. The 10th, 50th (median) and 90th quantiles, as well as the minimum and maximum, are shown. d, Confusion matrix showing the predictions of the 11 validation splits together. The Fisher’s exact test and chi-squared test were two-sided. e, Precision–recall and receiver operating characteristics curves of the SSDLMs (light blue) and the MSDLM (red). The mean of the 11-fold cross-validation is shown. Some illustrations were generated with BioRender.com.
been addressed as yet. Saltz et al.29, for example, used deep learning on H&E slides to determine patterns with regard to tumor-infiltrating lymphocytes but had to rely on spatially unresolved sequencing data. Furthermore, the authors only briefly addressed the implications of their AI-based score for the clinical outcome parameters. Reichling et al. used CD3- and CD8-stained tumor slides to make survival predictions
Fig. 4 | Determination and performance of the AIS. a, Determination of the AIS based on the MSDLM experiments using a combination of the 11 models trained during cross-validation. When there was a unanimous decision of ‘relapse’, the AIS was defined as low; when there was a split decision, the AIS was defined as intermediate; and when the decision was unanimous for ‘no relapse’, the AIS was defined as high. b, Confusion matrix of the RFSS by AIS. The chi-squared test was two-sided. c–f, Kaplan–Meier plots of IS 2 (c), IS 3 (d), IS Best (e) and AIS (f). n = 339 patients of the Mainz cohort. Censors are indicated with a ‘+’. The log-rank test was used. g, Univariate Cox regression. h, Multivariate Cox regression. The Wald test was used to calculate statistical significance. HR, hazard ratio. i, Kaplan–Meier plot showing survival for the AIS subgroups stratified by T stage. n = 141 patients of the Mainz cohort. Censors are indicated with a ‘+’. The log-rank test was used. j, Contribution to the survival model of each prognostic factor. Some illustrations were generated with BioRender.com.
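The decision rule described for Fig. 4a can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes that each of the 11 cross-validation models has already been reduced to one patient-level call of 'relapse' or 'no relapse' (the caption does not say how tile-level outputs are aggregated), and the function name `aimmunoscore` and the label strings are hypothetical.

```python
from collections import Counter
from typing import Sequence

# Hypothetical label strings for the two model outputs named in the caption.
RELAPSE = "relapse"
NO_RELAPSE = "no relapse"


def aimmunoscore(votes: Sequence[str]) -> str:
    """Map per-model calls for one patient to an AIS category.

    Unanimous 'relapse'    -> 'low'
    Unanimous 'no relapse' -> 'high'
    Any split decision     -> 'intermediate'
    """
    if not votes:
        raise ValueError("at least one model vote is required")
    unknown = set(votes) - {RELAPSE, NO_RELAPSE}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")

    counts = Counter(votes)
    if counts[RELAPSE] == len(votes):
        return "low"
    if counts[NO_RELAPSE] == len(votes):
        return "high"
    return "intermediate"


if __name__ == "__main__":
    # Toy votes from 11 models; these are made-up inputs, not study data.
    print(aimmunoscore([RELAPSE] * 11))
    print(aimmunoscore([NO_RELAPSE] * 11))
    print(aimmunoscore([RELAPSE] * 4 + [NO_RELAPSE] * 7))
```

Under this rule a single dissenting model is enough to move a patient into the intermediate group, so the low and high categories contain only cases on which all models agree.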
Fig. 5 | Assessment of the MSDLM using xAI. a,b, Relative contribution of each staining to the correct prediction of ‘no relapse’ in the validation samples (n = 102 images per staining) (a) and the test samples (n = 818 images per staining) (b). The 10th, 50th (median) and 90th quantiles as well as the minimum and maximum are shown. c,d, Relative contribution of each staining to the correct prediction of ‘relapse’ in the validation samples (n = 128 images per staining) (c) and the test samples (n = 391 images per staining) (d). The 10th, 50th (median) and 90th quantiles as well as the minimum and maximum are shown. e, Example of a guided Grad-CAM markup image and the original input tile for the CD8 staining. Clear highlighting of CD8-positive cytotoxic T cells can be seen (blowup). The model was examined for the output ‘no relapse’. Scale bars, 100 µm. f, Example of a guided Grad-CAM markup image and the original input tile for the CD68 staining. Tumor-infiltrating macrophages can be observed (blowup). The model was examined for the output ‘relapse’. Scale bars, 100 µm. g, Example of a guided Grad-CAM markup image and the original input tile for the CD68 staining. Non-immune cell morphology, such as adipocytes (arrows) next to the tumor cells, can also be highlighted as associated with the output variable. The model was examined for the output ‘relapse’. Scale bars, 50 µm. h, Step-up scheme to evaluate the importance of SSDLMs from different immune cell subtypes (left). Tile-level accuracy as trained and validated on the data from the Mainz cohort. n = 11 models were trained during 11-fold cross-validation per group. The 10th, 50th (median) and 90th quantiles as well as the minimum and maximum are shown. Crosses indicate the mean (right). i, Two combinations with either a CD4 or a CD8 SSDLM compared with an MSDLM. Tile-level accuracy as trained and validated (11-fold cross-validation) on the data from the Mainz cohort. The 10th, 50th (median) and 90th quantiles as well as the minimum and maximum are shown. Crosses indicate the mean. AU, arbitrary unit.
Fig. 6 | Predictive performance of the MSDLM in rectal cancer. a, Set-up of the neoadjuvant study. b, Confusion matrix showing the predictions of the 11 validation splits together. The Fisher’s exact test and chi-squared test were two-sided. c–f, Accuracy (c), AUPRC (d), AUROC (e) and F1 score (f) of the mean SSDLM and the MSDLM. n = 11 models were trained during 11-fold cross-validation per group. n.s., P > 0.05; *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001 and ****P ≤ 0.0001 compared with the MSDLM (two-sided, unpaired t-test). The 10th, 50th (median) and 90th quantiles as well as the minimum and maximum are shown. Some illustrations were generated with BioRender.com.
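Three of the metrics named in the captions for Figs. 3c and 6c–f (accuracy, F1 score and AUROC) can be illustrated on a toy example. The following is a hedged sketch in plain Python, assuming binary labels with 1 as the positive class and one model score per tile. The function names and the toy data are hypothetical, and none of the numbers come from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from hard 0/1 predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up labels, hard predictions and scores for six tiles.
    labels = [1, 1, 0, 0, 1, 0]
    preds = [1, 0, 0, 1, 1, 0]
    scores = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]
    print(binary_metrics(labels, preds))
    print(auroc(labels, scores))
```

The pairwise AUROC is quadratic in the number of tiles and is meant only to make the definition explicit; for datasets of the size reported here a rank-based or library implementation would be the practical choice.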