Title
Unlocking the diagnostic potential of electrocardiograms through information transfer from cardiac magnetic resonance imaging
Introduction
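The standard 12-lead electrocardiogram (ECG) is a widely used clinical examination because it is easy to use, quick to acquire, and cost-effective. It is a non-invasive method that cardiologists use to record the electrical activity of the heart. In clinical routine, the ECG is a valuable diagnostic aid because it can reveal characteristic signals associated with cardiac diseases, including atrial fibrillation and left ventricular systolic dysfunction (Siontis et al., 2021). However, a detailed assessment of cardiac morphology from the ECG requires inversely modelling the electrical activity of the heart back to its actual physiological sources, which is a well-known ill-posed problem (Gulrajani, 1998; Schenone et al., 2016). The characterisation and spatial localisation of cardiac diseases is therefore limited to well-studied conditions such as myocardial infarction, which can be localised even with simple statistical analyses (Engelen et al., 1999; Xiong et al., 2021).
In contrast, cardiac magnetic resonance (CMR) imaging provides high-resolution, volumetric images that allow a precise assessment of cardiac morphology. It enables accurate prediction of important cardiac phenotypes, such as the left ventricular ejection fraction (LVEF) or left ventricular volumes, including the left ventricular end-diastolic volume (LVEDV), end-systolic volume (LVESV), and stroke volume (LVSV), which can be used to quantitatively assess heart failure (Pombo et al., 1971; Avendi et al., 2016; Duffy et al., 2022). CMR imaging is therefore the gold standard for the evidence-based diagnosis of various cardiac diseases (Li et al., 2018), enabling a detailed assessment of conditions such as coronary artery disease (CAD) (Reynolds et al., 2021; Wang et al., 2024), atrial fibrillation (AF) (Oakes et al., 2009; Bertelsen et al., 2020), or diabetes mellitus (DM) (Collaboration et al., 2010; Sørensen et al., 2020; Wu et al., 2021). However, its use in clinical practice is limited by long scan times, high associated costs, and the need for specially trained operators (von Knobelsdorff-Brenkenhoff et al., 2017).
In this work, we propose a deep learning strategy that combines the accessibility of ECG with the informative value of CMR imaging, providing a cost-effective and comprehensive cardiac screening tool for clinical routine. We introduce a novel contrastive learning strategy that processes large amounts of unlabelled multimodal biobank data in a self-supervised manner. First, we pre-train unimodal encoders on paired ECG and CMR imaging data to transfer complementary information between the modalities. The trained signal encoder is then used to predict various cardiac conditions and phenotypes from ECG data alone. An illustration of our approach is shown in Fig. 1. By transferring information from CMR imaging to ECG, we reduce the dependency on expensive CMR scans during inference and unlock the diagnostic potential of the ECG for affordable care of patients with cardiovascular diseases. Our main contributions are as follows:
1. We propose a novel multimodal pre-training paradigm that leverages 12-lead ECG and CMR imaging data. We combine multimodal contrastive learning with masked data modelling to transfer information from CMR imaging to ECG.
2. In extensive benchmark experiments using data from 40,044 subjects of the UK Biobank (Sudlow et al., 2015), we show that our learned ECG representations generalise across a variety of downstream applications. After fine-tuning, they can be used to predict the risk of various cardiovascular diseases and to determine distinct cardiac phenotypes. Our approach outperforms current state-of-the-art self-supervised and supervised baselines that use ECG, CMR imaging, or tabular data for prediction.
3. Through extensive ablation experiments, we quantitatively demonstrate the importance of masked data modelling for the information transfer. Furthermore, we qualitatively show the information transfer from CMR imaging to ECG using latent vector similarity measures.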
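The volumetric phenotypes listed above are linked by standard definitions. Stroke volume is the difference between the end-diastolic and end-systolic volumes, and ejection fraction expresses stroke volume as a fraction of the end-diastolic volume:

```latex
\mathrm{LVSV} = \mathrm{LVEDV} - \mathrm{LVESV}, \qquad
\mathrm{LVEF} = \frac{\mathrm{LVSV}}{\mathrm{LVEDV}} \times 100\,\%
```

For example, an LVEDV of 120 mL and an LVESV of 50 mL give an LVSV of 70 mL and an LVEF of about 58.3%.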
Abstract
Cardiovascular diseases (CVD) can be diagnosed using various diagnostic modalities. The electrocardiogram (ECG) is a cost-effective and widely available diagnostic aid that provides functional information about the heart. However, its ability to classify and spatially localise CVD is limited. In contrast, cardiac magnetic resonance (CMR) imaging provides detailed structural information about the heart and thus enables evidence-based diagnosis of CVD, but long scan times and high costs limit its use in clinical routine. In this work, we present a deep learning strategy for cost-effective and comprehensive cardiac screening solely from ECG. Our approach combines multimodal contrastive learning with masked data modelling to transfer domain-specific information from CMR imaging to ECG representations. In extensive experiments using data from 40,044 UK Biobank subjects, we demonstrate the utility and generalisability of our method for subject-specific risk prediction of CVD and the prediction of cardiac phenotypes using only ECG data. Specifically, our novel multimodal pre-training paradigm improves performance by up to 12.19% for risk prediction and 27.59% for phenotype prediction. In a qualitative analysis, we demonstrate that our learned ECG representations incorporate information from CMR image regions of interest.
Method
In this work, we present a novel multimodal pre-training paradigm that incorporates time series and imaging data to train unimodal encoders (see Fig. 1). We use masked data modelling (MDM) to train a signal encoder unimodally on large amounts of unlabelled 12-lead ECG, which allows it to learn meaningful ECG representations, as described in Section 3.1. After pre-training solely on ECG data, we introduce multimodal contrastive learning (MMCL) to further pre-train our unimodal signal encoder with information from CMR imaging, as described in Section 3.2. In Section 3.3, we introduce an interpretability module to visually assess the information transfer from CMR imaging to ECG. After these pre-training steps, the unimodal signal encoder can be fine-tuned on a limited amount of labelled data to predict the subject-specific risk of cardiovascular diseases and to determine cardiac imaging phenotypes during inference solely from ECG.
Conclusion
In this work, we present a novel contrastive learning approach that can pre-train multimodally using 12-lead ECG and CMR imaging data. During inference, our solution requires only ECG data to unimodally assess subject-specific risks of cardiovascular diseases and to determine distinct cardiac phenotypes. Specifically, we combine multimodal contrastive learning with masked data modelling to transfer information from CMR imaging to ECG. We demonstrate both quantitatively and qualitatively that an information transfer from CMR imaging unlocks the full diagnostic potential of ECG, enabling affordable care for patients with cardiovascular conditions.
Finally, we acknowledge the limitations of our work. While our approach can predict unimodally solely from ECG data, it relies on paired ECG and CMR imaging data during pre-training, which are rare outside of large biobanks. A further shortcoming of this study is that mainly healthy subjects are included, since the investigated diseases have a low prevalence in the general population. The UK Biobank study also mainly includes white subjects, with other ethnicities making up only 5% of the total cohort. The behaviour of such frameworks with more balanced data should be investigated in future work. Furthermore, our results indicate that adding local alignment between ECG time points and CMR imaging frames may be worth investigating in future work.
Overall, this study has demonstrated that multimodal contrastive learning in combination with masked data modelling generates general-purpose ECG representations that can be applied to a broad range of clinical downstream applications. Thus, we present a simple yet effective strategy that combines the accessibility of ECG with the informative value of CMR imaging, enabling holistic cardiac screening solely by ECG.
Results
5.1. Multimodal pre-training allows for CMR-level disease prediction solely from ECG
To evaluate the utility of our multimodal solution in a clinical setting, we compare it to baseline methods that use different modalities, including ECG, CMR imaging, and tabular data, for the subject-specific risk prediction of cardiovascular diseases. Each baseline is trained fully supervised on data of the respective modality and subsequently tested with data of the same modality. Table 1 presents the performance of all models in predicting the risk of CAD, AF, and DM. The results of the tabular baseline show that these diseases are strongly associated with the investigated demographic and physiological factors, namely age, sex, height, and weight. Furthermore, the experiments show that CMR imaging data is beneficial for predicting the risk of all diseases except CAD, whose risk can be predicted equally accurately from tabular data. Our proposed approach, which pre-trains multimodally but predicts the risk of diseases unimodally from ECG, consistently outperforms all baselines, including the CMR imaging model. These results indicate that leveraging large amounts of multimodal biobank data during pre-training eliminates the dependency on expensive and limited CMR imaging data during inference in clinical practice. In fact, our approach enables subject-specific, holistic cardiac screening solely from cost-effective and widely available ECG.
Figure

Fig. 1. Overview of our training stages and inference process. (a) Our proposed approach uses masked data modelling to learn meaningful ECG representations, eliminating redundancy inherent to standard 12-lead ECG. (b) We introduce multimodal contrastive learning to transfer domain-specific information from CMR imaging to ECG. (c) Once pre-trained, the signal encoder is fine-tuned and can be used during inference to predict the risk of cardiovascular diseases and to predict cardiac phenotypes solely from ECG.

Fig. 2. We use masked data modelling to eliminate redundancy inherent to 12-lead ECG, thus generating meaningful ECG representations. To this end, we split the ECG data into patches of predefined size, out of which a random set is masked out. Note that for visualisation purposes the patch size is set to cover a single heartbeat; however, the actual patch size may vary. Only the small subset of visible patches is encoded by the signal encoder. The full set of encoded and masked patches is reconstructed by the decoder.
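The patching and masking step described in Fig. 2 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. The signal shape (12 leads × 5,000 samples, i.e. 10 s at 500 Hz), the patch size of 100 samples, the 75% mask ratio, and the helper names `patchify`, `random_mask`, and `masked_mse` are all hypothetical. Computing the reconstruction loss only on the masked patches follows the common masked-autoencoder convention and is an assumption here, not a detail stated in the caption.

```python
import numpy as np

def patchify(ecg: np.ndarray, patch_size: int) -> np.ndarray:
    """Split a (leads, samples) ECG into non-overlapping per-lead patches.

    Returns shape (leads * patches_per_lead, patch_size); trailing samples
    that do not fill a whole patch are dropped.
    """
    leads, samples = ecg.shape
    per_lead = samples // patch_size
    patches = ecg[:, : per_lead * patch_size].reshape(leads, per_lead, patch_size)
    return patches.reshape(leads * per_lead, patch_size)

def random_mask(n_patches: int, mask_ratio: float, rng: np.random.Generator):
    """Randomly split patch indices into a visible set and a masked set."""
    n_visible = int(round(n_patches * (1.0 - mask_ratio)))
    order = rng.permutation(n_patches)
    return np.sort(order[:n_visible]), np.sort(order[n_visible:])

def masked_mse(prediction: np.ndarray, target: np.ndarray) -> float:
    """Mean squared reconstruction error over the masked patches (MAE-style assumption)."""
    return float(np.mean((prediction - target) ** 2))

rng = np.random.default_rng(0)
ecg = rng.standard_normal((12, 5000))            # hypothetical: 12 leads, 10 s at 500 Hz
patches = patchify(ecg, patch_size=100)          # (600, 100)
visible, masked = random_mask(len(patches), mask_ratio=0.75, rng=rng)

encoder_input = patches[visible]                 # only visible patches reach the signal encoder
reconstruction_target = patches[masked]          # the decoder must recover these
prediction = np.zeros_like(reconstruction_target)  # placeholder for a decoder output
loss = masked_mse(prediction, reconstruction_target)
```

With these settings the encoder sees only 150 of the 600 patches. That reduced input is what makes masked pre-training cheap, and it forces the model to exploit the redundancy across leads and time.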

Fig. 3. We introduce multimodal contrastive learning that combines 12-lead ECG and CMR imaging, enabling self-supervised information transfer from CMR imaging to ECG. ECG and CMR images are embedded separately using unimodal encoders. The representations are projected separately onto a shared latent space, where information is exchanged between both modalities. A passive interpretability module visualises the similarity between the global ECG representation and local CMR image representations, allowing for a qualitative evaluation of the information transfer.
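The caption does not spell out the contrastive objective. A common choice for paired modalities is a CLIP-style symmetric InfoNCE loss, and the NumPy sketch below assumes that choice. The embedding size, the temperature of 0.1, and the function names are hypothetical. The only structural assumption is that the projected ECG and CMR embeddings of subject i form the positive pair, and all other subjects in the batch act as negatives.

```python
import numpy as np

def l2_normalise(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_loss(z_ecg: np.ndarray, z_cmr: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE: row i of z_ecg and row i of z_cmr are the positive pair."""
    logits = l2_normalise(z_ecg) @ l2_normalise(z_cmr).T / temperature  # (B, B)
    diag = np.arange(logits.shape[0])
    ecg_to_cmr = -log_softmax(logits, axis=1)[diag, diag].mean()  # each ECG must pick its CMR
    cmr_to_ecg = -log_softmax(logits, axis=0)[diag, diag].mean()  # each CMR must pick its ECG
    return float(0.5 * (ecg_to_cmr + cmr_to_ecg))

rng = np.random.default_rng(0)
z_ecg = rng.standard_normal((8, 32))                                     # hypothetical ECG projections
aligned = clip_loss(z_ecg, z_ecg + 0.01 * rng.standard_normal((8, 32)))  # matching CMR projections
shuffled = clip_loss(z_ecg, rng.standard_normal((8, 32)))                # unrelated CMR projections
```

When the pairs match, the loss is close to zero. When the pairs are unrelated, it is on the order of log(batch size). Minimising the loss therefore pulls each subject's ECG embedding towards that subject's CMR embedding in the shared space.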

Fig. 4. We use our multimodally pre-trained model to predict 61 cardiac imaging phenotypes solely from ECG data. The graphs show 20 imaging phenotypes of 500 subjects, as well as the linear regression line for the whole test set population. Pearson's correlation coefficient (𝑟) is reported.

Fig. 5. Performance of our multimodal approach with different numbers of fine-tuning training samples. We compare our solution to supervised and self-supervised baseline models. Shaded regions indicate 95% confidence intervals. Multimodal contrastive learning with masked data modelling generally outperforms all other models at all data quantities and is well suited for predicting the risks of rare diseases.
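The caption does not say how the 95% confidence intervals were obtained. A percentile bootstrap over test subjects is one common option, and the sketch below assumes it. The function name `bootstrap_ci`, the 1,000 resamples, and the toy accuracy metric are illustrative choices. For ROC AUC, you would pass an AUC function as `metric` instead.

```python
import numpy as np

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap (1 - alpha) interval for `metric`."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample subjects with replacement
        stats[b] = metric(y_true[idx], y_pred[idx])
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric(y_true, y_pred), float(lower), float(upper)

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)
y_pred = np.where(rng.random(500) < 0.8, y_true, 1 - y_true)  # toy predictions, about 80% correct
point, lo, hi = bootstrap_ci(y_true, y_pred, accuracy)
```

Repeating this for every training-set size would give one shaded band per curve. Smaller test sets or rarer diseases widen the band.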

Fig. 6. Qualitative evaluation of the information transfer from CMR imaging to ECG using our patch-based interpretability module. Cosine similarities between the ECG representation and the representations of local CMR image regions produce a similarity grid that is superimposed onto the CMR image as a heatmap. Our approach elicits high similarity between the ECG and CMR image regions of interest and achieves robust information transfer from CMR images even in the cases of occlusion, motion, and samples showing extreme morphological properties.
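The similarity grid in Fig. 6 can be illustrated as follows. This sketch assumes that the CMR encoder exposes a spatial feature map (for example, the last convolutional layer) already projected into the shared latent space. The 7 × 7 grid, the 64-dimensional embeddings, the 224 × 224 image size, and nearest-neighbour upsampling are all hypothetical choices, and the actual module may differ.

```python
import numpy as np

def similarity_heatmap(z_ecg, cmr_feature_map, image_size):
    """Cosine similarity between a global ECG embedding and each local CMR embedding.

    z_ecg: (d,) vector; cmr_feature_map: (h, w, d) local embeddings in the same
    latent space; image_size: (H, W), assumed to be integer multiples of (h, w).
    """
    h, w, d = cmr_feature_map.shape
    local = cmr_feature_map.reshape(h * w, d)
    local = local / (np.linalg.norm(local, axis=1, keepdims=True) + 1e-8)
    query = z_ecg / (np.linalg.norm(z_ecg) + 1e-8)
    grid = (local @ query).reshape(h, w)                  # one similarity value per image region
    block = np.ones((image_size[0] // h, image_size[1] // w))
    heatmap = np.kron(grid, block)                        # nearest-neighbour upsampling to image size
    return grid, heatmap

rng = np.random.default_rng(0)
z_ecg = rng.standard_normal(64)
feature_map = rng.standard_normal((7, 7, 64))
feature_map[3, 4] = 2.0 * z_ecg                            # plant a region that matches the ECG
grid, heatmap = similarity_heatmap(z_ecg, feature_map, image_size=(224, 224))
```

The planted region scores a cosine similarity of about 1, while random regions stay near 0. Overlaying `heatmap` on the image with partial transparency gives the kind of visualisation shown in the figure.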
Table

Table 1. Comparison of different diagnostic modalities for the risk prediction of cardiovascular diseases, i.e., coronary artery disease (CAD), atrial fibrillation (AF), and diabetes.

Table 2. Comparison of our approach against supervised and state-of-the-art self-supervised baseline models. Best scores (ROC AUC [%]) are in bold font, second best underlined. Our multimodally pre-trained model outperforms all baseline models on every task.
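Tables 2, 3, 4 and 7 report ROC AUC in percent. The sketch below computes the metric from its probabilistic definition: the chance that a randomly chosen positive subject receives a higher risk score than a randomly chosen negative one, with ties counting one half. It compares all positive/negative pairs, which is fine for illustration. Library routines such as scikit-learn's `roc_auc_score` instead compute the area under the ROC curve from sorted scores.

```python
import numpy as np

def roc_auc(y_true, y_score) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    diff = pos[:, None] - neg[None, :]              # every positive versus every negative
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

auc_percent = 100 * roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 75.0
```

An AUC of 50% corresponds to random ranking and 100% to perfect separation. The metric is insensitive to class imbalance in the sense that it depends only on how the scores are ordered. This is one reason it is commonly used for low-prevalence diseases such as those studied here.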

Table 3. Ablation study on pre-training strategies for the risk prediction of cardiovascular diseases from ECG. Best scores (ROC AUC [%]) are in bold font, second best underlined. Multimodal contrastive learning with masked data modelling consistently outperforms all other methods.

Table 4. Ablation study on established image encoders for visual feature extraction during multimodal pre-training. Best scores (ROC AUC [%]) are in bold font. Convolutional networks are most effective in extracting general CMR imaging features that transfer well to ECG representations.

Table 5. Ablation study on pre-training strategies for predicting cardiac imaging phenotypes solely from ECG data. Pearson's correlation coefficient (𝑟) and the coefficient of determination (𝑅²) are reported as the mean across all phenotypes. Best scores are in bold font, second best underlined. Multimodal contrastive learning with masked data modelling consistently outperforms all other methods. The CMR imaging model is provided as a reference.
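The two regression metrics in Table 5 are computed per phenotype and then averaged. The minimal sketch below shows why they can disagree. Pearson's 𝑟 ignores scale and offset, whereas 𝑅² penalises any deviation from the true values and can become negative. The function names are illustrative.

```python
import numpy as np

def pearson_r(y: np.ndarray, y_hat: np.ndarray) -> float:
    yc, pc = y - y.mean(), y_hat - y_hat.mean()
    return float((yc @ pc) / np.sqrt((yc @ yc) * (pc @ pc)))

def r2_score(y: np.ndarray, y_hat: np.ndarray) -> float:
    ss_res = np.sum((y - y_hat) ** 2)          # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)       # total sum of squares around the mean
    return float(1.0 - ss_res / ss_tot)

def mean_over_phenotypes(Y: np.ndarray, Y_hat: np.ndarray):
    """Y, Y_hat: (subjects, phenotypes). Compute r and R^2 per phenotype, then average."""
    cols = range(Y.shape[1])
    r = np.mean([pearson_r(Y[:, j], Y_hat[:, j]) for j in cols])
    r2 = np.mean([r2_score(Y[:, j], Y_hat[:, j]) for j in cols])
    return float(r), float(r2)

y = np.array([1.0, 2.0, 3.0])
print(pearson_r(y, 2 * y), r2_score(y, 2 * y))  # prints 1.0 -6.0
```

A prediction that is perfectly correlated with the target but systematically scaled gets 𝑟 = 1 and a poor 𝑅². Reporting both metrics, as Table 5 does, therefore separates ranking quality from calibration of the predicted values.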

Table 6. Comparison of our approach against the state-of-the-art multimodal baseline. Best scores (𝑅²) are in bold font, second best underlined. The Cross-Modal AE results are sourced from the original paper. Our multimodally pre-trained model outperforms the baseline model on every task.

Table 7. Ablation study on pre-training strategies for the risk prediction of cardiovascular diseases from CMR images. Best scores (ROC AUC [%]) are in bold font, second best underlined. Unimodal CMR image analysis benefits from multimodal pre-training.