Concept-Based Lesion Aware Transformer for Interpretable Retinal Disease Diagnosis | Literature Digest: Latest Medical AI Papers
Title
Concept-Based Lesion Aware Transformer for Interpretable Retinal Disease Diagnosis
01
Introduction
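Fundus images are indispensable for diagnosing retinal diseases such as diabetic retinopathy (DR), age-related macular degeneration (AMD), and pathologic myopia (PM). With the development of deep learning, deep neural networks (DNNs) have been widely used to assist ophthalmologists in the automatic diagnosis of retinal diseases: trained on large-scale datasets with sophisticated architectures such as convolutional neural networks (CNNs) and vision transformers (ViTs), they learn discriminative features from fundus images. These models have shown excellent accuracy across a range of retinal disease diagnosis tasks [1]-[8]. However, DNN-based methods are rarely applied in clinical practice, mainly because of the black-box nature of deep learning. Medical diagnosis requires a deep understanding of domain knowledge, so decisions that human experts cannot understand are unacceptable, and this lack of transparency undermines clinicians' trust in the model [9]. To address this, researchers have proposed various strategies [10]-[11] to help people understand the decision process of DNNs. At present, explanations of medical image analysis models rely mainly on saliency maps, which highlight the regions the model considers important for its prediction [12]-[13]. Although these methods are widely used, some studies [14]-[16] show that the saliency maps they produce can be inconsistent on misclassified samples.

In recent years, concept-based methods [17]-[18] have attracted considerable attention in explainable AI. Researchers have begun applying them to medical image analysis [19]-[22], explaining a model's decisions through concepts aligned with human cognition. This improves interpretability and also reflects the model's actual decision process more faithfully. Typically, these methods insert an additional bottleneck layer between the backbone and the classification head of a DNN-based model. The layer is designed to extract concepts from the input: when a particular concept is present, the corresponding filters of the bottleneck layer are activated. The classification head then makes the final prediction from these understandable concepts. This paradigm ensures that the model's decision can be explained easily and faithfully through the contribution of each concept.

Several studies have incorporated lesion information into model training to improve retinal disease diagnosis [2], [4], [23]-[24]. Inspired by this, we propose to treat lesions as concepts and to use a concept-based framework to build an inherently interpretable model for retinal disease diagnosis. As shown in Fig. 1(a), retinal lesions vary markedly in shape, size, location, texture, and color, and a single fundus image often contains several sparsely distributed lesion types. Given this complexity, we argue that transformers, which excel at capturing long-range dependencies [25], are inherently better suited than CNNs to identifying retinal lesions. The attention-region visualizations in Fig. 1(b) (CNN) and Fig. 1(c) (transformer) support this view. Building on this insight, we propose a novel transformer-based model for learning retinal lesion concepts. By exploiting the transformer's ability to capture long-range dependencies, our method captures lesion information more accurately and thereby achieves better diagnostic performance and interpretability.

This paper proposes the Concept-based Lesion Aware Transformer (CLAT), a novel framework for interpretable retinal disease diagnosis from fundus images. CLAT treats retinal lesions as concepts, aligning model decisions with clinically relevant lesion features. This improves transparency and, through the transformer's ability to capture sparsely distributed retinal lesions, offers a clear advantage over CNNs. To obtain more discriminative representations and make explicit use of the learned lesion concepts, CLAT uses multiple learnable tokens to represent the lesions. Compared with conventional concept-based methods, which encode concept representations from extracted image features, this captures concept representations more effectively. Because supervised learning tends to focus on the most discriminative features at the expense of the semantic relationships inherent in human cognition, our method further uses a retinal foundation model [26] to overcome this limitation: we use image-level lesion annotations for initial feature learning and propose a novel knowledge-guided enhancement strategy that aligns these features with the medical domain knowledge extracted from the retinal foundation model, so that the model's understanding of lesions is consistent with that of human experts. In addition, we design an interpretable classifier that outputs disease classification logits through cross-attention between learnable disease tokens and lesion tokens. The attention weights directly measure each lesion's contribution to a given disease diagnosis, providing a concept-based explanation of the model's decision. Once trained, CLAT needs only a fundus image to diagnose retinal disease and to explain the diagnosis in terms of retinal lesions; if a wrong lesion prediction leads to a diagnostic error, the error can be corrected by intervention. We evaluate CLAT's diagnostic performance and interpretability on four datasets: FGADR [27], a subset of DDR [28], a private dataset, and FGADDR, which merges FGADR and the DDR subset. Our contributions are summarized as follows:

- We design a novel lesion-aware transformer encoder and a concept-based interpretable classifier that use retinal lesions to improve diagnostic performance, provide concept-based explanations, and support correcting diagnostic errors through concept-level intervention.
- We introduce image-level lesion supervision and a knowledge-guided lesion concept enhancement strategy based on a retinal foundation model, which align lesion representations with medical domain knowledge and thereby improve the model's accuracy and reliability.
- We conduct extensive experiments on four datasets, showing that our approach performs favorably against state-of-the-art methods while providing faithful explanations of the model's decision process.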
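The interpretable classifier described above (disease tokens attending to lesion tokens, with attention weights read as lesion contributions) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes one token per lesion, a single attention head, no learned query/key/value projections, and a hypothetical output vector `w_out`; all names and sizes are chosen only for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_classifier(disease_tokens, lesion_tokens, w_out):
    """Single-head cross-attention without learned projections (a simplification).

    disease_tokens: (D, d) queries, one per disease class
    lesion_tokens:  (L, d) keys and values, one per lesion concept (assumed)
    w_out:          (d,) hypothetical vector mapping an attended feature to a logit
    Returns the disease logits (D,) and the attention map (D, L).
    """
    d = disease_tokens.shape[-1]
    scores = disease_tokens @ lesion_tokens.T / np.sqrt(d)  # (D, L) similarities
    attn = softmax(scores, axis=-1)                         # each row sums to 1
    attended = attn @ lesion_tokens                         # (D, d) lesion mixture
    logits = attended @ w_out                               # (D,) one logit per class
    return logits, attn

# Random stand-ins for learned tokens; sizes are illustrative assumptions.
rng = np.random.default_rng(0)
num_diseases, num_lesions, dim = 5, 4, 16
disease_tokens = rng.normal(size=(num_diseases, dim))
lesion_tokens = rng.normal(size=(num_lesions, dim))
w_out = rng.normal(size=dim)

logits, attn = cross_attention_classifier(disease_tokens, lesion_tokens, w_out)
predicted = int(np.argmax(logits))
contributions = attn[predicted]  # per-lesion weights for the predicted class
```

Because each attention row is a softmax over lesions, `contributions` is non-negative and sums to one, which is what makes it readable as a per-lesion share of the decision in this simplified setting.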
Abstract
Existing deep learning methods have achieved remarkable results in diagnosing retinal diseases, showcasing the potential of advanced AI in ophthalmology. However, the black-box nature of these methods obscures the decision-making process, compromising their trustworthiness and acceptability. Inspired by concept-based approaches and recognizing the intrinsic correlation between retinal lesions and diseases, we regard retinal lesions as concepts and propose an inherently interpretable framework designed to enhance both the performance and explainability of diagnostic models. Leveraging the transformer architecture, known for its proficiency in capturing long-range dependencies, our model can effectively identify lesion features. By integrating image-level annotations, it aligns lesion concepts with human cognition under the guidance of a retinal foundation model. Furthermore, to attain interpretability without losing lesion-specific information, our method employs a classifier built on a cross-attention mechanism for disease diagnosis and explanation, where explanations are grounded in the contributions of human-understandable lesion concepts and their visual localization. Notably, due to the structure and inherent interpretability of our model, clinicians can implement concept-level interventions to correct diagnostic errors by simply adjusting erroneous lesion predictions. Experiments conducted on four fundus image datasets demonstrate that our method achieves favorable performance against state-of-the-art methods while providing faithful explanations and enabling concept-level interventions.
Method
A. Overview
Fundus imaging is one of the most commonly used tools for retinal disease diagnosis. The retinal disease diagnosis task aims to predict the type or severity of retinal diseases from fundus images, which is a multi-class classification problem. Given a fundus image x, a standard diagnostic model f is required to output the disease diagnosis y_d, which can be formulated as y_d = f(x), with the causal graph x → y_d. To improve interpretability, we adopt the concept-based approach by introducing lesion concepts y_l. This involves first predicting the retinal lesions, which is a multi-label classification problem, and then making the diagnosis based on these lesion concepts. The causal graph for this concept-based approach is x → y_l → y_d, ensuring consistency between lesion concept explanations and disease diagnosis.

In the clinic, ophthalmologists typically rely on their clinical experience and domain knowledge to identify retinal lesions, which form the basis of their diagnoses. To mimic the diagnostic process of human experts and achieve interpretable retinal disease diagnosis, we leverage lesion concepts to deliver high-accuracy diagnostic performance and faithful decision explanations. This enables ophthalmologists to understand the decision-making process of the model, thereby making the model trustworthy and reliable in real clinical practice. Fig. 2 shows the overall architecture of CLAT. We use multiple lesion tokens to represent different lesion concepts and use image-level annotations to learn discriminative lesion features. To align these learned features with domain knowledge, a retinal foundation model is employed to guide the learning process. These modules realize the x → y_l part of the causal graph. Finally, a novel interpretable classifier, based on a cross-attention mechanism, is introduced to enable disease diagnosis based on lesion concepts while providing faithful explanations, realizing the y_l → y_d part of the causal graph. Each module of CLAT is detailed in the following sections.
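The causal chain x → y_l → y_d, and the concept-level intervention it permits, can be illustrated with a toy two-stage model. This is a hedged sketch rather than CLAT itself: the linear stages, the hand-set weights, the two hypothetical lesion concepts, and the two disease grades are all assumptions made so that the example is small and deterministic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_lesions(features, w_lesion):
    """x -> y_l: multi-label prediction, one independent sigmoid per lesion."""
    return sigmoid(features @ w_lesion)

def disease_logits(lesion_probs, w_disease, b_disease):
    """y_l -> y_d: the disease stage sees only the lesion concepts."""
    return lesion_probs @ w_disease + b_disease

def intervene(lesion_probs, corrections):
    """Overwrite selected lesion concepts with clinician-supplied values."""
    fixed = lesion_probs.copy()
    for index, value in corrections.items():
        fixed[index] = value
    return fixed

# Hand-set toy parameters (assumptions, not learned values).
features = np.array([1.0, -2.0, 0.5])                        # stand-in image features
w_lesion = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]])    # features -> 2 lesions
w_disease = np.array([[-1.0, 1.0], [-2.0, 2.0]])             # 2 lesions -> 2 grades
b_disease = np.array([1.0, -1.0])

y_l = predict_lesions(features, w_lesion)
grade_before = int(np.argmax(disease_logits(y_l, w_disease, b_disease)))

# Suppose a clinician judges lesion 0 to be a false positive and sets it to 0.
y_l_fixed = intervene(y_l, {0: 0.0})
grade_after = int(np.argmax(disease_logits(y_l_fixed, w_disease, b_disease)))

print(grade_before, grade_after)
```

Since the disease stage depends on the image only through y_l, correcting a lesion prediction propagates directly to the diagnosis, which is the property that makes test-time intervention possible in a concept-based model.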
Conclusion
In this paper, we introduce the Concept-based Lesion Aware Transformer (CLAT), a novel framework for the interpretable diagnosis of retinal diseases from fundus images. The strong performance of CLAT derives from its structure, which uses lesion tokens and a cross-attention mechanism to build a detailed understanding of retinal lesions and to make diagnoses based on lesion concepts. This effectiveness is further amplified by the combination of supervised lesion discovery and a knowledge-guided enhancement strategy, which aligns the learned lesion concepts with medical domain knowledge. After training, CLAT needs only fundus images to diagnose retinal diseases and generate faithful explanations based on lesion concepts. Furthermore, CLAT allows for interaction with clinicians during the diagnostic process, helping them better understand the decision-making process and aligning more closely with clinical practice.
Figure

Fig. 1. (a) A sample fundus image with different lesions annotated in different colors. (b) Visualization of the attention region of a CNN. (c) Visualization of the attention region of a transformer.

Fig. 2. The architecture of our proposed Concept-based Lesion Aware Transformer (CLAT). (a) shows the Lesion Aware Transformer Encoder with multiple learnable lesion tokens to represent different lesion concepts, facilitating the learning of discriminative features and enabling the explicit utilization of lesion information. (b) shows the Supervised Lesion Discovery module, which leverages image-level lesion annotations to supervise lesion concept learning, together with a strategy that aggregates the patch tokens to enhance the local context of the lesion tokens. (c) shows the Knowledge Guided Lesion Concept Enhancement module, which leverages the domain knowledge of the retinal foundation model to guide lesion concept learning. (d) shows the Concept-based Interpretable Classifier, which uses the rich retinal lesion concept information learned by the model to achieve accurate and explainable disease diagnosis with concept-based explanations.

Fig. 3. Visualization comparison with CAM [10] on the FGADDR dataset. The models to the left of the dashed line are retinal disease diagnosis models, and those to the right are concept-based models. Since black-box models cannot generate visualizations for a specific lesion, we show only the visualization of the predicted disease. † indicates a transformer-based model. KG: Knowledge Guided Enhancement. SLD: Supervised Lesion Discovery. GT: Ground Truth.

Fig. 4. Evaluation of the hyperparameter α on the FGADDR dataset. We use Kappa to measure the performance of disease grading, AUC for lesion concept prediction, and OIS [53] for the quality of the learned lesion tokens. ↑/↓ indicates that a metric is better if its score is higher/lower.

Fig. 5. Examples of explanations provided by CLAT on the FGADDR and private datasets. Each row displays the original fundus image alongside the lesion detection and localization heatmaps produced by CLAT, compared with the corresponding ground-truth annotations. The fundus images are annotated with the predicted disease label and its associated confidence score. Below the heatmaps, the types of lesions identified by CLAT and their respective confidence levels are listed. Accompanying textual explanations quantify the presence and contributions of specific retinal lesions to the overall diagnosis, illustrating CLAT's comprehensive explanatory capabilities.

Fig. 6. Examples of diagnostic errors made by CLAT and the corresponding intervention results on the FGADDR and private datasets. The top two rows show misdiagnoses on the FGADDR dataset, while the bottom row shows a misdiagnosis on the private dataset. Annotations in green indicate correct predictions, annotations in red indicate wrong predictions, and the intervened lesion concepts are shown in orange.
Table

Table I. Distribution of disease and lesion annotations in the four datasets. EX: exudates; HE: hemorrhages; MA: microaneurysm; SE: soft exudates; RW: retinal whitening; CRS: cherry-red spot; SCRV: segmental changes of the retinal vessel; CWS: cotton wool spots; RACS: retinal arteriovenous crossing sign.

Table II. Comparison of our method against state-of-the-art disease grading models and concept-based models (CBMs) on disease grading across the FGADDR, FGADR, DDR-subset, and private datasets. The reported results are the mean values of 10-fold cross-validation (unit: %). * indicates that the model is initialized with weights pre-trained on a large fundus dataset. † indicates a transformer-based model. Bold indicates the best performance, and underline indicates the second-best performance. Sens: sensitivity. Spec: specificity.

Table III. Comparison of our method against state-of-the-art concept-based models (CBMs) on lesion concept prediction across the FGADDR, FGADR, DDR-subset, and private datasets. * indicates that the model is initialized with weights pre-trained on a large fundus dataset. † indicates a transformer-based model. The reported results are the mean values of 10-fold cross-validation (unit: %). Bold indicates the best performance, and underline indicates the second-best performance. Acc: accuracy. F1: F1-score.

Table IV. Evaluation of the effectiveness of each module on the FGADDR dataset. The reported results are the mean values of 10-fold cross-validation (unit: %). SLD: Supervised Lesion Discovery. KG: Knowledge Guided Enhancement.

Table V. Effects of test-time lesion concept intervention on the FGADDR dataset with different backbones (unit: %). Sens: sensitivity. Spec: specificity.