Multidimensional Directionality-Enhanced Segmentation via Large Vision Model | Literature Express: Advances in AI for Medical Imaging

Oldlee · 5 days ago

Title

Multidimensional Directionality-Enhanced Segmentation via Large Vision Model

01

Introduction

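Macular diseases affect approximately 200 million people worldwide and have become one of the leading causes of visual impairment. The macula is the region of the retina where photoreceptors are most densely packed; it is rich in light-sensitive cone cells and is chiefly responsible for color perception and high-resolution vision. Its central position in the retina corresponds directly to our central visual field, which makes macular health essential to overall visual quality. Any pathological change in the macula can reduce vision, central vision in particular, and thereby markedly lower quality of life. Macular edema is characterized by fluid accumulation beneath the retinal layers in the macular region, which causes the retinal tissue to swell and thicken and impairs retinal function and visual clarity (Lin et al., 2012). Common causes include age-related macular degeneration (AMD) (Manjunath et al., 2011), retinal vein occlusion (RVO), and diabetic retinopathy (DR) (Esmaeelpour et al., 2011; Sim et al., 2013; Coscas et al., 2010).

Optical coherence tomography (OCT) measures reflections from biological tissues based on differences in time delay and produces detailed layered images of the retina (Huang et al., 1991; Trichonas and Kaiser, 2014), which are essential for qualitative and quantitative analysis of the macula (Wolf and Wolf-Schnurrbusch, 2010). Clinical diagnosis remains difficult: early macular edema presents as small and varied changes, and quantitative analysis demands a precision that the human eye alone can hardly deliver. Against this background, using deep learning (LeCun et al., 2015) and computer vision (Ma et al., 2023) to segment and quantify fundus retinal structures with high precision, including the choroid (Nickla and Wallman, 2010) and macular edema (Tranos et al., 2004), can improve diagnostic accuracy and efficiency. These techniques are also indispensable in surgical applications, where they localize edema and lesions precisely and help determine treatment sites and plan procedures.

However, segmenting retinal fluid in OCT images is challenging because of the high heterogeneity of fluid lesions and the low contrast and noise inherent in OCT imaging. The complexity of medical images means that medical experts must produce large amounts of manual annotation, which makes the task time-consuming and expensive (He et al., 2022). As a result, conventional supervised learning models are often constrained by limited annotated datasets, which also limits their transferability to other tasks (Aljuaid and Anwar, 2022).

Recently, large vision models built on the Transformer architecture (Vaswani et al., 2017) have advanced considerably and show notable generalization in natural-image and remote-sensing image processing. Trained on datasets with billions of annotations, they exhibit strong zero-shot and few-shot learning capabilities, especially in general image segmentation (Kirillov et al., 2023). However, large vision models such as the Segment Anything Model (SAM) and other Transformer-based models, despite their strong performance on natural-image segmentation, struggle to identify key structures and regions when applied to specialized modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and OCT. The shortfall stems from insufficient exposure to radiological or other medical imaging data during training, and it highlights the difference in imaging principles between medical and natural images (Wu et al., 2023). Medical images visualize how specific physical signals (for example X-rays, ultrasound, or MRI signals) interact with the human body, which is fundamentally different from how natural scenes are imaged. The propagation and interaction of these signals in different tissue types appear as variations in image density, intensity, or color, and these variations reveal the internal structure of the body. Applying general-purpose models directly to medical image segmentation can therefore cause a substantial drop in recognition accuracy and fail to meet the precision required for medical diagnosis.

Precisely identifying subtle tissue changes is also key to clinical diagnosis. In OCT, CT, and MRI, the smallest lesions or structural changes, barely perceptible in the image as a whole, may indicate early disease or a pathological state. The complex textures, noise, and low contrast inherent in medical images make it very difficult to extract useful information from them. Although SAM excels at processing global information, it lacks the local receptive fields and spatial inductive bias of conventional convolutional neural networks (CNNs), which makes it less effective at local feature extraction. SAM's mask decoder is deliberately lightweight: it emphasizes efficiency and relies on upsampling to recover the captured high-level semantics and details, but this design does not fully exploit the encoder's potential in detail-rich, complex scenes (Fig. 1). It inadvertently limits the decoder's ability to reconstruct small or densely packed objects and lowers segmentation accuracy in such cases.

To address these challenges, this paper proposes a retinal segmentation framework guided by a large vision model (LVM), the Multidimensional Directionality-Enhanced Retinal Fluid Segmentation framework (MD-DERFS). It consists of a Multi-Dimensional Feature Re-Encoder Unit (MFU), a Cross-scale Directional Insight Network (CDIN), and a Harmonic Minutiae Segmentation Equilibrium loss (HMSE). The loss function is used to flexibly transfer the generalization ability of Transformer-based large vision models to the specialized domain. To compensate for the lack of radiological or other medical imaging data in the training sets of large vision models (Cheng et al., 2023), the MFU uses a directional prior extraction mechanism based on directional consistency mapping, together with an Edema Texture Mapping Unit, to strengthen the model's ability to recognize specific textures, shapes, and pathological features. The MFU also adopts iterative attentional feature fusion (iAFF), which attends to cross-scale features without significantly increasing the number of layers or parameters. This addresses the lightweight decoder's failure to exploit the encoder's full potential and improves the network's ability to capture tiny targets and subtle lesions in medical images. The CDIN integrates a morphological latent feature amplification unit (MLFA) and an angular anisotropy decomposition module (AAD) to provide a multi-level view from local to global, compensating for the inherent limitation of Transformer-based encoders in capturing local feature information (Kirillov et al., 2023). HMSE addresses class imbalance in macular edema and strengthens learning of edema regions when the class distribution is skewed. To test the proposed method, we introduce the MacuScan-8k dataset, which contains 8000 annotated B-scan spectral-domain OCT images (Spectralis HRA, Heidelberg Engineering, Germany) from patients with macular edema (see Fig. 1) (Tranos et al., 2004).

The main contributions of this paper are summarized as follows:

1. We propose MD-DERFS, whose multi-level perspective and directional prior extraction mechanism strengthen the encoder's ability to capture local details and thereby address the limitations imposed by the lightweight decoder of the large vision model.
2. We introduce the MFU network, which partitions high-dimensional features into subsets by channel slicing, processes each subset separately, and then fuses them. We also propose the CDIN, which decouples content features from scene features in the high-dimensional feature space based on connectivity between pixels, addressing the information redundancy and channel interference caused by excessive feature dimensionality.
3. We propose HMSE, which combines the strengths of binary cross-entropy (BCE), Dice, and Poly losses to mitigate class imbalance and improve segmentation accuracy. This integrated approach improves robustness and performance in OCT macular edema detection.

Section 2 reviews the research background and related work. Section 3 describes all of the methods above in detail. Section 4 presents the experimental data and a comprehensive evaluation. Section 5 summarizes the main conclusions and discusses the clinical significance of the proposed method.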

Abstract

Optical Coherence Tomography (OCT) facilitates a comprehensive examination of macular edema and associated lesions. Manual delineation of retinal fluid is labor-intensive and error-prone, necessitating an automated diagnostic and therapeutic planning mechanism. Conventional supervised learning models are hindered by dataset limitations, while Transformer-based large vision models exhibit challenges in medical image segmentation, particularly in detecting small, subtle lesions in OCT images. This paper introduces the Multidimensional Directionality-Enhanced Retinal Fluid Segmentation framework (MD-DERFS), which reduces the limitations inherent in conventional supervised models by adapting a Transformer-based large vision model for macular edema segmentation. MD-DERFS introduces a Multi-Dimensional Feature Re-Encoder Unit (MFU) that augments the model's proficiency in recognizing specific textures and pathological features through directional prior extraction and an Edema Texture Mapping Unit (ETMU). A Cross-scale Directional Insight Network (CDIN) furnishes a holistic perspective spanning local to global details, mitigating the large vision model's deficiencies in capturing localized feature information. Additionally, the framework is augmented by a Harmonic Minutiae Segmentation Equilibrium loss (HMSE) that addresses the challenges of data imbalance and annotation scarcity in macular edema datasets. Empirical validation on the MacuScan-8k dataset shows that MD-DERFS surpasses existing segmentation methodologies, demonstrating its efficacy in adapting large vision models for boundary-sensitive medical imaging tasks.

Method

This paper introduces MD-DERFS, a fundus OCT lesion segmentation framework that uses the generalization ability of large vision models for multi-dimensional directionality enhancement. We retain the Segment Anything Model's encoder in MD-DERFS and focus on what follows it: multi-dimensional orientation information extraction, deep decoding of the encoder output, and the loss function. In this section, we first introduce the overall framework of MD-DERFS and then the MFU, which we propose to improve local feature extraction. CDIN works with MLFA and AAD to capture global context and complex local details, ensuring comprehensive feature extraction. In addition, we introduce HMSE, which combines the advantages of the BCE, Dice, and Poly losses while addressing the severe class imbalance in the retinal edema dataset.
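To make the idea of a combined BCE, Dice, and Poly loss more concrete, the following is a minimal NumPy sketch. It is not the authors' HMSE definition: the weighted-sum form, the weight names `alpha`, `beta`, and `gamma`, the Dice smoothing term, and the use of the Poly-1 formulation (cross-entropy plus `epsilon * (1 - p_t)`) are all assumptions made for illustration.

```python
import numpy as np

def bce_loss(p, y, eps=1e-7):
    """Mean binary cross-entropy between probabilities p and binary labels y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def dice_loss(p, y, smooth=1.0):
    """Soft Dice loss; `smooth` (an assumed value) avoids division by zero."""
    inter = np.sum(p * y)
    return float(1.0 - (2.0 * inter + smooth) / (np.sum(p) + np.sum(y) + smooth))

def poly1_loss(p, y, epsilon=1.0, eps=1e-7):
    """Poly-1 style loss: cross-entropy plus epsilon * (1 - p_t).
    Assumption: the paper's Poly term may be defined differently."""
    p = np.clip(p, eps, 1.0 - eps)
    p_t = y * p + (1.0 - y) * (1.0 - p)  # probability assigned to the true class
    return float(np.mean(-np.log(p_t) + epsilon * (1.0 - p_t)))

def combined_loss(p, y, alpha=1.0, beta=1.0, gamma=1.0):
    """Hypothetical weighted sum of the three terms (weights are illustrative)."""
    return alpha * bce_loss(p, y) + beta * dice_loss(p, y) + gamma * poly1_loss(p, y)

# Toy example: a 2x2 mask with a confident-correct and a confident-wrong prediction.
y = np.array([[0.0, 1.0], [1.0, 0.0]])
p_good = y * 0.98 + 0.01   # 0.99 on foreground, 0.01 on background
p_bad = 1.0 - p_good       # the same confidence, but inverted
print(combined_loss(p_good, y), combined_loss(p_bad, y))
```

The Dice term depends on overlap rather than on per-pixel counts, which is why combinations of this kind are commonly used when the foreground (here, edema) occupies only a small fraction of the image.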

Conclusion

The precise segmentation of OCT images reveals subtle structural changes in the retina and is instrumental in the early detection of eye diseases such as macular degeneration, glaucoma, and diabetic retinopathy. It also plays a crucial role in surgery, aiding the exact localization of edematous and lesioned areas and thereby facilitating precise treatment targeting and surgical planning. This study introduces three major enhancements to the task of retinal edema segmentation from fundus OCT images.

First, we propose MD-DERFS to further exploit the high-level semantic information encoded by the image encoder of the large vision model, for application in specialized medical image segmentation domains. Second, the MFU is integrated into MD-DERFS; it makes effective use of image priors and includes an Edema Texture Mapping Unit designed to adapt to the unique morphology of retinal edema, enhancing the model's ability to capture information about small, localized lesions. Moreover, the CDIN structure is incorporated into MD-DERFS, improving the extraction of directionally relevant information in images and providing a multi-level perspective from local to global. This allows the network to capture the detailed features of lesions and their overall spatial layout simultaneously, leading to more accurate and reliable segmentation. Furthermore, we introduce HMSE, which combines BCE, Dice, and Poly losses to enhance the model's learning of edema image information on class-imbalanced datasets and to optimize pixel-level segmentation accuracy and overall shape recognition in edematous regions.

By integrating a Transformer-based large vision model, MD-DERFS overcomes the constraints that dataset scale and quality impose on traditional supervised learning methods in medical image segmentation. The innovation of the framework lies in its successful adaptation of the powerful visual comprehension ability of large models to the unique domain of medical imaging. Specifically, its enhanced structure significantly improves the model's ability to capture key details in fundus OCT images. The results demonstrate that our method not only reduces the dependence on data but also significantly improves the recognition accuracy of small, localized lesion areas, providing reliable technical support for the precise diagnosis and treatment of eye diseases.

Results

4.1. Dataset

MacuScan-8k, the Macular Edema Enhanced Retinal OCT Dataset, comprises 8000 annotated B-scan SD-OCT images (Spectralis HRA, Heidelberg Engineering, Germany) obtained from patients diagnosed with macular edema at the Zhejiang Provincial People's Hospital between May 1, 2016 and December 31, 2021, a period of more than five years. The significant volume and superior annotation quality of MacuScan-8k mark a substantial enhancement over existing publicly available datasets in terms of data quantity and collection effort.

The data encompasses retinal OCT scans of 119 patients diagnosed with macular hole, totaling 126 sequences, each containing 17 to 115 slices. OCT volume scans were centered on the macula and covered an area of 6.0 × 4.5 mm (20° × 15°) at a resolution of 496 × 512 pixels. The average axial, transverse, and azimuthal pixel spacings were 3.87 μm, 11.50 μm, and 120.96 μm, respectively. All scans originated from the same equipment, and any data with severe artifacts or with signal strength reduced enough to impede recognition of retinal interfaces were excluded.

The labeling phase, conducted from June to December 2021, involved five experienced radiologists who manually annotated the retina, macular edema, and macular hole in each B-scan of the OCT volumes using segmentation editor software. Following the initial annotation, two senior retinal experts reviewed the results. This review process included multiple rounds of feedback and revision to ensure the accuracy of the annotations. In the diagnosis of fundus diseases, intraretinal fluid (IRF) refers to the accumulation of fluid within the retinal layers, which is usually associated with retinopathy such as macular edema. Subretinal fluid (SRF) indicates the presence of fluid under the retinal layers, which is commonly observed in conditions such as choroidal neovascularization and can lead to visual impairment. The annotation classified IRF and SRF as a single category to enhance segmentation precision in the network (Fig. 9).
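As a quick sanity check on the acquisition figures, the pixel spacings can be converted into physical extents. The sketch below assumes that the 496-pixel dimension is the axial (depth) direction and the 512-pixel dimension is the transverse direction; the text does not state this explicitly. The helper names are hypothetical.

```python
# Acquisition figures quoted in the dataset description.
HEIGHT_PX, WIDTH_PX = 496, 512          # assumed: 496 axial, 512 transverse
AXIAL_UM, TRANSVERSE_UM, AZIMUTHAL_UM = 3.87, 11.50, 120.96

def extent_mm(n_pixels, spacing_um):
    """Physical length in millimetres spanned by n_pixels at the given spacing."""
    return n_pixels * spacing_um / 1000.0

def mask_area_mm2(n_mask_pixels):
    """Area in mm^2 covered by n_mask_pixels of a segmentation mask in one B-scan."""
    return n_mask_pixels * AXIAL_UM * TRANSVERSE_UM / 1e6

depth_mm = extent_mm(HEIGHT_PX, AXIAL_UM)       # axial extent of one B-scan
width_mm = extent_mm(WIDTH_PX, TRANSVERSE_UM)   # transverse extent of one B-scan
slices_over_4_5_mm = 4.5 * 1000.0 / AZIMUTHAL_UM  # B-scan intervals across 4.5 mm

print(f"depth {depth_mm:.2f} mm, width {width_mm:.2f} mm, "
      f"about {slices_over_4_5_mm:.0f} slice intervals across 4.5 mm")
```

Under this assumption the transverse extent comes out close to the stated 6.0 mm scan width, and the same spacings let a predicted edema mask be reported as a cross-sectional area rather than a pixel count.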

Figure

Fig. 1. From left to right: original OCT images, SAM segmentation outputs, segmentation outputs of SAM fine-tuned on MacuScan-8k, and segmentation outputs of MD-DERFS after training on MacuScan-8k. Segmenting the lesion area in the red frame requires more fine-grained local feature knowledge. As shown in the figure, MD-DERFS significantly improves OCT edema segmentation.

Fig. 2. The overall framework of MD-DERFS. The fundus OCT image is first mapped to the feature space by a pre-trained SAM encoder, resulting in five image embeddings (E1 to E5) of shape 255 × 64 × 64. E1 is input into the MFU, where it is sliced along channels to fully exploit the prior knowledge of the OCT image, addressing the pre-trained SAM model's lack of specific medical knowledge. E2 to E5 are input into the CDIN, addressing the weakness of Transformer-based large vision models in extracting fine local image features. The fine features obtained from the two modules are fused by iAFF (iterative attentional feature fusion) to obtain the segmentation result.

Fig. 3. Structure of the MFU. The MFU employs feature slicing to split the input into distinct groups. Each group is processed by the Edema Texture Mapping Unit to extract directional prior features. Subsequent fusion via iAFF optimizes segmentation accuracy while keeping the framework's complexity at a manageable level.

Fig. 4. The Edema Texture Mapping Unit mentioned in the overview, where LayerNormalization denotes layer normalization. The MFU's sliced features are fed into the Edema Texture Mapping Unit to achieve a comprehensive multi-angle analysis of the image in terms of spatial and textural nuances.

Fig. 5. Network structure of CDIN, where E_i denotes the image embeddings obtained from the SAM encoder. The module features multiple stages of deep feature extraction using the AAD and MLFA modules.

Fig. 6. Structure of MLFA. The module consists of two stages, each incorporating a dilated convolution layer followed by Batch Normalization and a ReLU activation function. Between the two stages, an upsampling layer increases the spatial resolution of the feature maps. The module aims to capture multi-scale features effectively by leveraging dilated convolutions and subsequent upsampling.
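The stage order described for MLFA (dilated convolution, normalization, ReLU, then upsampling, then a second such stage) can be sketched on a single feature map with NumPy. This is an illustration of the data flow only: the 3 × 3 kernels, the dilation rates 2 and 4, nearest-neighbour 2× upsampling, and the per-map standardization used as a stand-in for Batch Normalization are all assumptions, and the real module operates on multi-channel tensors with learned weights.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """'Same'-padded 2D cross-correlation of one feature map with a dilated kernel."""
    kh, kw = kernel.shape
    ph, pw = dilation * (kh - 1) // 2, dilation * (kw - 1) // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(kh):
        for j in range(kw):
            # Each kernel tap reads the input at an offset scaled by the dilation.
            out += kernel[i, j] * xp[i * dilation:i * dilation + h,
                                     j * dilation:j * dilation + w]
    return out

def normalize(x, eps=1e-5):
    """Stand-in for Batch Normalization: standardize a single map (no learned scale)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x):
    return np.maximum(x, 0.0)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def conv_stage(x, kernel, dilation):
    return relu(normalize(dilated_conv2d(x, kernel, dilation)))

def mlfa_sketch(x, k1, k2, d1=2, d2=4):
    """Stage 1 -> upsample -> stage 2, following the caption's description."""
    y = conv_stage(x, k1, d1)
    y = upsample2x(y)
    return conv_stage(y, k2, d2)

rng = np.random.default_rng(0)
feature = rng.standard_normal((16, 16))   # hypothetical single-channel feature map
k1 = rng.standard_normal((3, 3))
k2 = rng.standard_normal((3, 3))
out = mlfa_sketch(feature, k1, k2)
print(out.shape)
```

A dilated kernel enlarges the receptive field without adding weights, which is the usual reason for pairing it with upsampling when multi-scale context is needed at higher resolution.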

Fig. 7. Structure of AAD. The module consists of two main components: the Scene Encoder and the Content Encoder. The Scene Encoder primarily includes two dilated convolutions and an SPP (spatial pyramid pooling) module. The Content Encoder comprises dilated convolutions, Batch Normalization, and ReLU activation functions. The outputs from these encoders are combined and processed through a sigmoid function to form relational features E_sc, which are then integrated to produce the final E_AAD.

Fig. 8. Depth space feature visualization. After being encoded by SAM's encoder, the image embedding contains a lot of noise. CDIN separates the content features and scene features in the depth space, and AAD performs further deep feature extraction. The resulting content features include clear boundaries and textures, and the scene features include edema volume and background information, both of which are very helpful for edema segmentation.

Fig. 9. Part of the dataset images. The regions in blue boxes are IRF and the regions in red boxes are SRF. The labels F1 through F6 identify each frame for reference.

Fig. 10. Comparison of segmentation effects. The figure shows the original OCT image, the manually annotated ground truth as a red mask, and the automatic segmentation produced by the model (here, MD-DERFS) as a blue mask. The two masks are merged into one image, and the overlapping part is shown in white.
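The colour convention used in these comparison figures (red for ground truth only, blue for prediction only, white for overlap) is easy to reproduce. The sketch below is a minimal NumPy version that draws the masks on a black background; blending with the underlying OCT image is omitted, and the function name is hypothetical.

```python
import numpy as np

def overlay_masks(gt, pred):
    """Return an RGB image: red = ground truth only, blue = prediction only,
    white = overlap, black = background."""
    gt = np.asarray(gt).astype(bool)
    pred = np.asarray(pred).astype(bool)
    rgb = np.zeros(gt.shape + (3,), dtype=np.uint8)
    rgb[gt & ~pred] = (255, 0, 0)       # missed by the model
    rgb[pred & ~gt] = (0, 0, 255)       # predicted but not annotated
    rgb[gt & pred] = (255, 255, 255)    # agreement
    return rgb

# Toy 2x3 example.
gt = np.array([[1, 1, 0],
               [0, 0, 0]])
pred = np.array([[0, 1, 1],
                 [0, 0, 0]])
image = overlay_masks(gt, pred)
print(image[0])
```

Read this way, red pixels correspond to false negatives and blue pixels to false positives, so a mostly white mask with thin coloured borders indicates errors concentrated at lesion boundaries.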

Fig. 11. Box plots comparing six indicators with the baselines: Recall, MCC, Dice, IoU, Kappa, and G-mean. On the x-axis, the models are labeled as: (a) U-Net; (b) AttUnet; (c) SegNet; (d) FCN-8s; (e) Enet; (f) FC-DenseNet; (g) PSPNet; (h) GCN; (i) BiSeNetL; (j) DeepLabV3+; (k) PSANet; (l) R2UNet; (m) CENet; (n) CGNet; (o) LEDNet; (p) UNet 3+; (q) TransUNet; (r) MD-DERFS.
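For reference, the six indicators can all be derived from the confusion matrix of a binary mask. The sketch below uses the standard textbook definitions; the paper's exact formulas are not given in this digest, so in particular the G-mean definition (square root of sensitivity times specificity) is an assumption. The function assumes both classes occur in the ground truth and the prediction, so no denominator is zero.

```python
import numpy as np

def segmentation_metrics(gt, pred):
    """Recall, MCC, Dice, IoU, Cohen's kappa, and G-mean for binary masks."""
    gt = np.asarray(gt).astype(bool).ravel()
    pred = np.asarray(pred).astype(bool).ravel()
    tp = int(np.sum(gt & pred))
    tn = int(np.sum(~gt & ~pred))
    fp = int(np.sum(~gt & pred))
    fn = int(np.sum(gt & ~pred))
    n = tp + tn + fp + fn

    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    g_mean = np.sqrt(recall * specificity)
    return {"recall": recall, "mcc": float(mcc), "dice": dice,
            "iou": iou, "kappa": kappa, "g_mean": float(g_mean)}

# Toy example: 8 pixels with tp=2, fn=1, fp=1, tn=4.
gt = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = segmentation_metrics(gt, pred)
print(metrics)
```

Dice and IoU ignore true negatives, whereas MCC, kappa, and G-mean take them into account, which is why the latter are often reported alongside overlap scores when the background dominates the image.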

Fig. 12. Visual comparison of the evaluation results of MD-DERFS and 17 other segmentation methods on the MacuScan-8k dataset. We selected 6 representative images for display. In these images, the ground truth is shown in red, the mask produced by the model is shown in blue, and coincident parts are white to indicate correct segmentation. In instances F1 and F5, the edema regions are hard to distinguish from the adjacent retinal tissue, which highlights MD-DERFS's exceptional inferential strength for precise segmentation. Additionally, MD-DERFS consistently achieves accurate segmentation of the edema areas and their peripheries in instances F2, F3, F4, and F6.

Fig. 13. Box plots showing six indicators from the ablation study on Edema Texture Mapping Units. On the x-axis, the models are labeled as: (a) MFU 2-Layer; (b) MFU 3-Layer; (c) MFU 4-Layer; (d) MFU 5-Layer; (e) MFU 6-Layer; (f) MFU 7-Layer.

Fig. 14. Box plots showing six indicators from the ablation study on the key framework modules. On the x-axis, the models are labeled as: (a) SAM; (b) SAM+CDIN+iAFF; (c) SAM+CDIN+MFU-6L; (d) SAM+CDIN+MFU-4L; (e) CDIN+MFU; (f) SAM+CDIN+MFU-4L+iAFF; (g) SAM+CDIN+MFU-6L+iAFF.

Fig. 15. Box plots showing six indicators from the ablation study on the loss. On the x-axis, the models are labeled as: (a) γβ-Focus; (b) γα-Focus; (c) αβ-Focus; (d) βα-Focus; (e) βγ-Focus; (f) Uniform; (g) α-Focus.

Fig. 16. Comparison of results from the ablation experiment on the model structure of MD-DERFS. A-G represent MD-DERFS, SAM+CDIN+iAFF, SAM+CDIN+MFU-6L, SAM+CDIN+MFU-4L, CDIN+MFU, SAM+CDIN+MFU-4L+iAFF, and SAM+CDIN+MFU-6L+iAFF. Red indicates the ground truth, blue denotes the segmentation result of each model, and the overlap is shown in white.

Table

Table 1. Hyperparameter configurations used throughout the experimental protocol.

Table 2. Experimental results of the proposed method and 17 previous segmentation methods on the MacuScan-8k dataset. The best value is highlighted in red, and the second-best value is highlighted in blue.

Table 3. Results of the ablation experiments in which our proposed modules are added step by step on top of the large vision model. Red marks the best result for each metric, and blue marks the second best.

Table 4. Impact of various hyperparameter configurations of our loss function on model performance. Red marks the best result for each metric, and blue marks the second best.

Table 5. Main segmentation metrics of the model under six different numbers of Edema Texture Mapping Unit layers: 2-Layer, 3-Layer, 4-Layer, 5-Layer, 6-Layer, and 7-Layer. Red marks the best result for each metric, and blue marks the second best.