Title
Real-time placental vessel segmentation in fetoscopic laser surgery for Twin-to-Twin Transfusion Syndrome
Introduction
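Twin-to-Twin Transfusion Syndrome (TTTS) is an uncommon but serious complication that affects approximately 10% to 15% of monochorionic twin pregnancies (Lewi et al., 2008). If left untreated, the imbalanced blood flow between the twins can lead to severe complications, including the death of both fetuses (Haverkamp et al., 2001). The condition is caused by abnormal vascular connections in the placenta, arteriovenous anastomoses, which link the circulations of the two fetuses. Such anastomoses are usually absent in normal monochorionic twin pregnancies but are almost always present in TTTS. The pathological flow imbalance causes one fetus (the "recipient") to receive too much blood and the other (the "donor") to receive too little (Umur et al., 2002).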
Abstract
Twin-to-Twin Transfusion Syndrome (TTTS) is a rare condition that affects about 15% of monochorionic pregnancies, in which identical twins share a single placenta. Fetoscopic laser photocoagulation (FLP) is the standard treatment for TTTS and significantly improves the survival of fetuses. The aim of FLP is to identify abnormal connections between blood vessels and to laser ablate them in order to equalize blood supply to both fetuses. However, performing fetoscopic surgery is challenging due to limited visibility, a narrow field of view, and significant variability among patients and domains. In order to enhance the visualization of placental vessels during surgery, we propose TTTSNet, a network architecture designed for real-time and accurate placental vessel segmentation. Our network architecture incorporates a novel channel attention module and a multi-scale feature fusion module to precisely segment tiny placental vessels. To address the challenges posed by FLP-specific fiberscope and amniotic sac-based artifacts, we employed novel data augmentation techniques. These techniques simulate various artifacts, including laser pointer, amniotic sac particles, and structural and optical fiber artifacts. By incorporating these simulated artifacts during training, our network architecture demonstrated robust generalizability. We trained TTTSNet on a publicly available dataset of 2060 video frames from 18 independent fetoscopic procedures and evaluated it on a multi-center external dataset of 24 in-vivo procedures with a total of 2348 video frames. Our method achieved significant performance improvements compared to state-of-the-art methods, with a mean Intersection over Union of 78.26% for all placental vessels and 73.35% for a subset of tiny placental vessels. Moreover, our method achieved 172 and 152 frames per second on an A100 GPU and Clara AGX, respectively. This potentially opens the door to real-time application during surgical procedures.
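The Intersection over Union (IoU) metric reported above can be illustrated with a minimal sketch. The function names, the treatment of two empty masks as a perfect match, and the per-frame averaging are assumptions made for illustration; the exact evaluation protocol is not described in this summary.

```python
def iou(pred, target):
    """IoU between two binary masks given as equally sized 2-D lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(preds, targets):
    """Average the per-frame IoU over a sequence of mask pairs (assumed protocol)."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(iou(pred, target))  # intersection 2, union 4 -> 0.5
```

Because thin vessels cover few pixels, a small number of misclassified pixels changes their IoU considerably, which may be one reason the tiny-vessel subset is reported separately.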
Method
This section presents our network architecture, TTTSNet, for real-time placental vessel segmentation for TTTS surgery. Our approach includes an asymmetric encoder–decoder neural network, a feature fusion module, and a channel-attention mechanism. Additionally, we introduce novel data augmentation approaches to increase the robustness and generalizability of the trained model against artifacts.
Conclusion
We have proposed a network architecture for real-time placental vessel segmentation in videos obtained during FLP for TTTS. To improve performance, we have developed a custom network and data augmentations specifically tailored for this task. Our experiments on a large and diverse test set have shown that TTTSNet is not only accurate in terms of the segmentation metric but also robust in terms of generalizability to datasets from different institutions. Furthermore, our method demonstrates superior performance compared to current state-of-the-art methods. In the future, the use of TTTSNet may aid surgeons during real-time fetoscopic fetal surgery to accurately identify critical structures and ultimately improve outcomes of TTTS treatments.
Results
This section presents the results of two ablation studies conducted to demonstrate the impact of each key component in the proposed TTTSNet and of the custom data augmentations, followed by the results of placental vessel segmentation and a comparison with FetReg2021 challenge solutions.
Figure
Fig. 1. An overview of FLP for TTTS. Twin fetuses, each within their own amniotic sac, are shown. The monochorionic twin pregnancy is characterized by a single shared placenta, typically with vascular connections that allow an exchange of blood between the twins. A fetoscope is used to inspect the placental vessels and find pathological connections that cause an imbalance in blood exchange. When such connections are identified, they are coagulated using laser light. An ultrasound probe is typically used to guide the insertion of the fetoscope.
Fig. 2. An overview of the TTTSNet network architecture for real-time placental vessel segmentation during FLP for TTTS. TTTSNet is designed as an asymmetric encoder–decoder neural network that takes a three-channel RGB input image and produces binary segmentation maps as output. In the encoder part, TTTSNet consists of the Initial Block, Residual Feature Fusion Module (RFFM) blocks, and Split-Extract-Merge Bottleneck (SEM-B) blocks, including a channel-attention mechanism called the Max Pooled Channel-Attention Mechanism (MEDCAM). The encoder part extracts contextual features with low computational complexity, allowing efficient and fast processing with few model parameters. In the RFFM modules, a ∙ preceded by a dashed line denotes residual connections, which aid the model in learning complex features without increasing the number of model parameters. In the decoder part, we use the lightweight Multi-scale Attention Decoder (MAD). The MAD, with its multi-scale attention mechanism, allows the decoder to effectively recover the spatial feature representation using a minimal number of parameters.
Fig. 3. The proposed RFFM preserves the identity function to aid the model in learning complex features without increasing the number of model parameters. In RFFM-A, we concatenate an identity path of the input block features and process the concatenated features of the raw image and the Initial Block with a 1 × 1 convolution. In RFFM-B, processed input feature maps are concatenated with the output of the SEM-Bs, the down-sampled raw image, MEDCAM's output, and a residual connection of the input feature maps. The SEM-B Block 𝑁 bounded by the dashed box comprises (𝛼 + 1) SEM-Bs, where 𝑁 corresponds to block 1 or 2.
Fig. 4. Architecture of the SEM-B structure. The SEM-B starts with a 3 × 3 convolution (Conv) that extracts feature maps and reduces the number of input channels by half. The output of this convolution is then split into two branches, consisting of a depth-wise convolution (DConv) and a depth-wise dilated convolution (DConv(d)). To fuse the multi-scale feature maps, a 3 × 3 convolution is utilized. Batch Normalization (BN) and PReLU activation are applied after every convolutional operation. The module's output concatenates the last convolutional layer's output and the identity of the input feature map. 𝑁 and ⊕ denote the number of feature channels and concatenation, respectively.
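The difference between the two SEM-B branches, a plain depth-wise convolution and a dilated one, can be illustrated in one dimension. This is a sketch under stated assumptions, not the authors' implementation: `dilated_conv1d` and `depthwise` are hypothetical helper names, the kernel values are arbitrary, and no padding, Batch Normalization, or PReLU is included.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1-D cross-correlation of one channel with a dilated kernel."""
    # A kernel of size k with dilation d covers (k - 1) * d + 1 input samples.
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


def depthwise(channels, kernels, dilation=1):
    """Depth-wise: each channel is filtered by its own kernel, with no channel mixing."""
    return [dilated_conv1d(c, k, dilation) for c, k in zip(channels, kernels)]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
print(dilated_conv1d(signal, kernel, dilation=1))  # [-2, -2, -2, -2, -2]
print(dilated_conv1d(signal, kernel, dilation=2))  # [-4, -4, -4]
```

With the same three weights, the dilated branch covers five input samples instead of three, which is how a dilated branch can gather wider context without extra parameters.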
Fig. 5. The proposed MEDCAM module architecture. We utilize Adaptive Max Pooling (AMP) globally and on partitioned feature maps to leverage multi-scale features while preserving vessel details. A weighted sum is applied to the partition-pooled feature vector through learned depth-wise convolutional filters, focusing more on specific spatial partitions among channels. The Squeeze Excitation (SE) Block allows for dynamic channel-wise feature re-calibration, resulting in a meaningful channel attention vector that is finally applied to the input features. MEDCAM utilizes a channel attention mechanism to focus on specific feature channels and capture important information about tiny placental vessels. ℎ, 𝑤, 𝐶, and ⊗ denote feature map height, width, number of channels, and multiplication, respectively.
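The pooling idea in the caption, max pooling both globally and over spatial partitions and then turning the pooled values into one weight per channel, can be sketched in plain Python. This is a heavily simplified illustration and not MEDCAM itself: the SE block is omitted, the per-partition weights are fixed inputs standing in for learned depth-wise filters, the sigmoid gating is an assumption, and `max_pool_partitions` and `channel_attention` are hypothetical names.

```python
import math


def max_pool_partitions(fmap, grid):
    """Max-pool one channel (2-D list) onto a grid x grid output, returned flat.
    Assumes the height and width are divisible by grid."""
    h, w = len(fmap), len(fmap[0])
    ph, pw = h // grid, w // grid
    return [max(fmap[r][c]
                for r in range(gr * ph, (gr + 1) * ph)
                for c in range(gc * pw, (gc + 1) * pw))
            for gr in range(grid) for gc in range(grid)]


def channel_attention(features, partition_weights, grid=2):
    """Scale each channel by a sigmoid score built from pooled maxima."""
    attended = []
    for fmap, weights in zip(features, partition_weights):
        global_max = max_pool_partitions(fmap, 1)[0]  # global pooling
        parts = max_pool_partitions(fmap, grid)       # partition pooling
        weighted = sum(w * p for w, p in zip(weights, parts))
        score = 1.0 / (1.0 + math.exp(-(global_max + weighted)))
        attended.append([[score * v for v in row] for row in fmap])
    return attended


fmap = [[1, 2, 0, 0],
        [3, 4, 0, 5],
        [0, 0, 9, 1],
        [7, 0, 1, 1]]
print(max_pool_partitions(fmap, 2))  # [4, 5, 7, 9]
print(max_pool_partitions(fmap, 1))  # [9]
```

Max pooling keeps the strongest response in each region, so a thin, bright vessel that occupies only a few pixels still contributes to the pooled value, whereas average pooling would dilute it.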
Fig. 6. A summary of the custom data augmentations. Four examples of different data augmentations are shown in each row, including laser pointer, amniotic sac particles, structural defects, and optical fiber artifacts. The images depict the input image on the left, the real artifact in the middle, and the artificial artifact on the right.
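How a synthetic laser pointer artifact might be added to a frame can be sketched as follows. This summary does not describe how the artifacts are actually simulated, so everything here is an assumption: the Gaussian spot shape, the grayscale input, and the hypothetical function `add_laser_spot` with its `center`, `sigma`, and `intensity` parameters.

```python
import math


def add_laser_spot(image, center, sigma, intensity=255.0):
    """Return a copy of a grayscale image (2-D list, values 0-255) with a
    Gaussian-shaped bright spot added at center=(row, col)."""
    cr, cc = center
    out = []
    for r, row in enumerate(image):
        new_row = []
        for c, value in enumerate(row):
            dist2 = (r - cr) ** 2 + (c - cc) ** 2
            glow = intensity * math.exp(-dist2 / (2.0 * sigma ** 2))
            new_row.append(min(255.0, value + glow))  # clip to the valid range
        out.append(new_row)
    return out


frame = [[50.0] * 5 for _ in range(5)]
augmented = add_laser_spot(frame, center=(2, 2), sigma=1.0)
print(augmented[2][2])            # 255.0 (saturated at the spot center)
print(round(augmented[2][4], 2))  # 84.51 (two pixels away from the center)
```

During training, such a function would typically be called with a randomly drawn position and size for each sample, so that the network sees the artifact in many configurations.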
Fig. 7. Examples of corrected annotations: the input (left), original annotation (middle), and corrected annotation (right). Dotted circles emphasize inaccurate annotations, and arrows pinpoint labeling inconsistencies such as annotations beyond the field of view or not adhering to the edge. The first row illustrates an annotation that failed to fill in the gaps. In the second row, inaccurately delineated edges of the placental vessel are emphasized. The third row demonstrates the discontinuous annotation of vessels resulting from amniotic sac particle artifacts. The fourth row shows omitted large placental vessels. Lastly, the final row shows omitted small placental vessels.
Fig. 8. Representative video frames from the training set from Center A and Center B. Each row illustrates five consecutive data samples extracted from a single video. In total, 90 video frames from 18 independent in-vivo TTTS procedures are presented.
Fig. 9. Representative video frames from the test set from four centers, Center C through Center F. Each row illustrates five consecutive data samples extracted from a single video. In total, 120 video frames from 24 independent in-vivo TTTS procedures are presented.
Fig. 10. A qualitative comparison of the impact of the MEDCAM module on the segmentation of placental vessels. Each row shows an example from the test set. From left to right: input image, ground truth, result with MEDCAM, and result without an attention module.
Fig. 11. Examples of segmentation results obtained on the test set by our proposed TTTSNet model, compared with several state-of-the-art methods and two TTTSNet-based configurations. Ground truth is abbreviated as GT. TTTSNet★ denotes TTTSNet trained on the original pixel-wise annotations provided by the FetReg2021 challenge. The images are arranged in order of the best overall score, with the best results on the left. Each row corresponds to a different video, and each column shows the input image, ground truth, results of TTTSNet, and results of other state-of-the-art methods.
Fig. 12. Examples of poor visibility in video frames and their corresponding overlay prediction masks of placental vessels. The first row shows the input video frames, and the second row shows the overlay prediction masks of placental vessels. Here, we demonstrate how a deep learning-based model may improve the visibility of placental vessels to assist fetal surgeons during TTTS fetoscopic surgery.
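An overlay of this kind is commonly produced by alpha blending a color into the frame wherever the predicted mask is positive. The sketch below shows that general technique; the function name `overlay_mask`, the green color, and the `alpha` value are illustrative assumptions, not details taken from the paper.

```python
def overlay_mask(frame, mask, color=(0, 255, 0), alpha=0.4):
    """Blend `color` into an RGB frame wherever `mask` is 1.

    frame: 2-D list of (R, G, B) tuples; mask: 2-D list of 0/1 of the same size.
    Pixels outside the mask are returned unchanged.
    """
    out = []
    for frame_row, mask_row in zip(frame, mask):
        out_row = []
        for pixel, m in zip(frame_row, mask_row):
            if m:
                pixel = tuple(round((1 - alpha) * p + alpha * c)
                              for p, c in zip(pixel, color))
            out_row.append(pixel)
        out.append(out_row)
    return out


frame = [[(100, 100, 100), (200, 0, 0)]]
mask = [[1, 0]]
print(overlay_mask(frame, mask))  # [[(60, 162, 60), (200, 0, 0)]]
```

A partial `alpha` keeps the underlying tissue texture visible beneath the highlighted vessels, which matters when the overlay is meant to assist rather than replace the surgeon's view.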
Fig. 13. Examples of video frames from two types of placenta: (a) anterior and (b) posterior. We demonstrate that anterior placenta cases exhibit better visibility of placental vessels within the field of view than posterior cases, which impacts the segmentation performance for both types.
Table
Table 1. The total number of videos and video frames from each of the six centers used for training and testing.
Table 2. Experimental results of the ablation study with different approaches to each of the key components of TTTSNet. Results on the test set are presented. The first row is the result of the baseline neural network that forms part of TTTSNet, and the remaining three rows refer to additional components added to the baseline.
Table 3. Experimental results of the ablation study with different approaches to the custom data augmentation methods used for TTTSNet training. Results on the test set are presented. Five approaches are listed. The first row is the result of TTTSNet trained without any custom data augmentations, which serves as the baseline, and the other four rows refer to progressively adding each type of data augmentation.
Table 4. A summary of the number of parameters in millions, the inference speed in FPS on both A100 GPU (GPU) and Clara AGX (Clara) hardware, and the mIoU (%) values for placental vessel segmentation obtained with different state-of-the-art methods, computed on the test set. Each column shows the method, results per Center, as well as overall results. All methods were compared with the same image size of 448 × 448 pixels. The 𝑝-value indicates the significance of the pairwise comparison between TTTSNet and each method. The results are ordered by segmentation performance. The best results are bolded.
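Inference speed in frames per second is usually estimated by timing repeated forward passes after a few warm-up runs. The following is a minimal sketch of that procedure; `measure_fps`, the `warmup` and `runs` values, and the stand-in `infer` callable are assumptions, and the paper's own benchmarking setup is not described in this summary.

```python
import time


def measure_fps(infer, frame, warmup=5, runs=50):
    """Estimate frames per second of `infer(frame)` using wall-clock time."""
    for _ in range(warmup):  # warm-up runs are excluded from the timing
        infer(frame)
    start = time.perf_counter()
    for _ in range(runs):
        infer(frame)
    elapsed = time.perf_counter() - start
    return runs / elapsed


# Stand-in workload: summing a list plays the role of a model forward pass.
dummy_frame = list(range(1000))
fps = measure_fps(sum, dummy_frame)
print(f"{fps:.0f} frames per second")
```

When the model runs on a GPU, the device should be synchronized before each clock reading, because kernels are launched asynchronously and the timer would otherwise stop before the work has finished.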
Table 5. A summary of the mIoU (%) values for tiny placental vessel segmentation obtained with different state-of-the-art methods, computed on the test set. Each column shows the method, results per Center, as well as overall results. All methods were compared with the same image size of 448 × 448 pixels. The 𝑝-value indicates the significance of the pairwise comparison between TTTSNet and each method. The results are ordered by segmentation performance. The best results are bolded.
Table 6. A summary of the mIoU (%) ± standard deviation values for TTTSNet and the next top 5 performing methods for each video from the test set. Each column shows the video name, Center, type of placenta, and method. All methods were compared with the same image size of 448 × 448 pixels and the same training settings, i.e., data augmentations. The results are ordered by segmentation performance (the best on the left). The best results are bolded.
Table 7. Results of six-fold cross-validation for both the baseline and TTTSNet methods on original and corrected annotations. The Vessel Original and Vessel Corrected classes are abbreviated as VO and VC, respectively.
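A six-fold split can be sketched as follows. Splitting by whole video, so that frames from one procedure never appear in both the training and validation sets, is an assumption here, as are the hypothetical video IDs and the function name `k_fold_by_video`; this summary does not state how the folds were formed.

```python
def k_fold_by_video(video_ids, k=6):
    """Yield (train_videos, val_videos) pairs; each video is held out exactly once."""
    folds = [video_ids[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [v for j, fold in enumerate(folds) if j != i for v in fold]
        yield train, val


videos = [f"video_{n:02d}" for n in range(1, 19)]  # 18 hypothetical video IDs
for train, val in k_fold_by_video(videos):
    print(len(train), len(val))  # 15 3 for every fold
```

Holding out entire videos gives a more honest estimate than splitting individual frames, since consecutive frames from the same procedure are highly correlated.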
Table 8. A summary of the mIoU (%) values for TTTSNet and the top 5 performing methods from the FetReg2021 challenge. We provide placental vessel segmentation performance for each Center, as well as overall results for both original and corrected annotations. All methods were compared using the same image size of 448 × 448 pixels and the same training settings, i.e., data augmentations. The results are ordered by segmentation performance, with the best results bolded.