Title
Label refinement network from synthetic error augmentation for medical image segmentation
Introduction
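Convolutional neural networks (CNNs) are the state of the art for many biomedical image segmentation tasks. Many CNN segmentation architectures have been proposed, such as fully convolutional networks (Long et al., 2015), Dense-Net (Huang et al., 2017), and U-Net (Ronneberger et al., 2015). Owing to its efficient architecture and skip connections, U-Net has become the most popular network for biomedical image segmentation and has shown excellent accuracy and robustness across a variety of segmentation tasks (Isensee et al., 2021; Siddique et al., 2021). Most CNN-based segmentation methods, including U-Net, do not fully exploit or encode structural information about the objects to be segmented. As a result, they can produce errors that are obvious when the segmented structure is viewed as a whole. For example, when segmenting elongated tubular structures such as lung airways, discontinuity errors can occur, as shown in Fig. 1. Using knowledge of the label structure, such as the continuity of airway tree branches, can help prevent these errors. However, explicitly encoding such global information in a CNN is not straightforward.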
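A discontinuity of the kind described above splits the segmented tree into more than one connected component, which makes it easy to detect automatically. The following is a minimal sketch of such a check, not a metric defined in this text: it assumes the segmentation is stored as a set of `(z, y, x)` voxel tuples and uses 26-connectivity, and the function name `count_components` is hypothetical.

```python
from collections import deque
from itertools import product


def count_components(voxels):
    """Count 26-connected components in a set of (z, y, x) foreground voxels."""
    # All 26 neighbour offsets in 3D (every combination except the centre).
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    unvisited = set(voxels)
    components = 0
    while unvisited:
        components += 1
        queue = deque([unvisited.pop()])
        # Breadth-first flood fill of the current component.
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in offsets:
                neighbour = (z + dz, y + dy, x + dx)
                if neighbour in unvisited:
                    unvisited.remove(neighbour)
                    queue.append(neighbour)
    return components
```

A fully connected airway tree yields one component under this check. Every extra component indicates at least one break in continuity.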
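This paper proposes a framework that implicitly encodes label structure information into a CNN by formulating it as a label refinement step. Specifically, we generate structural synthetic errors in the segmentations (ground truth or reference) and train a label refinement network to correct them. The trained network is expected to generalize to, and correct, the real errors in the initial segmentations produced by a baseline segmentation network. To improve the generalization of the label refinement network to initial segmentations, a label appearance simulation network is used to reduce the appearance difference between the segmentations with synthetic errors and the initial segmentations. Given the segmentations with appearance-enhanced synthetic errors, or the initial segmentations, together with the original images as inputs and the ground truth segmentations as reference, the label refinement network learns to correct these errors and to take them into account in its segmentation decisions.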
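We validate the proposed label refinement method on two segmentation tasks: airway segmentation from chest computed tomography (CT) images (Garcia-Uceda et al., 2021) and brain vessel segmentation from 3D CT angiography (CTA) images of the brain (Su et al., 2020). We compare our method with a U-Net baseline; four other label refinement methods, namely DoubleU-Net (Jha et al., 2020), SCAN (Dai et al., 2018), Post-DAE (Larrazabal et al., 2020), and DVAE (Araújo et al., 2019); and a U-Net trained with the clDice loss (Shit et al., 2021). In addition, we perform an ablation study showing the contribution of each component of the label refinement method. Finally, we run experiments in a semi-supervised setting, training our method with additional unlabeled data.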
Abstract
Deep convolutional neural networks for image segmentation do not learn the label structure explicitly and may produce segmentations with an incorrect structure, e.g., with disconnected cylindrical structures in the segmentation of tree-like structures such as airways or blood vessels. In this paper, we propose a novel label refinement method to correct such errors from an initial segmentation, implicitly incorporating information about label structure. This method features two novel parts: (1) a model that generates synthetic structural errors, and (2) a label appearance simulation network that produces segmentations with synthetic errors that are similar in appearance to the real initial segmentations. Using these segmentations with synthetic errors and the original images, the label refinement network is trained to correct errors and improve the initial segmentations. The proposed method is validated on two segmentation tasks: airway segmentation from chest computed tomography (CT) scans and brain vessel segmentation from 3D CT angiography (CTA) images of the brain. In both applications, our method significantly outperformed a standard 3D U-Net, four previous label refinement methods, and a U-Net trained with a loss tailored for tubular structures. Improvements are even larger when additional unlabeled data is used for model training. In an ablation study, we demonstrate the value of the different components of the proposed method.
Method
The proposed method consists of four steps, shown schematically in Fig. 2. Firstly, a baseline segmentation network generates the initial segmentations. Secondly, synthetic errors are generated and added to every ground truth segmentation, creating segmentations with synthetic errors on which to train the label refinement network. Thirdly, a label appearance simulation network (LASN) based on adversarial learning is used to reduce the appearance difference between the segmentations with synthetic errors and the initial segmentations. Steps 2–3 constitute a realistic data augmentation (error augmentation) technique that generates training samples for the label refinement network, with a much larger variety of errors than in the initial segmentations. Finally, a label refinement network is trained to predict the final refined segmentations, using the segmentations with appearance-enhanced synthetic errors or the initial segmentations, together with the original images, as inputs and the ground truth segmentations as reference.
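The data flow of steps 2 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `make_refinement_sample`, the `p_synthetic` mixing probability, and the use of plain callables for the error generator and the LASN are all hypothetical choices made for this example.

```python
import random


def make_refinement_sample(image, ground_truth, initial_seg,
                           add_synthetic_errors, simulate_appearance,
                           p_synthetic=0.5, rng=random):
    """Build one (inputs, target) pair for the label refinement network.

    add_synthetic_errors: callable for step 2 (ground truth -> x_s).
    simulate_appearance:  callable for step 3 (x_s -> appearance-enhanced x_s).
    p_synthetic:          assumed probability of drawing a synthetic sample
                          instead of the real initial segmentation.
    """
    if rng.random() < p_synthetic:
        # Error augmentation: corrupt the ground truth, then make the result
        # look like an output of the baseline network.
        seg_in = simulate_appearance(add_synthetic_errors(ground_truth))
    else:
        # Otherwise use the real initial segmentation from step 1.
        seg_in = initial_seg
    # The refinement network receives the image together with a flawed
    # segmentation and is supervised with the ground truth.
    return (image, seg_in), ground_truth
```

Keeping the two generators as injected callables means the same routine works with any error model or appearance network, which is convenient for ablations such as running without the LASN (pass an identity function).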
Conclusion
We presented a novel label refinement method that can learn from synthetic errors to refine the initial segmentations from a base segmentation network. A label appearance simulation network was applied to reduce the appearance difference between the segmentations with synthetic errors and the real initial segmentations, thereby improving the generalizability of our method. On two segmentation tasks for branching structures, the proposed method achieved significantly better segmentation results than four previous label refinement methods and a U-Net trained with a loss tailored for tubular structures. The segmentation performance of our method was further improved by using additional unlabeled data for training with semi-supervised learning techniques.
Results
5.1. Segmentation results
The results of our experiments for airway and brain vessel segmentation are shown in Tables 1 and 2, respectively. In both applications, the proposed label refinement method achieves the highest Dice and completeness scores and the lowest number of gaps, with moderate leakage compared to the other methods. This indicates that our method learns from the synthesized errors and succeeds in correcting errors in the real data. In both applications, the competing methods with the highest completeness (U-Net trained with clDice loss for airways, and DoubleU-Net for vessels) show both more leakage and more gaps than our method. This indicates that these methods may lack the ability to learn relevant label structural information, and over-segment branches to increase completeness rather than correcting errors in continuity.
For airway segmentation, DVAE shows a similar number of gaps to the proposed method, but lower Dice and completeness. For vessel segmentation, both Post-DAE and DVAE show lower Dice and completeness than the proposed method, with a moderate improvement in the number of gaps compared to the U-Net baseline. This indicates that the autoencoder-based methods suffer from under-segmentation in both applications.
In the ablation study, the label refinement method with synthetic errors (LR+Syn) achieves better Dice, leakage, and number-of-gaps scores than the baseline refinement network (LR), for both applications. For airway segmentation, LR+Syn has slightly lower completeness than LR, while completeness is similar for vessel segmentation. Moreover, adding synthetic errors to the initial segmentations (LR+Syn(init)), rather than to the ground truth segmentations (LR+Syn), yields results similar to the baseline U-Net in all metrics, for both applications. This suggests that the initial segmentations are too incomplete to add sufficient useful synthetic errors to train the refinement network. The proposed method, combining the synthetic errors and the label appearance simulation network (LR+Syn+LASN), achieves a much higher completeness than the method with only synthetic errors (LR+Syn), with similar Dice, leakage, and number-of-gaps scores, for both applications.
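The metrics named above are not defined in this text, so the following is only a hedged sketch of how three of them might be computed. Dice is the standard overlap coefficient. The definitions of `completeness` (fraction of the ground truth centerline covered by the prediction) and `leakage` (false positive volume relative to ground truth volume) are assumptions made for this example, as is the set-of-voxels representation.

```python
def dice(pred, truth):
    """Dice coefficient between two sets of foreground voxels."""
    if not pred and not truth:
        return 1.0  # both empty: treat as perfect agreement
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def completeness(pred, truth_centerline):
    """Assumed definition: share of ground truth centerline voxels that the
    prediction covers."""
    return len(pred & truth_centerline) / len(truth_centerline)


def leakage(pred, truth):
    """Assumed definition: false positive volume relative to the ground
    truth volume."""
    return len(pred - truth) / len(truth)


# Toy example: a straight 10-voxel branch, with the last two voxels missed
# and one spurious voxel predicted next to the branch.
truth = {(0, 0, x) for x in range(10)}
pred = {(0, 0, x) for x in range(8)} | {(0, 1, 0)}
scores = (dice(pred, truth), completeness(pred, truth), leakage(pred, truth))
```

Under these assumed definitions, a method can raise completeness simply by predicting more voxels, at the cost of higher leakage, which is the over-segmentation trade-off discussed above.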
Figure
Fig. 1. Common structural errors in the segmentations obtained by a U-Net trained to segment airways (Garcia-Uceda et al., 2021). True positives are displayed in yellow, false negatives in blue, and false positives in red. Detailed views a–b show missing terminal branches, and view c shows a discontinuity error in a branch.
Fig. 2. Schematics of the proposed label refinement method. First, a base segmentation network 𝑓1 is trained to obtain the initial segmentations 𝑥. Second, we create segmentations with synthetic errors 𝑥𝑠 that are similar to the errors in 𝑥. Third, a label appearance simulation network 𝑓a (together with a discriminator 𝐷) is trained to obtain segmentations with appearance-enhanced synthetic errors 𝑥̂𝑠. Finally, the label refinement network 𝑓2 is trained to correct these synthetic errors, with either 𝑥̂𝑠 or 𝑥 together with the image 𝐼 as inputs.
Fig. 3. Schematics of the synthetic segmentation errors defined for airways. Definitions are shown for a randomly selected terminal branch (left) and non-terminal branch (right). 𝑏𝑂: branch start point, 𝑏𝐸: branch end point, 𝑏𝑀: branch middle point, 𝑚𝑂: mask start point, 𝐿: mask length. The masked section of airway branches is displayed in blue (for the selected branch as well as other nearby branches).
Fig. 4. Example of airway segmentations obtained by the different components of the proposed method. In the detailed views, true positives are displayed in yellow, false negatives in blue, and false positives in red.
Fig. 5. Influence of the hyperparameters of the proposed method (the maximum synthetic error rates) on performance for airway and brain vessel segmentation. Results are shown as average performance with standard deviation (error bars) for the Dice and completeness metrics over three random data splits. The results for the baseline (LR) are displayed as a dashed line.
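The branch masking defined in Fig. 3 can be illustrated with a short sketch. The caption gives only the symbols, so the sampling rules here are assumptions: for a terminal branch the mask runs from a random start point 𝑚𝑂 to the branch end 𝑏𝐸, and for a non-terminal branch a section of random length 𝐿 is centred on the middle point 𝑏𝑀. The set-of-voxels representation and the `max_fraction` and `radius` parameters are likewise hypothetical.

```python
import random


def mask_branch(segmentation, centerline, terminal,
                max_fraction=0.5, radius=1, rng=random):
    """Return a copy of `segmentation` with a section of one branch removed.

    segmentation: set of (z, y, x) foreground voxels.
    centerline:   ordered list of (z, y, x) points from b_O to b_E.
    terminal:     True for a terminal branch, False for a non-terminal one.
    """
    n = len(centerline)
    if terminal:
        # Assumed rule: random mask start m_O, masked through to b_E.
        m_o = rng.randrange(n)
        length = n - m_o
    else:
        # Assumed rule: random mask length L, centred on the middle point b_M.
        length = rng.randint(1, max(1, int(max_fraction * n)))
        m_o = max(0, n // 2 - length // 2)
    masked_points = centerline[m_o:m_o + length]
    r2 = radius * radius

    def near_mask(voxel):
        # True if the voxel lies within `radius` of any masked centerline point.
        return any(sum((a - b) ** 2 for a, b in zip(voxel, p)) <= r2
                   for p in masked_points)

    # Every voxel close to the masked section is dropped, so parts of nearby
    # branches inside that region disappear as well.
    return {v for v in segmentation if not near_mask(v)}
```

Under these assumptions, masking a terminal branch imitates the missing terminal branches of Fig. 1 (views a–b), while masking the middle of a non-terminal branch imitates the discontinuity of view c.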
Table
Table 1. Results for airway segmentation. Average performance (standard deviation) over the results obtained from three random data splits. LR: simple label refinement network. LR+Syn(init): label refinement method with synthetic errors on initial segmentations. LR+Syn: label refinement method with synthetic errors on ground truth segmentations. LR+Syn+LASN: label refinement method with label appearance simulation network. ↑: significantly better than the U-Net baseline (𝑝 < 0.05). ↓: significantly worse than the U-Net baseline (𝑝 < 0.05). P-values are calculated by the paired two-sided Student's t-test (on the average results from the three data splits). Boldface: best results, or not significantly different from the best results.
Table 2. Results for brain vessel segmentation. Average performance (standard deviation) over the results obtained from three random data splits. LR: simple label refinement network. LR+Syn(init): label refinement method with synthetic errors on initial segmentations. LR+Syn: label refinement method with synthetic errors on ground truth segmentations. LR+Syn+LASN: label refinement method with label appearance simulation network. ↑: significantly better than the U-Net baseline (𝑝 < 0.05). ↓: significantly worse than the U-Net baseline (𝑝 < 0.05). P-values are calculated by the paired two-sided Student's t-test (on the average results from the three data splits). Boldface: best results, or not significantly different from the best results.
Table 3. Results with semi-supervised learning for airway segmentation. Average performance (standard deviation) over the results obtained from three random data splits. LR+Syn+LASN: proposed method trained only with labeled data. LR+Syn+LASN+Unlabeled: proposed method trained with both labeled and unlabeled data. Boldface: significantly better than the supervised results (𝑝 < 0.05). P-values are calculated by the paired two-sided Student's t-test (on the average results from the three data splits).
Table 4. Results with semi-supervised learning for brain vessel segmentation. Average performance (standard deviation) over the results obtained from three random data splits. LR+Syn+LASN: proposed method trained only with labeled data. LR+Syn+LASN+Unlabeled: proposed method trained with both labeled and unlabeled data. Boldface: significantly better than the supervised results (𝑝 < 0.05). P-values are calculated by the paired two-sided Student's t-test (on the average results from the three data splits).
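The table captions state that p-values come from a paired two-sided Student's t-test on the average results of the three data splits. With three pairs the test has two degrees of freedom, for which the t-distribution has a closed-form CDF, so the test can be sketched with the standard library alone. The function name is hypothetical, and this illustrates the test itself rather than the authors' code.

```python
import math
from statistics import mean, stdev


def paired_t_test_three_splits(a, b):
    """Paired two-sided t-test for exactly three paired values (df = 2).

    a, b: per-split average scores of the two methods, in the same split order.
    Returns (t statistic, two-sided p-value).
    """
    if len(a) != 3 or len(b) != 3:
        raise ValueError("the closed-form p-value below is only valid for n = 3")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("paired differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(3))
    # For 2 degrees of freedom the Student t CDF is 1/2 + t / (2 * sqrt(t^2 + 2)),
    # so the two-sided p-value simplifies to 1 - |t| / sqrt(t^2 + 2).
    p = 1 - abs(t) / math.sqrt(t * t + 2)
    return t, p
```

With only two degrees of freedom, |t| must exceed roughly 4.30 to reach 𝑝 < 0.05, so a difference has to be very consistent across the three splits to be marked as significant.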