DDoCT:形态保持的双域联合优化用于快速稀疏视角低剂量CT成像 | 文献速递-医学影像人工智能进展

Title

题目

DDoCT: Morphology preserved dual-domain joint optimization for fast sparse-view low-dose CT imaging

  DDoCT:形态保持的双域联合优化用于快速稀疏视角低剂量CT成像 

01

文献速递介绍

计算机断层扫描(CT)是当今广泛应用的医学影像技术。作为一种非侵入性的诊断工具,CT为临床应用提供了重要的解剖信息,在现代医学的进步中发挥着至关重要的作用。其广泛的临床应用涵盖多个重要领域。在临床诊断中,CT在识别各种疾病的特征(如肿瘤和中风)方面表现出强大的能力(Ginat 和 Gupta, 2014)。在临床治疗中,特别是在肿瘤放射治疗领域,CT图像可用于精确定位肿瘤并引导放射束,从而确保精确的治疗规划与调整,以获得最佳治疗效果。此外,CT能够提供高分辨率的内部解剖图像,为外科医生提供术前规划的指导。在临床实践中,为了获取具有诊断价值的CT图像,通常需要遵循“尽可能低的合理辐射剂量”(ALARA)原则。尽管CT扫描(全球范围内共4.03亿次)仅占放射与核医学检查或相关操作年总数的9.6%,但其对总人口的年集体有效辐射剂量贡献却高达62%(Mahesh et al., 2022)。某些类型的CT检查(如对比剂增强和非增强成像)可能需要患者接受多次扫描,这可能导致过量辐射(Kachelrieß 和 Rehani, 2020),进而增加患癌症及遗传损伤的风险。此外,过量辐射对患者可能产生不可逆的影响,并降低其生活质量(Goldman, 2007)。因此,这一问题引起了广泛的公众关注(Brenner 和 Hall, 2007;Pearce 等, 2012)。

低剂量CT(LDCT)是一种有效的降低患者辐射剂量的措施(Team, 2011)。在临床实践中,获取LDCT图像的常见方法是降低管电流,以减少产生X射线光子的电子束通量,和/或在CT检查过程中减少投影数量。尽管这些措施在一定程度上缓解了辐射剂量过高的问题,但它们可能降低X射线的信噪比和图像质量,从而降低CT图像的对比度分辨率,并可能增加噪声与伪影。如果这些问题得不到有效解决,将严重影响LDCT在各种临床场景中的应用。因此,如何利用低剂量扫描协议获取质量可与常规剂量CT(NDCT)相媲美并满足临床需求的高质量CT图像,已成为CT影像学领域长期存在且具有实际重要性的问题。

为了解决这一问题,研究人员提出了多种LDCT成像算法,大致可分为以下三类(Zhang et al., 2021a;Yang, 2017;Zhang et al., 2013):基于正弦图(sinogram)的预处理方法、迭代重建方法以及图像后处理方法。

正弦图预处理方法 是针对低剂量X射线束获取的CT原始数据进行处理。由于此类方法直接作用于原始数据采集阶段,因此能够有效结合物理特性和光子统计特性。典型方法包括双边滤波(Manduca et al., 2009)、结构自适应滤波(Balda et al., 2012)和加权最小二乘惩罚(Wang et al., 2006)。这些去噪方法能够结合物理特性与光子统计特性,因此在正弦图预处理中受到广泛关注。然而,正弦图预处理也存在一些局限性。首先,该方法对高质量原始数据依赖较大。例如,在低剂量投影数据情况下,随着信号减弱,重建结果的质量将显著下降,难以有效恢复信号中丢失的信息,从而影响整体性能。其次,正弦图预处理可能会导致空间分辨率下降。

迭代重建方法 主要通过迭代优化目标函数来重建图像。这种方法通常在投影域和图像域之间交替进行前向投影和后向投影,并基于收敛准则最小化目标函数。该目标函数通常通过整合已知物体的信息,以保持边缘结构并降低噪声(Xu et al., 2012;Zhang et al., 2016;Bian et al., 2013;Chen et al., 2014)。然而,此类方法通常需要访问原始数据,并且计算成本较高。此外,图像重建质量高度依赖于目标函数的精确设定及超参数调节。这些限制因素阻碍了迭代重建方法在临床实践中的应用。

图像后处理方法 主要关注图像域的噪声和伪影去除。其主要优势在于不依赖CT设备提供的原始投影数据,因此能够更容易地集成至临床CT成像工作流程中。典型方法包括非局部均值滤波算法(Li et al., 2014)、块匹配算法(Kang et al., 2013)和低秩稀疏编码(Lei et al., 2018)。然而,基于图像域的算法在去噪过程中通常依赖于特定的噪声模型来估计噪声分布,这可能会影响其整体性能。
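下面给出迭代重建思路的一个最小示意(仅作概念演示:使用 scikit-image 自带的 SART 实现与 Shepp-Logan 模体;综述中提到的迭代方法通常还会在目标函数中加入边缘保持等正则项并调节超参数,此处从略):

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon_sart, rescale

# 构造测试图像与稀疏视角下的正弦图
img = rescale(shepp_logan_phantom(), 0.5)
theta = np.linspace(0.0, 180.0, 90, endpoint=False)   # 仅 90 个视角,模拟稀疏采样
sino = radon(img, theta=theta)

# SART:在投影域与图像域之间交替前向/反向投影,逐步逼近目标
recon = None
for _ in range(5):
    recon = iradon_sart(sino, theta=theta, image=recon)
```

可以看到,迭代法需要在投影域与图像域之间反复前向/反向投影,这也是其计算成本较高的原因之一。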

Abstract

摘要

Computed tomography (CT) is continuously becoming a valuable diagnostic technique in clinical practice. However, the radiation dose exposure in the CT scanning process is a public health concern. Within medical diagnoses, mitigating the radiation risk to patients can be achieved by reducing the radiation dose through adjustments in tube current and/or the number of projections. Nevertheless, dose reduction introduces additional noise and artifacts, which have extremely detrimental effects on clinical diagnosis and subsequent analysis. In recent years, the feasibility of applying deep learning methods to low-dose CT (LDCT) imaging has been demonstrated, leading to significant achievements. This article proposes a dual-domain joint optimization LDCT imaging framework (termed DDoCT) which uses noisy sparse-view projection to reconstruct high-performance CT images with joint optimization in projection and image domains. The proposed method not only addresses the noise introduced by reducing tube current, but also pays special attention to issues such as streak artifacts caused by a reduction in the number of projections, enhancing the applicability of DDoCT in practical fast LDCT imaging environments. Experimental results have demonstrated that DDoCT has made significant progress in reducing noise and streak artifacts and enhancing the contrast and clarity of the images.

计算机断层扫描(CT)在临床实践中正逐步成为一种重要的诊断技术。然而,在CT扫描过程中,辐射剂量的暴露已成为一个公共健康问题。在医学诊断中,可以通过调整管电流和/或投影数量来降低辐射剂量,从而减少患者的辐射风险。然而,剂量的降低会引入额外的噪声和伪影,对临床诊断及后续分析产生极为不利的影响。近年来,深度学习方法在低剂量CT(LDCT)成像中的应用可行性已得到验证,并取得了显著进展。本文提出了一种双域联合优化的LDCT成像框架(称为DDoCT),该方法利用具有噪声的稀疏视角投影,通过投影域和图像域的联合优化,重建高质量的CT图像。该方法不仅能够有效应对因管电流降低而引入的噪声问题,同时也特别关注因投影数量减少而导致的条纹伪影问题,从而提高DDoCT在实际快速LDCT成像环境中的适用性。实验结果表明,DDoCT在降低噪声和条纹伪影、增强图像对比度和清晰度方面取得了显著进展。

Method

方法

3.1. Framework overview

In this paper, we propose DDoCT, a novel deep-learning framework for dual-domain joint optimization designed for noisy sparse-view LDCT imaging. It provides a new definition for noisy sparse-view LDCT imaging by integrating precisely designed projection domain super-resolution with image domain wavelet high-frequency feature fusion technology. Fig. 1 illustrates this framework. The framework is divided into two key processing stages: projection domain processing and image domain processing. By combining the RFDN-sino and WHF-DN networks, DDoCT provides innovative solutions in practical LDCT imaging environments.

During the projection domain stage, the network takes noisy sparse-view sinograms 𝑆𝑖 as input. The RFDN-sino network is employed to predict low-noise dense sinograms 𝑆𝑝 from the input. Subsequently, the classical FBP algorithm is used to map the sinogram from the projection domain to the image domain, obtaining the reconstructed image 𝐼𝑖. The design of this stage mainly aims to address the streak artifacts and pseudo-structures introduced by the incomplete projections, thereby improving the quality of 𝐼𝑖 and simultaneously reducing noise levels.

During the image domain processing stage, in order to address the secondary artifacts and the residual noise, 𝐼𝑖 is passed through the WHF-DN in the image domain, which ultimately aims to generate a high-quality image 𝐼𝑝 that is as consistent as possible with the NDCT images.

Through this design, the entire framework is able to comprehensively handle the various noises and artifacts introduced in the practical LDCT imaging environment, yielding more accurate and high-performance CT images.

3.1. 框架概述

在本文中,我们提出了一种用于具有噪声的稀疏视角低剂量 CT(LDCT)成像的双域联合优化深度学习框架——DDoCT。该框架通过将精心设计的投影域超分辨率技术与图像域的小波高频特征融合技术相结合,为具有噪声的稀疏视角 LDCT 成像提供了一种新的定义。图 1 直观展示了该框架的结构。整个框架分为两个关键处理阶段:投影域处理 和 图像域处理。通过结合 RFDN-sino 和 WHF-DN 网络,DDoCT 在实际 LDCT 成像环境中提供了创新性的解决方案。

在 投影域阶段,网络以带噪声的稀疏视角正弦图 (𝑆𝑖) 作为输入。RFDN-sino 网络用于从输入中预测低噪声的稠密正弦图 (𝑆𝑝)。随后,经典的 滤波反投影(FBP, Filtered Back Projection) 算法用于将正弦图从投影域映射到图像域,从而得到初步重建的图像 𝐼𝑖。本阶段的设计主要目的是解决因投影不完整而引入的条纹伪影和伪结构,提高 𝐼𝑖 的质量,并同时降低噪声水平。

在 图像域处理阶段,为了进一步去除次级伪影和残留噪声,初步重建的 𝐼𝑖 经过 WHF-DN 网络 处理,最终生成高质量的 CT 图像 𝐼𝑝,其质量尽可能接近常规剂量 CT(NDCT)图像。

通过这一设计,整个框架能够全面处理实际 LDCT 成像环境中引入的各种噪声和伪影,从而生成更准确、高性能的 CT 图像。
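为便于理解上述数据流,下面给出一个极简的两阶段流程示意(非论文官方实现:RFDN-sino 与 WHF-DN 仅以占位网络代替,FBP 使用 scikit-image 的 iradon 且不可微,正弦图尺寸与视角数均为假设值):

```python
import numpy as np
import torch
import torch.nn as nn
from skimage.transform import iradon

class PlaceholderSR(nn.Module):
    """占位的投影域超分辨率网络:仅把视角维从 240 插值到 720 再做一次卷积,真实的 RFDN-sino 见图 2。"""
    def __init__(self, scale=3):
        super().__init__()
        self.up = nn.Upsample(scale_factor=(1, scale), mode="bilinear", align_corners=False)
        self.conv = nn.Conv2d(1, 1, 3, padding=1)

    def forward(self, x):
        return self.conv(self.up(x))

class PlaceholderDN(nn.Module):
    """占位的图像域去噪网络,真实的 WHF-DN 见图 3、图 4。"""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, 3, padding=1)

    def forward(self, x):
        return self.conv(x)

def ddoct_forward(s_i, rfdn_sino, whf_dn, theta_dense):
    s_p = rfdn_sino(s_i)                                # 投影域阶段:S_i -> 低噪声稠密正弦图 S_p
    sino = s_p.squeeze().detach().cpu().numpy()         # (探测器单元数, 视角数)
    i_i = iradon(sino, theta=theta_dense, filter_name="ramp")  # 经典 FBP,得到初步重建图像 I_i
    i_i = torch.from_numpy(i_i).float()[None, None]
    return whf_dn(i_i)                                  # 图像域阶段:输出尽量接近 NDCT 的 I_p

theta_dense = np.linspace(0.0, 360.0, 720, endpoint=False)
s_i = torch.rand(1, 1, 367, 240)                        # 带噪稀疏视角正弦图,形状仅作示意
i_p = ddoct_forward(s_i, PlaceholderSR(), PlaceholderDN(), theta_dense)
```

若要按论文方式进行投影域与图像域的端到端联合优化,还需要可微分的 FBP/前向投影算子,此处不展开。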

Conclusion

结论

This paper proposes a dual-domain joint optimization framework, DDoCT, specifically designed for noisy sparse-view LDCT imaging. DDoCT leverages the capabilities provided by deep neural networks for projection domain super-resolution and image domain wavelet high-frequency fusion to learn dual-domain mapping relationships, offering a new definition for LDCT imaging. In the projection domain, RFDN-sino learns the transformation mapping from noisy sparse-view sinograms to low-noise dense sinograms, providing clearer and less noisy input images for the image domain stage. In the image domain, WHF-DN is responsible for addressing secondary artifacts and residual noise. Particularly, by learning the fusion relationships of high-frequency features in the wavelet domain, the WHF-DN further refines image details and improves the overall quality of the images.

The design of DDoCT not only focuses on addressing the noise introduced by reducing tube current, but also solves issues such as streak artifacts caused by a reduction in the number of projections. This comprehensive design enhances the applicability of DDoCT in practical LDCT imaging environments. Experimental results have demonstrated that our proposed DDoCT can comprehensively handle various noises and artifacts in practical LDCT imaging environments, producing more accurate and high-performance CT images. It has the potential to be applied in clinical workflows.

However, there are also some limitations in this study. The performance of the trained model may be influenced by the complexity of artifact patterns in certain scenarios. For instance, artifacts arising from extremely sparse projection sampling, metal artifacts due to the implantation of dense objects, and truncation artifacts resulting from an inadequate field of view all pose significant challenges. While all of these cases lead to degradation in model performance, they are also challenging scenarios for classical image reconstruction algorithms. To address these issues, future work will focus on optimizing the model design further and introducing physics constraints to improve its robustness and adaptability to such edge cases. Additionally, we plan to explore techniques for enhancing inference efficiency to enable more effective application of DDoCT in diverse clinical settings.

本文提出了一种专为噪声稀疏视角低剂量CT(LDCT)成像设计的双域联合优化框架——DDoCT。DDoCT 利用深度神经网络提供的投影域超分辨率和图像域小波高频融合能力,学习双域映射关系,为 LDCT 成像提供了新的定义。在投影域,RFDN-sino 学习从噪声稀疏视角正弦图到低噪声密集正弦图的变换映射,为图像域阶段提供更清晰、噪声更少的输入图像。在图像域,WHF-DN 负责解决次级伪影和残余噪声。特别是通过学习小波域中高频特征的融合关系,WHF-DN 进一步细化图像细节并提高图像的整体质量。

DDoCT 的设计不仅专注于解决减少管电流所引入的噪声,还解决了由于减少投影数量而产生的条纹伪影等问题。这一综合设计增强了 DDoCT 在实际 LDCT 成像环境中的适用性。实验结果表明,所提出的 DDoCT 能够全面处理实际 LDCT 成像环境中的各种噪声和伪影,生成更准确、高性能的 CT 图像,具有在临床工作流中应用的潜力。

然而,本研究也存在一些局限性。训练模型的性能可能会受到某些场景中伪影模式复杂性的影响。例如,极端稀疏投影采样引起的伪影、由于密集物体植入引起的金属伪影以及由于视野不足导致的截断伪影都对模型性能构成了显著挑战。尽管这些情况会导致模型性能下降,但它们也是经典图像重建算法面临的挑战性场景。为了解决这些问题,未来的工作将专注于进一步优化模型设计并引入物理约束,以提高其在这些边缘情况下的鲁棒性和适应性。此外,我们还计划探索提高推理效率的技术,以便在不同临床环境中更有效地应用 DDoCT。

Results

结果

4.1. Dataset

In this work, we used a publicly released patient dataset for the 2016 NIH-AAPM-Mayo Clinic Low-Dose CT Grand Challenge. The dataset contains abdominal NDCT images from 10 patients with a slice thickness of 1 mm. The quarter-dose LDCT images were generated by inserting Poisson noise into the projection data to mimic a noise level that corresponded to 25% of the full dose. In this study, we further generate noisy sparse-view sinograms and low-noise dense-view sinograms by using the Mayo dataset. Specifically, by performing 240 parallel-beam projections around 360 degrees on the original LDCT images, we simulated noisy sparse-view sinograms using this sparse sampling approach, denoted as 𝑆𝑖. By performing 720 parallel-beam projections around 360 degrees on the original NDCT images, we simulated low-noise dense sinograms, denoted as 𝑆𝐺𝑇. We used data from 9 patients as the training dataset, resulting in a total of 5410 pairs, and data from the remaining 1 patient as the test dataset, resulting in a total of 526 pairs.

We also used in-house real low- and high-dose head phantom data for experimental validation. Specifically, an anthropomorphic head phantom was scanned using a cone beam CT on-board imager (TrueBeam System, Varian Medical Systems, Palo Alto, CA). The phantom data comprises pairs of CT images acquired using low-dose parameters (80 kV, 100 mA) and high-dose parameters (80 kV, 400 mA), with the low-dose settings representing one-quarter of the radiation exposure of the high-dose configurations. We applied the same preprocessing method used for the Mayo dataset to the phantom data, generating corresponding noisy sparse-view sinograms. The data directly reflect the noise and artifact characteristics under real low- and normal-dose conditions in daily CBCT for patient setup in image-guided radiotherapy.

4.1. 数据集

在本研究中,我们使用了 2016 NIH-AAPM-Mayo Clinic 低剂量 CT 挑战赛(Low-Dose CT Grand Challenge)公开发布的患者数据集。该数据集包含 10 名患者的腹部常规剂量 CT(NDCT)图像,层厚为 1 mm。四分之一剂量的低剂量 CT(LDCT)图像通过在投影数据中加入泊松噪声(Poisson noise)生成,以模拟 25% 全剂量对应的噪声水平。在本研究中,我们基于 Mayo 数据集进一步生成具有噪声的稀疏视角正弦图和低噪声的稠密视角正弦图。具体而言,我们对原始 LDCT 图像进行 240 组平行束投影(parallel-beam projections),覆盖 360°,利用此稀疏采样方法模拟具有噪声的稀疏视角正弦图(𝑆𝑖);同时,对原始 NDCT 图像进行 720 组平行束投影,覆盖 360°,以模拟低噪声的稠密正弦图(𝑆𝐺𝑇)。在数据划分上,我们使用 9 名患者的数据作为训练集,共 5410 对数据;使用剩余 1 名患者的数据作为测试集,共 526 对数据。

此外,我们还使用了 内部真实低剂量与高剂量头部仿真人体模体数据 进行实验验证。具体而言,我们使用 TrueBeam 系统(Varian Medical Systems, Palo Alto, CA) 的 锥形束 CT(CBCT)机载成像设备 对 人体头部仿真模体(anthropomorphic head phantom) 进行扫描。该数据集包含 低剂量参数(80 kV, 100 mA) 和 高剂量参数(80 kV, 400 mA) 获取的 CT 图像对,其中 低剂量设置的辐射暴露量仅为高剂量配置的四分之一。我们对该模体数据应用与 Mayo 数据集相同的预处理方法,生成相应的 具有噪声的稀疏视角正弦图。

该数据集能够直接反映 真实低剂量和常规剂量条件下的噪声与伪影特征,模拟日常 CBCT 影像引导放疗(IGRT, Image-Guided Radiotherapy) 过程中患者摆位时的影像特性。
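根据上述描述,可以用 scikit-image 的平行束 radon 变换近似复现正弦图仿真流程。下面是一个示意性实现,其中入射光子数 i0 的取值与泊松噪声的插入方式均为假设,论文实际使用的噪声模型和物理参数可能不同:

```python
import numpy as np
from skimage.transform import radon

def simulate_pair(ldct_img, ndct_img, n_sparse=240, n_dense=720, i0=1e5):
    """ldct_img/ndct_img 为二维浮点图像;返回 (带噪稀疏视角正弦图 S_i, 稠密正弦图 S_GT)。"""
    theta_sparse = np.linspace(0.0, 360.0, n_sparse, endpoint=False)
    theta_dense = np.linspace(0.0, 360.0, n_dense, endpoint=False)

    # S_i:对 LDCT 图像做 240 次平行束投影,并按光子统计叠加泊松噪声(简化噪声模型)
    s_i = radon(ldct_img, theta=theta_sparse)
    scale = s_i.max() + 1e-8
    counts = np.random.poisson(i0 * np.exp(-s_i / scale))
    s_i_noisy = -np.log(np.clip(counts, 1, None) / i0) * scale

    # S_GT:对 NDCT 图像做 720 次平行束投影,作为低噪声稠密正弦图监督目标
    s_gt = radon(ndct_img, theta=theta_dense)
    return s_i_noisy, s_gt
```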

Figure

图片

Fig. 1. DDoCT: A deep learning dual-domain joint optimization framework for noisy sparse-view CT imaging. The framework leverages the capabilities of projection domain super-resolution and image domain wavelet high-frequency fusion provided by deep neural networks. FP and FBP refer to forward projection and filtered back projection, respectively. RFDN-sino and WHF-DN refer to two sub-networks applied in the projection domain stage and image domain stage, respectively.

图 1. DDoCT:一种用于具有噪声的稀疏视角CT成像的深度学习双域联合优化框架。该框架利用深度神经网络在投影域的超分辨率能力以及图像域的小波高频融合能力。FP和FBP分别表示前向投影(Forward Projection)和滤波反投影(Filtered Back Projection)。RFDN-sino和WHF-DN分别指在投影域阶段和图像域阶段应用的两个子网络。

图片

Fig. 2. The detailed structure of the dual-domain joint optimization framework DDoCT projection domain stage network RFDN-sino and its key modules. (a) Projection domain stage network RFDN-sino in DDoCT. (b) The network structure of EFDB module in RFDN-sino. (c) DCR: Network structure of dense residual connection block. (d) ESA: Network structure of enhanced spatial attention module.

图 2. 双域联合优化框架 DDoCT 的详细结构,包括投影域阶段网络 RFDN-sino 及其关键模块。(a) DDoCT 中的投影域阶段网络 RFDN-sino。(b) RFDN-sino 中 EFDB(增强特征去噪块)模块的网络结构。(c) DCR:密集残差连接块(Dense Residual Connection Block)的网络结构。(d) ESA:增强空间注意力(Enhanced Spatial Attention)模块的网络结构。

图片

Fig. 3. The detailed structure of the dual-domain joint optimization framework DDoCT image domain stage network WHF-DN and its Spatial module and FEN module. (a) Image domain stage network WHF-DN in DDoCT. DWT and IDWT refer to the first-level wavelet transformation and inverse wavelet transformation, respectively. L, H, D, and V represent the low-frequency component, horizontal high-frequency component, diagonal high-frequency component, and vertical high-frequency component, respectively, after the first-level wavelet transformation. (b) The network structure of the Spatial module in WHF-DN. (c) The network structure of the FEN module in WHF-DN.

图 3. 双域联合优化框架 DDoCT 的图像域阶段网络 WHF-DN 及其空间模块(Spatial module)和特征增强网络模块(FEN module)的详细结构。(a) DDoCT 中的图像域阶段网络 WHF-DN。DWT 和 IDWT 分别表示第一层小波变换(Discrete Wavelet Transform)和逆小波变换(Inverse Discrete Wavelet Transform)。L、H、D 和 V 分别表示第一层小波变换后的低频分量(Low-frequency component)、水平高频分量(Horizontal high-frequency component)、对角高频分量(Diagonal high-frequency component)和垂直高频分量(Vertical high-frequency component)。(b) WHF-DN 中空间模块(Spatial module)的网络结构。(c) WHF-DN 中特征增强网络模块(FEN module)的网络结构。
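图 3 中的第一层小波分解/重构(DWT/IDWT)以及 L、H、V、D 四个分量,可以用 PyWavelets 直观演示;小波基此处假设为 haar,论文实际采用的小波基未必相同:

```python
import numpy as np
import pywt

img = np.random.rand(512, 512).astype(np.float32)    # 以随机图像代替初步重建图像 I_i
L, (H, V, D) = pywt.dwt2(img, "haar")                 # L:低频;H/V/D:水平/垂直/对角高频分量
recon = pywt.idwt2((L, (H, V, D)), "haar")            # IDWT:由四个分量重构原图
print(L.shape, np.allclose(recon, img, atol=1e-5))    # (256, 256) True
```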

图片

Fig. 4. The network structure of the wavelet domain feature learning network (WFLN) in WHF-DN.

图 4. WHF-DN 中小波域特征学习网络(WFLN)的网络结构。

图片

Fig. 5. Comparison of visual (together with quantitative results of PSNR and SSIM) and difference images of the Mayo testing dataset. The set range of the display window is a window level of 40 HU and a window width of 400 HU (i.e., a range from −160 HU to 240 HU). As for the difference images of the latter, the set range of the display window is a window level of 0 HU and a window width of 200 HU (i.e., a range from −100 HU to 100 HU).

图 5. Mayo 测试数据集的视觉对比(包括 PSNR 和 SSIM 的定量结果)以及差异图像。显示窗口的设置范围为窗口水平(window level)40 HU,窗口宽度(window width)400 HU(即范围从 −160 HU 到 240 HU)。对于后续的差异图像,显示窗口的设置范围为窗口水平 0 HU,窗口宽度 200 HU(即范围从 −100 HU 到 100 HU)。
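图注中“窗位/窗宽”到显示范围的换算为 [WL − WW/2, WL + WW/2],例如窗位 40 HU、窗宽 400 HU 即对应 −160 HU 到 240 HU。一个简单的加窗显示示例如下:

```python
import numpy as np

def apply_window(hu_img, level=40.0, width=400.0):
    """把以 HU 为单位的图像按窗位/窗宽裁剪并归一化到 [0, 1] 以便显示。"""
    low, high = level - width / 2, level + width / 2   # 40/400 -> [-160, 240] HU
    return np.clip((hu_img - low) / (high - low), 0.0, 1.0)
```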

图片

Fig. 6. Comparison of visual (together with quantitative results of PSNR and SSIM) images of the anthropomorphic head phantom dataset, which comprises pairs of CT images acquired using an on-board CBCT scanner with low-dose (80 kV, 100 mA) and high-dose (80 kV, 400 mA) settings, respectively. The set range of the display window is a window level of 500 HU and a window width of 3000 HU (i.e., a range from −1000 HU to 2000 HU).

图 6. 仿真人体头部模体数据集的视觉对比(包括 PSNR 和 SSIM 的定量结果)。该数据集包含使用机载 CBCT 扫描仪分别在低剂量(80 kV, 100 mA)和高剂量(80 kV, 400 mA)设置下获取的 CT 图像对。显示窗口的设置范围为窗口水平(window level)500 HU,窗口宽度(window width)3000 HU(即范围从 −1000 HU 到 2000 HU)。

图片

Fig. 7. The prediction results of different ablation models (together with quantitative results of PSNR and SSIM). The set range of the display window is a window level of 40 HU and a window width of 400 HU (i.e., a range from −160 HU to 240 HU). A1: Learn mapping in the projection domain, directly reconstruct CT with FBP, no image domain optimization. A2: Mapping learned only in the image domain. Input is FBP-reconstructed image from noisy sparse-view sinograms 𝑆𝑖. A3: Main network but without forward projection feedback loss and bicubic interpolation loss. A4: Main network with enhanced pixel-level L2 loss. A5: Main network but without perceptual loss. M: Our full model.

图 7. 不同消融模型的预测结果(包括 PSNR 和 SSIM 的定量结果)。显示窗口的设置范围为窗口水平(window level)40 HU,窗口宽度(window width)400 HU(即范围从 −160 HU 到 240 HU)。

A1:在投影域中学习映射,直接使用 FBP 重建 CT 图像,未进行图像域优化。

A2:仅在图像域中学习映射。输入为从噪声稀疏视角正弦图(𝑆𝑖)重建的 FBP 图像。

A3:主网络,但不包含前向投影反馈损失(forward projection feedback loss)和双三次插值损失(bicubic interpolation loss)。

A4:主网络,使用增强的像素级 L2 损失。

A5:主网络,但未使用感知损失(perceptual loss)。

M:我们的完整模型。
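上述消融实验涉及的几类损失项(像素级 L2 损失、感知损失、前向投影反馈损失等),其大致形式可作如下假设性示意;权重取值、VGG 层数选择以及前向投影算子 forward_project 均为本文示例的假设,并非论文给出的官方定义:

```python
import torch
import torch.nn.functional as F
import torchvision

# 预训练 VGG 的浅层特征用于感知损失(层数选择为假设)
vgg_feat = torchvision.models.vgg16(
    weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1
).features[:16].eval()
for p in vgg_feat.parameters():
    p.requires_grad_(False)

def perceptual_loss(pred, target):
    # 单通道 CT 图像复制为 3 通道以适配 VGG 输入
    return F.l1_loss(vgg_feat(pred.repeat(1, 3, 1, 1)),
                     vgg_feat(target.repeat(1, 3, 1, 1)))

def fp_feedback_loss(i_p, s_gt, forward_project):
    # 前向投影反馈:把图像域输出重新投影回正弦图域,与稠密正弦图 S_GT 比较
    # forward_project 需为可微的前向投影算子(此处假设由外部提供)
    return F.l1_loss(forward_project(i_p), s_gt)

def total_loss(i_p, i_gt, s_gt, forward_project, w=(1.0, 0.1, 0.1)):
    return (w[0] * F.mse_loss(i_p, i_gt)                            # 像素级 L2 损失
            + w[1] * perceptual_loss(i_p, i_gt)                     # 感知损失
            + w[2] * fp_feedback_loss(i_p, s_gt, forward_project))  # 前向投影反馈损失
```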

Table

图片

Table 1. The average PSNR, SSIM, RMSE and VIF results of all 526 simulated noisy sparse-view sinograms for test patient "L506" in the Mayo dataset.

表 1. Mayo 数据集中测试患者 “L506” 所有 526 个模拟噪声稀疏视角正弦图的平均 PSNR、SSIM、RMSE 和 VIF 结果。

图片

Table 2. The average PSNR, SSIM, RMSE, and VIF results of the real dose head phantom data.

表 2. 真实剂量头部仿真人体数据的平均 PSNR、SSIM、RMSE 和 VIF 结果。

图片

Table 3. Quantitative comparison of different ablation experiments on test patient "L506" in the Mayo dataset.

表 3. Mayo 数据集中测试患者 “L506” 的不同消融实验的定量比较。
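表中 PSNR、SSIM、RMSE 可直接用 scikit-image 计算(VIF 需要额外的第三方库,此处从略)。一个最小示例如下,其中 data_range 的取值为假设,实际应与论文的评估设置一致:

```python
import numpy as np
from skimage.metrics import (mean_squared_error, peak_signal_noise_ratio,
                             structural_similarity)

def evaluate(pred_hu, gt_hu, data_range=400.0):
    psnr = peak_signal_noise_ratio(gt_hu, pred_hu, data_range=data_range)
    ssim = structural_similarity(gt_hu, pred_hu, data_range=data_range)
    rmse = np.sqrt(mean_squared_error(gt_hu, pred_hu))
    return psnr, ssim, rmse
```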
