Title
Fourier Convolution Block with global receptive field for MRI reconstruction
Introduction
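Magnetic resonance imaging (MRI) is a widely used imaging technique in clinical diagnosis. However, its clinical use is limited by long scan times and the motion artifacts they can cause. Acquiring undersampled data in the Fourier acquisition space (k-space) is a common way to shorten MRI scans (Ravishankar and Bresler, 2010). Acquiring fewer data points, however, violates the Nyquist sampling criterion and leads to unavoidable aliasing artifacts in the reconstructed image (Shannon, 1948).
Removing aliasing artifacts from undersampled MRI scans is a challenging inverse problem that requires additional prior information. Compressed sensing (CS) techniques (Lustig et al., 2007) address aliasing by combining a sparsity prior in a predefined transform with an iterative optimization algorithm. However, CS methods require manual selection of the sparsifying transform and other parameters, and they perform poorly at high undersampling rates (Zbontar et al., 2018).
In recent years, deep learning (DL) models have shown remarkable performance in MRI reconstruction. These models use deep neural networks to restore images, exploit priors learned from data, and surpass traditional CS algorithms in both reconstruction quality and speed. There are two main approaches to DL-based MRI reconstruction. The first uses end-to-end models (Ronneberger et al., 2015; Lee et al., 2018; Han et al., 2019; El-Rewaidy et al., 2020), which are trained on pairs of undersampled images and the corresponding fully sampled images without relying on other physical priors of MRI. The second unrolls an existing iterative reconstruction algorithm and replaces part of each iteration with a neural network (Sun et al., 2016; Zhang and Ghanem, 2018; Xiang et al., 2021; Hammernik et al., 2018; Schlemper et al., 2017; Sandino et al., 2020; Aggarwal et al., 2018). In this way, the difficult parameter selection of the iterative algorithm is optimized by learning, and the neural network acts as a more powerful regularizer.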
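In both CS and DL-based MRI reconstruction, the undersampling pattern is usually random or uniform (Sriram et al., 2020b), unlike the truncation pattern found in super-resolution tasks (Chen et al., 2018). The resulting aliasing artifacts are distributed over long ranges in the image domain, so a reconstruction model needs a large receptive field (RF) to remove them effectively. However, most DL-based MRI reconstruction models are convolutional neural networks (CNNs), whose RF is limited because they use local spatial convolutions (for example, 3×3 or 5×5 kernels). Although a deep network with enough stacked convolution layers can in theory cover a large RF (Jaderberg et al., 2015), the effective receptive field (ERF) of CNNs is far smaller than expected (Ding et al., 2022).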
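To make the contrast between uniform undersampling and truncation concrete, the following is a minimal 1D sketch, not taken from the paper. It compares the point spread function (PSF) of the two sampling patterns. The signal length `N`, the acceleration factor `R`, the `radius` threshold, and the helper `far_energy_fraction` are illustrative assumptions.

```python
import numpy as np

N, R = 64, 4                      # signal length and acceleration (illustrative)
n = np.arange(N)
k = np.fft.fftfreq(N) * N         # signed integer frequency index per FFT bin

# Uniform undersampling: keep every R-th k-space sample.
mask_uniform = np.zeros(N)
mask_uniform[::R] = 1.0

# Truncation: keep only the N // R lowest frequencies.
mask_trunc = ((k >= -(N // (2 * R))) & (k < N // (2 * R))).astype(float)

def far_energy_fraction(mask, radius=8):
    """Fraction of PSF energy farther than `radius` samples from the origin."""
    psf = np.fft.ifft(mask)                   # PSF of the sampling pattern
    energy = np.abs(psf) ** 2
    dist = np.minimum(n, N - n)               # circular distance to index 0
    return energy[dist > radius].sum() / energy.sum()

far_uniform = far_energy_fraction(mask_uniform)
far_trunc = far_energy_fraction(mask_trunc)
print(f"uniform: {far_uniform:.2f}, truncation: {far_trunc:.2f}")
```

With uniform undersampling the PSF consists of `R` equally strong replicas spaced `N / R` samples apart, so three quarters of its energy lies far from the origin in this setting. With truncation the PSF stays concentrated near the origin, which is why a local kernel can be adequate for super-resolution but not for aliasing removal.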
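Recently, Vision Transformers (ViTs) have emerged as a powerful alternative (Dosovitskiy et al., 2020; Touvron et al., 2021; Liu et al., 2021). By capturing global information with multi-head self-attention, they have surpassed traditional CNNs built from stacked 3×3 convolutions. Other work has explored CNNs with larger kernels (Liu et al., 2022b; Ding et al., 2022; Liu et al., 2022a; Deng et al., 2023) and achieved performance comparable or even superior to ViTs. The enlarged RF is regarded as a key factor in the success of these methods. However, both large-kernel CNNs and ViTs face challenges such as high computational demands and difficult training (Liu et al., 2022a).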
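In this study, we propose a Fourier Convolution Block (FCB), a pluggable global operator that can be used in CNNs for MRI reconstruction. The term "pluggable" here means that FCB is a standalone convolution block, in contrast to Plug-and-Play (PnP) iterative algorithms built on pre-trained models (Ahmad et al., 2020). FCB obtains a wide RF by moving the standard spatial-domain convolution to the frequency domain, using the fact that a discrete (circular) spatial convolution is equivalent to element-wise multiplication in the frequency domain. FCB provides a global RF that matches the image size and can be learned adaptively, while requiring less computation than a large-kernel convolution. To reduce the number of FCB parameters, the regular convolution is replaced with a depth-wise separable convolution (Howard et al., 2017), a design inspired by Sun et al. (2023). This modification makes Fourier convolution applicable to modern deep CNN architectures. To further overcome the difficulty of training networks with a large RF (Liu et al., 2022a), we also propose a new re-parameterization strategy that better captures local and global information.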
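The depth-wise separable idea can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation: the function name `depthwise_fourier_conv`, the tensor layout, and the use of a half spectrum for real inputs are all choices made here for the example, and the re-parameterization strategy is not shown.

```python
import numpy as np

def depthwise_fourier_conv(x, W_half, P):
    """Hypothetical depth-wise separable Fourier convolution.

    x      : (C, N, N) real feature maps
    W_half : (C, N, N // 2 + 1) complex spectral kernels, one per channel
    P      : (C_out, C) point-wise (1 x 1) mixing weights
    """
    X = np.fft.rfft2(x, axes=(-2, -1))                 # half spectrum of real input
    Y = X * W_half                                     # depth-wise: per-channel product
    y = np.fft.irfft2(Y, s=x.shape[-2:], axes=(-2, -1))
    return np.einsum("oc,cnm->onm", P, y)              # point-wise channel mixing

# Usage: an all-ones spectral kernel with identity mixing leaves x unchanged.
rng = np.random.default_rng(1)
C, N = 4, 8
x = rng.standard_normal((C, N, N))
W_identity = np.ones((C, N, N // 2 + 1), dtype=complex)
out = depthwise_fourier_conv(x, W_identity, np.eye(C))
print(out.shape, np.allclose(out, x))
```

In this sketch each channel owns a single spectral kernel, so the spectral weights grow with the number of channels `C` rather than with `C_out * C`, and channel mixing is left to the small point-wise matrix `P`.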
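The main contributions of this work are therefore as follows:
We propose FCB, a pluggable Fourier operator that replaces local spatial convolution and thereby expands the RF of CNNs. The results show that the proposed FCB enlarges the ERF of CNNs and delivers better reconstruction quality than ViTs and large-kernel CNNs, with shorter computation time.
We integrate FCB into a depth-wise separable structure and propose a new re-parameterization strategy for training the network, so that it better fuses local and global information. These two improvements pave the way for efficient training of Fourier convolution in deep architectures.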
Abstract
Reconstructing images from under-sampled Magnetic Resonance Imaging (MRI) signals significantly reduces scan time and improves clinical practice. However, Convolutional Neural Network (CNN)-based methods, while demonstrating great performance in MRI reconstruction, may face limitations due to their restricted receptive field (RF), which hinders the capture of global features. This is particularly crucial for reconstruction, as aliasing artifacts are distributed globally. Recent advances in Vision Transformers have further emphasized the significance of a large RF. In this study, we propose a novel global Fourier Convolution Block (FCB) with a whole-image RF and low computational complexity, obtained by transforming regular spatial-domain convolutions into the frequency domain. Visualizations of the effective RF and of the trained kernels demonstrate that FCB improves the RF of reconstruction models in practice. The proposed FCB was evaluated on four popular CNN architectures using brain and knee MRI datasets. Models with FCB achieved higher PSNR and SSIM than the baseline models and recovered more detail and texture.
Method
3.1. Fourier convolution
We demonstrate that Fourier convolution can achieve both local and global RF in the discrete setting. Fourier convolution performs element-wise multiplication in the frequency domain. The Discrete Fourier Transform (DFT) maps the 2D feature maps in the neural network to the frequency domain and is formulated as:

$$X(k_1, k_2) = \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} \frac{x(n, m)}{N^2} \exp\left[-\frac{2\pi j}{N}\left(n k_1 + m k_2\right)\right] \tag{1}$$

where $x$ is the feature map of size $N \times N$ and $X$ is its spectrum. The Fourier convolution is then formulated as the element-wise multiplication between the spectrum and a production (multiplicative) kernel $W$ of the same size:

$$Y(k_1, k_2) = X(k_1, k_2) \odot W(k_1, k_2) \tag{2}$$

With the inverse DFT (IDFT) defined as:

$$x(n, m) = \sum_{k_1=0}^{N-1} \sum_{k_2=0}^{N-1} X(k_1, k_2) \exp\left[\frac{2\pi j}{N}\left(n k_1 + m k_2\right)\right] \tag{3}$$

it can be deduced that Fourier convolution is equivalent to spatial convolution:

$$\mathrm{IDFT}\{Y\} = x * w \tag{4}$$

where "$*$" denotes convolution with circular padding:

$$(x * w)(n, m) = \sum_{p=0}^{N-1} \sum_{q=0}^{N-1} x(p, q)\, w\big((n - p) \bmod N,\ (m - q) \bmod N\big) \tag{5}$$

With the $1/N^2$ normalization placed in Eq. (1), the two sides of Eq. (4) agree up to a constant scale factor, which the learnable kernel $W$ can absorb.

The convolution kernel $w$ is the IDFT of the FCB kernel $W$ and has size $N \times N$, but it can be expressed as a zero-padded version of a smaller kernel $w'$ of size $K \times K$ ($1 \le K \le N$):

$$w(n, m) = \begin{cases} w'(n, m), & 0 \le n < K,\ 0 \le m < K \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

This implies that, although the production kernel $W$ in the Fourier domain always has the same size as the input, its spatial counterpart can be a zero-padded convolution kernel whose size $K \times K$ ranges from $1 \times 1$ to the input size $N \times N$. Fourier convolution can therefore provide a global RF, since it can be equivalent to a convolution whose kernel matches the size of the input image. A local RF is also available when the Fourier convolution corresponds to a spatial convolution with a small kernel.
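The equivalence in Eqs. (2), (4), and (5) can be checked numerically. The following is a small sketch written for this summary rather than code from the paper. The function names and the sizes `N` and `K` are illustrative assumptions. Note that NumPy places the $1/N^2$ factor in the inverse transform rather than in the forward one, so the two results match exactly with no extra scaling.

```python
import numpy as np

def circular_conv2d(x, w):
    """Direct circular convolution of two N x N arrays, as in Eq. (5)."""
    N = x.shape[0]
    y = np.zeros((N, N))
    for p in range(N):
        for q in range(N):
            # After the roll, entry (n, m) holds w[(n - p) % N, (m - q) % N].
            y += x[p, q] * np.roll(w, shift=(p, q), axis=(0, 1))
    return y

def fourier_conv2d(x, w):
    """Same operation via element-wise multiplication of spectra, Eq. (2)."""
    X = np.fft.fft2(x)
    W = np.fft.fft2(w)
    return np.real(np.fft.ifft2(X * W))

rng = np.random.default_rng(0)
N, K = 16, 3
x = rng.standard_normal((N, N))

# Local RF: a K x K kernel zero-padded to N x N, as in Eq. (6).
w_local = np.zeros((N, N))
w_local[:K, :K] = rng.standard_normal((K, K))

# Global RF: a dense N x N kernel.
w_global = rng.standard_normal((N, N))

err_local = np.max(np.abs(circular_conv2d(x, w_local) - fourier_conv2d(x, w_local)))
err_global = np.max(np.abs(circular_conv2d(x, w_global) - fourier_conv2d(x, w_global)))
print(err_local, err_global)
```

Both errors should be at the level of floating-point rounding. The spectral kernel has the same size in the local and global cases, and only the support of its spatial counterpart differs.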
Conclusion
This paper introduced a novel convolution block design with a global receptive field for MRI reconstruction CNNs. The experimental results showed that the proposed FCB effectively improved the reconstruction performance of the baseline CNN models. At different undersampling rates, models enhanced with FCB achieved better quantitative metrics on various datasets, even with additional noise added. Furthermore, these models were better at recovering intricate details, including texture and edges. Notably, the models with FCB outperformed Vision Transformers, which are considered mainstream models with large receptive fields. FCB models also surpassed methods that embed k-space data to enhance long-range connections. FCB also demonstrated low computational complexity, with experiments showing that its runtime is significantly lower than that of Vision Transformers and comparable to that of traditional CNNs with an 11 × 11 convolution kernel.
Visualization demonstrated that the proposed FCB effectively scales up the RF of CNNs. Unlike other approaches that focus on the architecture design of CNNs, our approach targets the basic convolution layer. Some other works use cascaded or pyramidal architectures (Schlemper et al., 2017; Chen et al., 2022; Sriram et al., 2020a) to capture long-distance correlation. The proposed FCB serves as a Plug-and-Play block that has the potential to be incorporated into these various CNN architectures. Moreover, FCB processes data in the hidden layers as real values and uses conjugate symmetry in the frequency domain to save memory and time. For methods that build complex-valued CNNs (Wang et al., 2020; Cole et al., 2021), FCB could also be integrated smoothly in complex-valued mode.
The proposed FCB approach still has some limitations. Although its runtime memory usage is comparable to that of spatial convolution, the number of parameters in FCB is still significantly higher, which increases storage cost. This presents a tradeoff between memory and speed when a large RF is desired. Additionally, the proposed FCB involves repeated FFTs and IFFTs, which may reduce computing efficiency. Some works (Ayat et al., 2019; Watanabe and Wolf, 2021) have attempted to design purely spectral CNNs to address this issue, but the question of activation functions in the Fourier domain remains open. Future research could explore ways to improve the computing efficiency of FCB.
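The memory saving from conjugate symmetry mentioned above can be illustrated with NumPy's real-input FFT. This is a small sketch under the assumption of a real-valued `N × N` feature map, and it is not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
x = rng.standard_normal((N, N))      # real-valued feature map

X_full = np.fft.fft2(x)              # N x N complex spectrum
X_half = np.fft.rfft2(x)             # N x (N // 2 + 1) complex spectrum

# Conjugate symmetry: X[k1, k2] == conj(X[-k1 mod N, -k2 mod N]).
idx = (-np.arange(N)) % N
X_mirrored = np.conj(X_full[np.ix_(idx, idx)])
symmetric = np.allclose(X_full, X_mirrored)

# The half spectrum is enough to recover x (up to rounding).
x_rec = np.fft.irfft2(X_half, s=x.shape)
recovered = np.allclose(x, x_rec)

print(symmetric, recovered, X_full.shape, X_half.shape)
```

Because roughly half of the spectrum is redundant for real inputs, a spectral kernel only needs to be stored and multiplied on the `N × (N // 2 + 1)` half plane.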
Figure
Fig. 1. Illustration of the proposed Fourier Convolution Block (FCB). (a) Three traditional convolution layers with kernel sizes of 3 × 3, 11 × 11, and 21 × 21. (b) The equivalent FCBs of size 𝑁 × 𝑁, corresponding to the traditional convolution layers with different kernel sizes shown at the top.
Fig. 2. The architecture of the baseline models and the proposed convolution blocks (only one iteration of MoDL and VSNet is shown). Abbreviations: DW = Depth-wise Convolution, PW = Point-Wise Convolution. The convolution blocks are colored consistently with the model views, with DW layers highlighted in red to indicate that the convolution operation there can be replaced by FCB.
Fig. 3. A reconstruction example of T2-weighted data from the validation set of the brain dataset. The top displays the result at 8× acceleration, while the bottom displays the result at 12× acceleration. The PSNR and SSIM of the single reconstructed image are noted in the top right-hand corner. The second and fifth rows depict the reconstruction of the zoomed region marked in the ground truth. The third and sixth rows display the residual error in this zoomed region, with all errors multiplied by 10 for better visualization.
Fig. 4. A reconstruction example of T1-weighted data from the validation set of the brain dataset.
Fig. 5. A reconstruction example from the validation set of the knee dataset. The top displays the result at 8× acceleration, while the bottom displays the result at 12× acceleration.
Fig. 6. An example of brain T1-weighted data reconstructed by F-MoDL and other methods.
Fig. 7. PSNR results on the knee validation dataset at 8× acceleration when FCB is applied to different layers in the model. (a) Results for UNet. (b) Results for MoDL. (c) Results for VSNet.
Fig. 8. Spectral visualization of convolution kernels in UNet and F-UNet (FCB was deployed in the last 6 layers). (a) Spectral maps of UNet. (b) Spectral maps of F-UNet. The spectral amplitudes of the convolution kernels, or of the kernels in FCB, are shown from left to right along the depth of UNet. The upper and lower rows correspond to the double convolution layers in each UNet block. The rank of each kernel spectrum is noted in the bottom right-hand corner of each map.
Fig. 9. (a) ERF of UNet. (b) ERF of F-UNet. (c) PSF of the 2D Poisson sampling pattern. The ERF of F-UNet covers a larger region, close to the sampling PSF.
Table
Table 1. Comparison of the computation operations and parameters between regular convolution and FCB.
Table 2. Quantitative results on the validation sets of the brain and knee datasets at 8× and 12× acceleration.
Table 3. Quantitative results on the validation set using Cartesian and radial masks.
Table 4. Quantitative results on the validation set of the brain dataset at 8× acceleration when 10% or 20% Gaussian noise is added.
Table 5. Quantitative results compared with other methods.
Table 6. Comparison of the reconstruction performance and runtime of UNet with different kernel sizes, FasterFC-UNet, and F-UNet on the knee dataset at 8× acceleration.
Table 7. Ablation study assessing the impact of the modifications and the re-parameterization method on the knee validation dataset at 8× acceleration.