Title
Enhancing global sensitivity and uncertainty quantification in medical image reconstruction with Monte Carlo arbitrary-masked mamba
Introduction
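Medical image reconstruction is one of the most fundamental and critical components of medical imaging. High-quality, high-fidelity reconstructed images underpin accurate and effective downstream diagnosis and treatment planning, reducing potential risks to patient health (Wang et al., 2020). Magnetic resonance imaging (MRI) provides high-resolution, reproducible assessment without exposure to radiation. Fast MRI is widely used to generate MR images from sub-Nyquist sampled k-space measurements, aiming to accelerate the inherently slow data acquisition process and to remove artefacts (Liang et al., 2020; Hammernik et al., 2023; Huang et al., 2024b). X-ray computed tomography (CT) produces high-quality, detailed images but carries radiation risks. Sparse-view CT (SVCT) was developed to lower the radiation dose by using fewer projection views, although doing so introduces significant artefacts (Shah and Platt, 2008; Pan et al., 2009). Positron emission tomography (PET) is essential for understanding metabolic and functional processes in the body, but it usually requires long scans or high doses to obtain high-quality images, causing discomfort and risk. To address this, low-dose PET (LDPET) offers a promising direction for improving image quality without increasing the injected dose (Knopp, 2020).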
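A key research topic and challenge in medical image reconstruction is the development of effective, efficient and reliable reconstruction models. Rapid progress in artificial intelligence has driven the development and widespread adoption of deep learning-based medical image reconstruction. Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) (Dosovitskiy et al., 2020) are the predominant paradigms; both have achieved notable success and are widely used in medical imaging. However, each has distinct advantages and inherent limitations.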
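CNNs excel at capturing visual features, particularly local patterns, by exploiting their hierarchical architecture and inductive biases. Weight sharing makes them more parameter-efficient than multilayer perceptrons (MLPs). However, despite their strong feature extraction capability and linear complexity, CNNs typically exhibit local sensitivity and lack long-range dependencies, as shown in Fig. 1(A), which limits their ability to contextualise global features. ViTs (Dosovitskiy et al., 2020), with their large receptive fields and global sensitivity, generally outperform CNNs at capturing broad contextual information. However, as Fig. 1(A) also shows, the quadratic complexity of self-attention (Liu et al., 2024) makes them computationally demanding, which limits their practicality for medical image reconstruction. Recent Transformer-based medical image reconstruction models try to ease these limitations by (1) adopting a compromise in which self-attention is applied within shifted windows rather than over the whole feature map (Liang et al., 2021; Huang et al., 2022a); or (2) building hybrid models that combine CNNs (Chen et al., 2021) or Swin Transformers (Liu et al., 2021) and apply ViT blocks only in the deep, low-resolution latent space (Chen et al., 2021; Huang et al., 2022c).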
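As a strong alternative, Mamba (Gu and Dao, 2023), an emerging model originating from natural language processing (NLP), combines the advantages of CNNs and ViTs. Thanks to its linear complexity and hardware-aware optimisations, Mamba is highly efficient at modelling long sequences. This efficiency makes Mamba a strong competitor to self-attention in Transformers, particularly for tasks that process high-resolution visual data, as shown in Fig. 1(A).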
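To make the scaling argument concrete, the following is a minimal back-of-the-envelope sketch rather than a measurement of any real model. It only counts tokens and elementary interactions. The image sizes and the function names (`token_count`, `attention_pairs`, `scan_steps`) are illustrative assumptions and do not come from the paper.

```python
def token_count(height, width, patch):
    """Number of tokens when an image is split into patch x patch tiles."""
    return (height // patch) * (width // patch)


def attention_pairs(n_tokens):
    """Self-attention scores every token against every token: N * N pairs."""
    return n_tokens * n_tokens


def scan_steps(n_tokens):
    """A sequential scan visits each token once: N steps."""
    return n_tokens


# Hypothetical resolutions, patch size 2 (assumed for illustration only).
for side in (128, 256, 512):
    n = token_count(side, side, patch=2)
    print(side, n, attention_pairs(n), scan_steps(n))
```

Doubling the image side quadruples the token count, so the quadratic term grows sixteen-fold while the linear term grows four-fold. This gap is why linear-complexity sequence models are attractive at high spatial resolution.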
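In this study, we explore the potential of Mamba for medical image reconstruction and propose MambaMIR, a Mamba-based model for joint medical image reconstruction and uncertainty estimation. As shown in Fig. 1(A), MambaMIR offers global sensitivity with linear computational complexity, which makes it well suited to low-level tasks such as medical image reconstruction that must process long sequences (large spatial resolution) while preserving global sensitivity.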
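In medical image reconstruction, uncertainty estimation is regarded as an important confidence assessment that gives clinicians additional information by highlighting key regions of concern. Monte Carlo (MC) dropout is a commonly used uncertainty estimation method that relies on the randomness of dropout during both training and inference (Gal and Ghahramani, 2016). However, as shown in Fig. 1(B), dropout requires careful tuning of the dropout rate, a hyperparameter to which reconstruction performance is usually sensitive. Moreover, although dropout mitigates overfitting in high-level tasks, it typically degrades performance in low-level tasks such as image reconstruction (Kong et al., 2022).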
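The Monte Carlo idea behind such uncertainty maps can be sketched in a few lines: run the same input through a stochastic model several times, then take the per-position mean as the prediction and the per-position standard deviation as the uncertainty. The example below is a toy illustration under stated assumptions. `toy_model` is a hypothetical stand-in (input dropout followed by a 3-tap moving average), not MambaMIR and not the ASM mechanism, and the default of 50 passes is arbitrary.

```python
import random
import statistics


def toy_model(signal, drop_rate, rng):
    """Hypothetical stochastic model: inverted dropout on the inputs,
    followed by a 3-tap moving average."""
    keep = 1.0 - drop_rate
    dropped = [v / keep if rng.random() < keep else 0.0 for v in signal]
    out = []
    for i in range(len(dropped)):
        window = dropped[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


def mc_estimate(signal, drop_rate=0.2, passes=50, seed=0):
    """Run several stochastic passes; return per-position mean and std."""
    rng = random.Random(seed)
    samples = [toy_model(signal, drop_rate, rng) for _ in range(passes)]
    per_position = list(zip(*samples))          # regroup by position
    mean = [statistics.mean(p) for p in per_position]
    std = [statistics.pstdev(p) for p in per_position]
    return mean, std


signal = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0]
mean, std = mc_estimate(signal)
print(mean)
print(std)   # larger values mark positions where the passes disagree
```

The dropout rate appears here as an explicit hyperparameter, which is the tuning burden the text refers to. With `drop_rate=0.0` every pass is identical and the estimated uncertainty collapses to zero.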
Abstract
Deep learning has been extensively applied in medical image reconstruction, where Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) represent the predominant paradigms, each possessing distinct advantages and inherent limitations: CNNs exhibit linear complexity with local sensitivity, whereas ViTs demonstrate quadratic complexity with global sensitivity. The emerging Mamba has shown superiority in learning visual representations, combining the advantages of linear scalability and global sensitivity. In this study, we introduce MambaMIR, an Arbitrary-Masked Mamba-based model with wavelet decomposition for joint medical image reconstruction and uncertainty estimation. A novel Arbitrary Scan Masking (ASM) mechanism "masks out" redundant information to introduce randomness for further uncertainty estimation. Compared to the commonly used Monte Carlo (MC) dropout, our proposed MC-ASM provides an uncertainty map without the need for hyperparameter tuning and mitigates the performance drop typically observed when applying dropout to low-level tasks. For further texture preservation and better perceptual quality, we incorporate the wavelet transform into MambaMIR and explore its variant based on the Generative Adversarial Network, namely MambaMIR-GAN. Comprehensive experiments have been conducted for multiple representative medical image reconstruction tasks, demonstrating that the proposed MambaMIR and MambaMIR-GAN outperform other baseline and state-of-the-art methods in different reconstruction tasks, where MambaMIR achieves the best reconstruction fidelity and MambaMIR-GAN has the best perceptual quality. In addition, our MC-ASM provides uncertainty maps as an additional tool for clinicians, while mitigating the typical performance drop caused by the commonly used dropout.
Method
3.1. Medical image reconstruction
The forward acquisition process for medical images is described by
$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}, \tag{4}$$
where $\mathbf{x} \in \mathbb{C}^n$ represents the image of interest, $\mathbf{y} \in \mathbb{C}^m$ denotes the corresponding measurements, and $\mathbf{n} \in \mathbb{C}^m$ is the inevitable noise encountered during the measurement process.
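As a concrete illustration of Eq. (4), the sketch below simulates a 1-D measurement in which the operator A is assumed to be a discrete Fourier transform followed by selection of a subset of frequencies, a common simplification of undersampled MRI. This choice of A, the helper names (`dft`, `forward`) and the noise level are assumptions made for the example. The excerpt itself does not specify A.

```python
import cmath
import random


def dft(x):
    """Unnormalised discrete Fourier transform of a 1-D sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def forward(x, keep, noise_std=0.0, seed=0):
    """Simulate y = A x + n with A = (select rows in `keep`) applied to DFT."""
    rng = random.Random(seed)
    full = dft(x)
    y = []
    for k in keep:
        noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        y.append(full[k] + noise)
    return y


x = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]  # image of interest, n = 8
keep = [0, 1, 2, 7]                           # sampled frequencies, m = 4
y = forward(x, keep, noise_std=0.01)
print(len(x), len(y))                         # n and m
```

Because m < n, the measurements do not determine x uniquely. Reconstruction is therefore an ill-posed inverse problem, which is what motivates learned reconstruction models.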
Conclusion
In conclusion, our proposed MambaMIR and MambaMIR-GAN represent significant advances in the field of medical image reconstruction. The proposed generalised framework has achieved superior performance on fast MRI, SVCT, and LDPET, demonstrating its scalability and its potential for other reconstruction applications such as ultrasound or low-dose CT reconstruction. Our proposed MC-ASM mechanism demonstrates its superiority over the commonly used MC dropout, providing reliable uncertainty estimation without the need for hyperparameter tuning, while mitigating the performance drop often seen when using dropout for low-level tasks.
Future studies may investigate the scalability of these models for various imaging modalities and their potential for computational efficiency.
Figure
Fig. 1. (A) Comparison between Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and VMamba. CNNs and ViTs represent two predominant paradigms, each possessing distinct advantages and inherent limitations: CNNs exhibit linear complexity with local sensitivity, whereas ViTs demonstrate quadratic complexity with global sensitivity. The emerging VMamba (Liu et al., 2024) has shown superiority in computer vision tasks, combining the advantages of linear scalability and global sensitivity. (B) Comparison between dropout and the proposed Arbitrary Scan Masking (ASM) mechanism. Dropout requires careful hyperparameter tuning (dropout rate) and typically leads to a performance drop in low-level tasks, despite its ability to mitigate overfitting in high-level tasks. The proposed ASM mechanism presents a superior alternative to dropout. Instead of randomly "dropping" some activations that may be essential for the final outcome, our ASM strategically "masks out" a part of redundant information during training and inference stages.
Fig. 2. (A) The proposed Arbitrary-Masked S6 (AMS6) block. An AMS6 block includes a Scan Expanding module, an Arbitrary Scan Masking module, an S6 module, and a Scan Merging module. (B) Uncertainty estimation with the proposed Arbitrary Scan Masking mechanism during inference. (C) The framework of the proposed MambaMIR.
Fig. 3. Visualised results on FastMRI at acceleration factor (AF) ×4. Ground truth (GT), undersampled zero-filled (ZF) images, reconstruction results and corresponding error maps are presented.
Fig. 4. Visualised results on SKMTEA at AF ×8. Ground truth (GT), undersampled zero-filled (ZF) images, reconstruction results and corresponding error maps are presented.
Fig. 5. Visualised results for SVCT on the chest subset. Ground truth (GT), sparse-view images reconstructed by Filtered Backprojection (FBP), reconstruction results and corresponding error maps are presented. CT images are normalised within the range of [−1024, 3096] HU for error map computation and display.
Fig. 6. Visualised results for LDPET at dose reduction factor (DRF) ×6. Ground truth (GT), low-dose images, reconstruction results and corresponding error maps are presented.
Fig. 7. (A) Ablation studies on hyperparameters: the patch size, the random-crop resolution during training, and the number of S6 latent space channels (#Channel). (B) Experiments comparing Mamba-based and Transformer-based models. The size of each data circle and the number below it indicate the computational complexity (GFLOPs).
Fig. 8. Comparison of Effective Receptive Fields before and after training between the proposed MambaMIR and other methods for SVCT on the abdomen subset.
Fig. 9. Quantitative comparison on the FastMRI dataset between (1) MambaMIR without MC-ASM or MC dropout (control group), (2) MambaMIR with MC-ASM and (3) MambaMIR with MC dropout using different dropout rates.
Fig. 10. (A) Visualised samples of uncertainty maps provided by MC dropout (𝛼 = 0.2) and our MC-ASM, along with the corresponding error maps. (B) Quantitative comparison between (1) MambaMIR without MC-ASM or MC dropout (control group), (2) MambaMIR with MC-ASM and (3) MambaMIR with MC dropout (𝛼 = 0.2) on three datasets.
Fig. 11. Visualised samples of ground truth images with pathology annotation, annotated reconstruction error maps, and uncertainty maps provided by our MC-ASM on pathology cases in the FastMRI+ dataset.
Table
Table 1. Quantitative results of the comparison studies for fast MRI, sparse-view CT (SVCT) and low-dose PET (LDPET) reconstruction. For fast MRI, experiments are performed on FastMRI at acceleration factor (AF) ×4 and ×8, as well as on SKMTEA at AF ×8 and ×16. For SVCT, experiments are conducted on the abdomen and chest subsets from the Low-Dose CT Image and Projection Datasets. For LDPET, experiments are conducted on an in-house dataset with dose reduction factor (DRF) ×3 and ×6. The best scores are shown in bold.
Table 2. Ablation studies for model component validity conducted on FastMRI at AF ×8 with a patch size of 2. Structural Similarity Index Measure (SSIM) and the number of parameters (#PARAMs) are reported to reflect the performance and model size. 'FULL': standard MambaMIR; WAMSS: Wavelet-embedded Arbitrary-Masked State Space Blocks; WDown/WUp: Wavelet-based downsampling/upsampling modules; MLP: Multilayer perceptrons in AMSS Blocks; Attn.: multi-head self-attention modules in the bottleneck; WRSTB: Wavelet Residual Swin Transformer Block.