Enhancing global sensitivity and uncertainty quantification in medical image reconstruction with Monte Carlo arbitrary-masked mamba
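Medical image reconstruction is one of the most fundamental and crucial components of medical imaging. High-quality, high-fidelity reconstructed medical images ensure the accuracy and effectiveness of subsequent disease diagnosis and treatment planning, thereby reducing potential risks to patient health (Wang et al., 2020). Magnetic resonance imaging (MRI) provides high-resolution and reproducible assessment without exposure to radiation. Fast MRI is widely used to generate MR images from sub-Nyquist sampled k-space measurements, aiming to accelerate the inherently slow data acquisition process while eliminating artefacts (Liang et al., 2020; Hammernik et al., 2023; Huang et al., 2024b). X-ray computed tomography (CT) can produce high-quality, detailed images but involves radiation risk. Sparse-view CT (SVCT) has been developed to reduce the radiation dose by using fewer projection views, although this introduces significant artefacts (Shah and Platt, 2008; Pan et al., 2009). Positron emission tomography (PET) is essential for understanding metabolic and functional bodily processes, but it typically requires long scans or high doses to obtain high-quality images, causing discomfort and risk. To address this, the development of low-dose PET (LDPET) offers a promising direction for improving image quality without increasing the injected dose (Knopp, 2020).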
Deep learning has been extensively applied in medical image reconstruction, where Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) represent the predominant paradigms, each possessing distinct advantages and inherent limitations: CNNs exhibit linear complexity with local sensitivity, whereas ViTs demonstrate quadratic complexity with global sensitivity. The emerging Mamba has shown superiority in learning visual representations, combining the advantages of linear scalability and global sensitivity. In this study, we introduce MambaMIR, an Arbitrary-Masked Mamba-based model with wavelet decomposition for joint medical image reconstruction and uncertainty estimation. A novel Arbitrary Scan Masking (ASM) mechanism “masks out” redundant information to introduce randomness for uncertainty estimation. Compared to the commonly used Monte Carlo (MC) dropout, our proposed MC-ASM provides an uncertainty map without the need for hyperparameter tuning and mitigates the performance drop typically observed when applying dropout to low-level tasks. For further texture preservation and better perceptual quality, we incorporate the wavelet transformation into MambaMIR and explore its variant based on the Generative Adversarial Network, namely MambaMIR-GAN. Comprehensive experiments on multiple representative medical image reconstruction tasks demonstrate that the proposed MambaMIR and MambaMIR-GAN outperform other baseline and state-of-the-art methods across reconstruction tasks, with MambaMIR achieving the best reconstruction fidelity and MambaMIR-GAN the best perceptual quality. In addition, our MC-ASM provides uncertainty maps as an additional tool for clinicians, while mitigating the typical performance drop caused by the commonly used dropout.
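The Monte Carlo step shared by MC-ASM and MC dropout can be sketched generically: the same input is passed through a stochastic network several times, the pixel-wise mean is taken as the reconstruction, and the pixel-wise standard deviation as the uncertainty map. This is a minimal NumPy sketch, not the authors' implementation; `stochastic_reconstruct` is a hypothetical stand-in for one randomly masked forward pass, and using the standard deviation (rather than, say, the variance) as the uncertainty measure is an assumption.

```python
import numpy as np

def stochastic_reconstruct(x, rng):
    """Hypothetical stand-in for one stochastic forward pass
    (e.g. with a freshly sampled mask): identity plus small noise."""
    return x + 0.01 * rng.standard_normal(x.shape)

def mc_estimate(x, n_samples=8, seed=0):
    """Run n_samples stochastic passes; return (mean image, std map)."""
    rng = np.random.default_rng(seed)
    samples = np.stack([stochastic_reconstruct(x, rng) for _ in range(n_samples)])
    mean = samples.mean(axis=0)        # reconstruction estimate
    uncertainty = samples.std(axis=0)  # pixel-wise uncertainty map
    return mean, uncertainty

x = np.zeros((64, 64))
recon, unc = mc_estimate(x, n_samples=16)
print(recon.shape, unc.shape)  # (64, 64) (64, 64)
```

A deterministic network would give an all-zero map here; the usefulness of the map depends entirely on where the injected randomness perturbs the output.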
3.1. Medical image reconstruction
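The forward acquisition process for medical images is described by:

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}, \tag{4}$$

where $\mathbf{x} \in \mathbb{C}^{n}$ represents the image of interest, $\mathbf{y} \in \mathbb{C}^{m}$ denotes the corresponding measurements, $\mathbf{A}$ is the forward operator mapping the image to the measurement domain, and $\mathbf{n} \in \mathbb{C}^{m}$ is the inevitable noise encountered during the measurement process.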
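For fast MRI, a common concrete instance of Eq. (4) takes A as a sampling mask M applied after a 2D Fourier transform F, so that the zero-filled (ZF) image is simply the inverse FFT of the zero-filled k-space. The sketch below assumes single-coil Cartesian sampling, a hypothetical line mask that keeps every R-th phase-encoding row plus a fully sampled centre, and complex Gaussian noise on the sampled entries; these choices are illustrative and are not the sampling patterns used in the experiments.

```python
import numpy as np

def cartesian_mask(shape, accel=4, n_center=8):
    """Hypothetical line mask: every accel-th phase-encoding row plus centre rows."""
    ny, nx = shape
    rows = np.zeros(ny, dtype=bool)
    rows[::accel] = True
    c = ny // 2
    rows[c - n_center // 2 : c + n_center // 2] = True
    return np.repeat(rows[:, None], nx, axis=1)

def forward(x, mask, sigma=0.0, rng=None):
    """Assumed single-coil model y = M(F x + n): orthonormal 2D FFT, noise, mask."""
    k = np.fft.fftshift(np.fft.fft2(x, norm="ortho"))
    n = 0
    if sigma > 0:
        rng = rng or np.random.default_rng(0)
        n = sigma * (rng.standard_normal(k.shape) + 1j * rng.standard_normal(k.shape))
    return mask * (k + n)  # unsampled k-space locations stay exactly zero

def zero_filled(y):
    """Adjoint A^H y: inverse FFT of the zero-filled k-space."""
    return np.fft.ifft2(np.fft.ifftshift(y), norm="ortho")

x = np.random.default_rng(1).random((64, 64))  # toy real-valued "image"
mask = cartesian_mask(x.shape, accel=4)
y = forward(x, mask, sigma=0.01)
x_zf = zero_filled(y)                          # aliased ZF image
```

With a full mask the round trip recovers the image exactly (the FFT is orthonormal); with the undersampled mask the ZF image shows the aliasing artefacts that reconstruction networks are trained to remove.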
In conclusion, our proposed MambaMIR and MambaMIR-GAN represent significant advances in the field of medical image reconstruction. The proposed generalised framework has achieved superior performance on fast MRI, SVCT and LDPET, which indicates its scalability and potential for other reconstruction applications such as ultrasound or low-dose CT reconstruction. Our proposed MC-ASM mechanism demonstrates its superiority over the commonly used MC dropout, providing reliable uncertainty estimation without the need for hyperparameter tuning, while mitigating the performance drop often seen when using dropout for low-level tasks.

Future studies may investigate the scalability of these models to other imaging modalities, as well as their computational efficiency.

Fig. 1. (A) Comparison between Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and VMamba. CNNs and ViTs represent two predominant paradigms, each possessing distinct advantages and inherent limitations: CNNs exhibit linear complexity with local sensitivity, whereas ViTs demonstrate quadratic complexity with global sensitivity. The emerging VMamba (Liu et al., 2024) has shown superiority in computer vision tasks, combining the advantages of linear scalability and global sensitivity. (B) Comparison between dropout and the proposed Arbitrary Scan Masking (ASM) mechanism. Dropout requires careful hyperparameter tuning (dropout rate) and typically leads to a performance drop in low-level tasks, despite its ability to mitigate overfitting in high-level tasks. The proposed ASM mechanism presents a superior alternative to dropout. Instead of randomly “dropping” some activations that may be essential for the final outcome, our ASM strategically “masks out” a part of the redundant information during the training and inference stages.
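To make the linear-versus-quadratic contrast in Fig. 1(A) concrete, the rough operation counts below compare global self-attention over L = H·W tokens (on the order of L²·C to form the attention matrix) with a selective scan (on the order of L·C·N per scan direction for a state of size N). These are simplified, constant-free estimates under assumed sizes (C = 96 channels, N = 16 state dimensions, four scan directions), not measured FLOPs of any model in this paper.

```python
def attention_ops(h, w, c):
    """~L^2 * C multiply-adds for Q K^T over L = h*w tokens; constants dropped."""
    L = h * w
    return L * L * c

def ssm_scan_ops(h, w, c, n_state, n_scans=4):
    """~L * C * N per scan direction for a linear recurrence; constants dropped."""
    L = h * w
    return n_scans * L * c * n_state

for side in (32, 64, 128, 256):
    a = attention_ops(side, side, c=96)
    s = ssm_scan_ops(side, side, c=96, n_state=16)
    print(f"{side}x{side}: attention/scan ratio = {a / s:.0f}")
```

Under these assumptions the ratio simplifies to L / (4N) = L / 64, i.e. 16, 64, 256 and 1024 for the four image sizes, so the gap widens fourfold each time the image side doubles.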

Fig. 2. (A) The proposed Arbitrary-Masked S6 (AMS6) block. An AMS6 block includes a Scan Expanding module, an Arbitrary Scan Masking module, an S6 module, and a Scan Merging module. (B) Uncertainty estimation with the proposed Arbitrary Scan Masking mechanism during inference. (C) The framework of the proposed MambaMIR.
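The data flow through the Scan Expanding, Arbitrary Scan Masking and Scan Merging modules of Fig. 2(A) can be illustrated with a simplified NumPy sketch. It assumes a VMamba-style four-direction expansion (row-major, column-major, and their reverses), replaces the S6 module with an identity placeholder, and models masking as randomly dropping whole scan directions while always keeping at least one; the function names and this masking rule are hypothetical and may differ from the paper's exact ASM.

```python
import numpy as np

def scan_expand(x):
    """(H, W, C) feature map -> (4, H*W, C): row-major, column-major, reverses."""
    h, w, c = x.shape
    row = x.reshape(h * w, c)
    col = x.transpose(1, 0, 2).reshape(h * w, c)
    return np.stack([row, col, row[::-1], col[::-1]])

def arbitrary_scan_mask(n_scans, rng):
    """Hypothetical rule: keep each direction with prob. 0.5, never drop all."""
    keep = rng.random(n_scans) < 0.5
    if not keep.any():
        keep[rng.integers(n_scans)] = True
    return keep

def scan_merge(seqs, keep, h, w):
    """Undo each kept scan order, then average kept directions -> (H, W, C)."""
    c = seqs.shape[-1]
    maps = []
    for k in np.flatnonzero(keep):
        s = seqs[k][::-1] if k >= 2 else seqs[k]  # undo reversal (dirs 2, 3)
        if k % 2 == 0:                            # row-major order
            maps.append(s.reshape(h, w, c))
        else:                                     # column-major order
            maps.append(s.reshape(w, h, c).transpose(1, 0, 2))
    return np.mean(maps, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 3))
seqs = scan_expand(x)
keep = arbitrary_scan_mask(len(seqs), rng)
y = scan_merge(seqs, keep, 8, 6)  # S6 omitted (identity placeholder)
```

Because the placeholder S6 is the identity, merging recovers the input exactly for any mask. In a real block each kept sequence would first pass through S6, so different masks would generally yield different outputs, which is the kind of randomness a Monte Carlo uncertainty estimate can average over.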

Fig. 3. Visualised results on FastMRI at AF ×4. Ground truth (GT), undersampled zero-filled (ZF) images, reconstruction results and corresponding error maps are presented.

Fig. 4. Visualised results on SKMTEA at AF ×8. Ground truth (GT), undersampled zero-filled (ZF) images, reconstruction results and corresponding error maps are presented.

Fig. 5. Visualised results for SVCT on the chest subset. Ground truth (GT), sparse-view images reconstructed by Filtered Backprojection (FBP), reconstruction results and corresponding error maps are presented. CT images are normalised within the range of [−1024, 3096] HU for error map computation and display.
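The normalisation used for the CT error maps can be sketched as clipping to the stated window [−1024, 3096] HU and rescaling linearly to [0, 1], after which the error map is the pixel-wise absolute difference from the ground truth. Clipping before rescaling, the [0, 1] target range, and the use of absolute (rather than signed) error are assumptions made for illustration.

```python
import numpy as np

HU_MIN, HU_MAX = -1024.0, 3096.0  # window stated in the Fig. 5 caption

def normalise_hu(img_hu):
    """Clip to [HU_MIN, HU_MAX] and map linearly to [0, 1]."""
    clipped = np.clip(img_hu, HU_MIN, HU_MAX)
    return (clipped - HU_MIN) / (HU_MAX - HU_MIN)

def error_map(gt_hu, recon_hu):
    """Pixel-wise absolute error on the normalised [0, 1] scale."""
    return np.abs(normalise_hu(gt_hu) - normalise_hu(recon_hu))

gt = np.array([[-1024.0, 0.0], [1036.0, 3096.0]])
recon = np.array([[-1500.0, 412.0], [1036.0, 4000.0]])
print(error_map(gt, recon))
```

The window spans 4120 HU, so the 412 HU discrepancy above maps to an error of 0.1, while values outside the window (here −1500 and 4000 HU) are clipped and contribute no error.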

Fig. 6. Visualised results for LDPET at DRF ×6. Ground truth (GT), low-dose images, reconstruction results and corresponding error maps are presented.

Fig. 7. (A) Ablation studies on hyperparameters: the patch size, the random-cropping resolution used during training, and the number of S6 latent-space channels (#Channel). (B) Experiments comparing Mamba-based and Transformer-based models. The size of each data circle and the number below it indicate the computational complexity (GFLOPs).

Fig. 8. Comparison of effective receptive fields before and after training between the proposed MambaMIR and other methods for SVCT on the abdomen subset.

Fig. 9. Quantitative comparison on the FastMRI dataset between (1) MambaMIR without MC-ASM or MC dropout (control group), (2) MambaMIR with MC-ASM, and (3) MambaMIR with MC dropout using different dropout rates.
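The MC dropout baseline in this comparison relies on standard (inverted) dropout kept active at inference, with the dropout rate α as the hyperparameter being varied. A minimal NumPy sketch, assuming element-wise dropout on an activation array (the placement of dropout inside the network is not modelled here): each activation is zeroed with probability α and the survivors are rescaled by 1/(1 − α).

```python
import numpy as np

def inverted_dropout(a, alpha, rng):
    """Zero each element with prob. alpha; rescale survivors by 1/(1 - alpha)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    keep = rng.random(a.shape) >= alpha
    return a * keep / (1.0 - alpha)

rng = np.random.default_rng(0)
a = np.ones((256, 256))
for alpha in (0.1, 0.2, 0.5):
    out = inverted_dropout(a, alpha, rng)
    # fraction of zeroed activations, and the mean activation after rescaling
    print(alpha, round(float((out == 0).mean()), 3), round(float(out.mean()), 3))
```

The fraction of zeroed activations tracks α while the mean stays close to 1; how large α can be before reconstruction quality suffers is exactly what has to be tuned.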

Fig. 10. (A) Visualised samples of uncertainty maps provided by MC dropout (𝛼 = 0.2) and our MC-ASM, along with the corresponding error maps. (B) Quantitative comparison between (1) MambaMIR without MC-ASM or MC dropout (control group), (2) MambaMIR with MC-ASM, and (3) MambaMIR with MC dropout (𝛼 = 0.2) on three datasets.

Fig. 11. Visualised samples of ground truth images with pathology annotations, annotated reconstruction error maps, and uncertainty maps provided by our MC-ASM on pathology cases in the FastMRI+ dataset.

Table 1. Quantitative results of the comparison studies for fast MRI, sparse-view CT (SVCT) and low-dose PET (LDPET) reconstruction. For fast MRI, experiments are performed on FastMRI at acceleration factors (AF) ×4 and ×8, and on SKMTEA at AF ×8 and ×16. For SVCT, experiments are conducted on the abdomen and chest subsets of the Low-Dose CT Image and Projection Datasets. For LDPET, experiments are conducted on an in-house dataset with dose reduction factors (DRF) ×3 and ×6. The best scores are shown in bold.

Table 2. Ablation studies on the validity of model components, conducted on FastMRI at AF ×8 with a patch size of 2. The Structural Similarity Index Measure (SSIM) and the number of parameters (#PARAMs) are reported to reflect performance and model size. ‘FULL’: standard MambaMIR; WAMSS: Wavelet-embedded Arbitrary-Masked State Space Blocks; WDown/WUp: wavelet-based downsampling/upsampling modules; MLP: multilayer perceptrons in AMSS Blocks; Attn.: multi-head self-attention modules in the bottleneck; WRSTB: Wavelet Residual Swin Transformer Block.
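Table 2 reports SSIM. For orientation, the standard SSIM expression can be sketched in its simplest single-window (global) form below. This is a simplification assumed for illustration: common implementations average the same expression over local Gaussian windows, and the exact settings used for the table are not stated here. The data range and the constants K1 = 0.01, K2 = 0.03 follow the usual defaults.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole image (no sliding Gaussian window)."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx**2 + my**2 + c1) * (vx + vy + c2)
    return num / den

rng = np.random.default_rng(0)
gt = rng.random((64, 64))
noisy = np.clip(gt + 0.05 * rng.standard_normal(gt.shape), 0, 1)
print(global_ssim(gt, gt))     # identical images: 1 (up to rounding)
print(global_ssim(gt, noisy))  # added noise lowers the score below 1
```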