Title
题目
DACG: Dual Attention and Context Guidance model for radiology report generation
DACG:一种用于放射学报告生成的双重注意力与上下文引导模型
01
文献速递介绍
生成放射学报告需要经过培训且经验丰富的放射科医生仔细检查影像的各个组成部分,并撰写相应的报告。即使对于经验丰富的医生,完成一份报告通常也需要平均5分钟或更长时间。因此,放射学报告的自动生成作为一种有效的辅助工具,减轻了放射科医生繁重的工作负担,并在近年来引起了研究兴趣(Li等,2018;Jing等,2017,2020;Wang等,2021;Anderson等,2018;Liu等,2021)。
为了生成更准确的放射学报告,大多数现有方法遵循图像描述(image captioning)的框架:首先使用预训练的卷积神经网络(如ResNet-101)提取图像特征,作为整个模型的初始输入;随后利用循环或非循环神经网络生成报告。
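This standard two-stage pipeline can be sketched as follows. All shapes are illustrative assumptions rather than values taken from the paper: ResNet-101's last convolutional stage is commonly taken to map a 224×224 image to a 2048×7×7 feature map, which the report generator then consumes as a sequence of 49 patch vectors.

```python
import numpy as np

# Minimal sketch of the standard visual-feature pipeline described above.
# Shapes are illustrative assumptions, not taken from the paper.
def flatten_feature_map(feature_map: np.ndarray) -> np.ndarray:
    """(C, H, W) conv features -> (H*W, C) patch sequence."""
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h * w).T

feature_map = np.random.rand(2048, 7, 7)  # stand-in for ResNet-101 output
patches = flatten_feature_map(feature_map)
print(patches.shape)  # (49, 2048)
```

In practice the backbone is a pretrained CNN and the patch sequence is fed to the (recurrent or Transformer-style) report generator; only the reshaping step is shown here.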
尽管基于深度学习的方法在自动化放射学报告生成方面取得了良好的成果,但仍然存在若干需要解决的挑战(Wang等,2018;Karpathy和Fei-Fei,2015;Jing等,2017)。
Abstract
摘要
Medical images are an essential basis for radiologists to write radiology reports and greatly help subsequent clinical treatment. The task of automatic radiology report generation aims to alleviate the burden on clinical doctors of writing reports; it has received increasing attention in recent years and has become an important research hotspot. However, there are severe issues of visual and textual data bias and long text generation in the medical field. Firstly, abnormal areas in radiological images account for only a small portion, and most radiological reports only involve descriptions of normal findings. Secondly, there are still significant challenges in generating longer and more accurate descriptive texts for radiology report generation tasks. In this paper, we propose a new Dual Attention and Context Guidance (DACG) model to alleviate visual and textual data bias and promote the generation of long texts. We use a Dual Attention Module, including a Position Attention Block and a Channel Attention Block, to extract finer position and channel features from medical images, enhancing the image feature extraction ability of the encoder. We use the Context Guidance Module to integrate contextual information into the decoder and supervise the generation of long texts. The experimental results show that our proposed model achieves state-of-the-art performance on the most commonly used IU X-ray and MIMIC-CXR datasets. Further analysis also proves that our model can improve reporting through more accurate anomaly detection and more detailed descriptions.
医学影像是放射科医生撰写放射学报告的重要基础,对于后续临床治疗具有重要帮助。自动生成放射学报告的任务旨在减轻临床医生撰写报告的负担,这一领域近年来备受关注,已成为重要的研究热点。然而,在医学领域中,视觉和文本数据偏差以及长文本生成面临着严重挑战。首先,放射学图像中的异常区域仅占很小一部分,而大多数放射学报告主要涉及正常发现的描述。其次,在生成更长且更准确的描述性文本方面仍然存在显著挑战。
为此,本文提出了一种新的双重注意力与上下文引导(DACG)模型,旨在缓解视觉和文本数据偏差,同时促进长文本的生成。我们采用了双重注意力模块(Dual Attention Module),包括位置注意力块(Position Attention Block)和通道注意力块(Channel Attention Block),以从医学影像中提取更细致的位置和通道特征,从而增强编码器的图像特征提取能力。我们还使用上下文引导模块(Context Guidance Module)将上下文信息融入解码器,并对长文本生成进行监督。
实验结果表明,所提出的模型在最常用的IU X-ray和MIMIC-CXR数据集上达到了当前最先进的性能。进一步分析也证明,该模型能够通过更准确的异常检测和更详细的描述来改进报告生成。
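The abstract does not spell out the two attention blocks, but position and channel attention in this spirit (cf. DANet-style dual attention) can be sketched as plain self-affinity over spatial positions and over feature channels. The NumPy sketch below is illustrative only, with assumed shapes and a naive additive fusion, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(x):
    """x: (N, C) patch features; self-affinity over the N spatial positions."""
    attn = softmax(x @ x.T / np.sqrt(x.shape[1]))  # (N, N) position affinity
    return attn @ x + x                            # residual connection

def channel_attention(x):
    """x: (N, C) patch features; self-affinity over the C channels."""
    attn = softmax(x.T @ x / np.sqrt(x.shape[0]))  # (C, C) channel affinity
    return x @ attn + x

x = np.random.rand(49, 512)                           # assumed patch features
fused = position_attention(x) + channel_attention(x)  # naive additive fusion
print(fused.shape)  # (49, 512)
```

The two affinities are complementary: position attention relates patches to patches (where), while channel attention relates feature maps to feature maps (what), which is the intuition behind enhancing both dimensions.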
Method
方法
Fig. 2 exhibits an overview of DACG, which consists of four chief components: Visual Feature Extractor, Dual Attention Module, Guidance Memory Generator, and Context-driven Normalization Layer (CNL). We will cover each component in detail in the following sections.
图 2 展示了DACG的总体概览,其由四个主要部分组成:视觉特征提取器(Visual Feature Extractor)、双重注意力模块(Dual Attention Module)、引导记忆生成器(Guidance Memory Generator)以及基于上下文的归一化层(Context-driven Normalization Layer, CNL)。我们将在接下来的章节中详细介绍每个组件。
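The exact CNL formulation is not reproduced in this digest. A common way to drive a normalization layer with context (cf. the memory-driven conditional layer normalization of R2Gen) is to let a context vector modulate the LayerNorm gain and bias; the sketch below assumes hypothetical projection matrices `w_gamma` and `w_beta` and illustrative shapes:

```python
import numpy as np

def context_layer_norm(h, context, w_gamma, w_beta, eps=1e-5):
    """LayerNorm whose gain and bias are modulated by a context vector.

    h:       (T, D) decoder hidden states
    context: (D,)   summary vector from the guidance memory
    w_gamma, w_beta: (D, D) hypothetical projection matrices
    """
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    h_norm = (h - mu) / np.sqrt(var + eps)
    gamma = 1.0 + context @ w_gamma  # context shifts the per-dimension gain
    beta = context @ w_beta          # ... and the per-dimension bias
    return gamma * h_norm + beta

D = 64
h = np.random.rand(10, D)
ctx = np.random.rand(D)
out = context_layer_norm(h, ctx, np.random.rand(D, D) * 0.01,
                         np.random.rand(D, D) * 0.01)
print(out.shape)  # (10, 64)
```

This way the contextual information touches every decoding step through the normalization statistics, rather than only through the attention inputs.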
Conclusion
结论
In this article, we propose the Dual Attention and Context Guidance (DACG) model for radiology report generation to address two common issues: visual and textual data bias and long text generation. The Dual Attention Block (DAB) has been proposed, which enhances medical image features from both the position and channel dimensions compared to traditional CNNs and can extract more subtle and accurate visual feature information. We also use Guidance Memory (GM) to store specific entity description information, continuously update the completeness of contextual information during training, and integrate it into the Context Guidance Normalization Layer (CNL) to supervise report generation. The experimental results have demonstrated that our model achieves state-of-the-art performance on publicly available standard datasets. A series of ablation and hyperparameter experiments has further demonstrated the synergistic effect of the various modules in the DACG model and the rationality of the hyperparameter settings.
本文提出了用于生成放射学报告的双重注意力与上下文引导(DACG)模型,以解决放射学报告生成中的两个常见问题:视觉和文本数据偏差以及长文本生成。我们提出了双重注意力块(Dual Attention Block, DAB),与传统卷积神经网络(CNN)相比,该模块能够从位置和通道两个维度增强医学影像特征,从而提取更细致、更准确的视觉特征信息。同时,我们使用引导记忆(Guidance Memory, GM)存储特定实体描述信息,在训练过程中不断更新上下文信息的完整性,并将其集成到上下文引导归一化层(Context Guidance Normalization Layer, CNL)中以监督报告生成。实验结果表明,我们的模型在公开的标准数据集上达到了最先进的性能。一系列的消融实验和超参数实验进一步证明了DACG模型中各模块的协同作用以及超参数设置的合理性。
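The paper's exact GM update rule is not given in this digest. A common pattern for a memory that is "continuously updated during training" is a per-slot gated interpolation toward the current report representation; the sketch below is a hedged illustration with a hypothetical gate parameter `w_gate` and assumed shapes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_guidance_memory(memory, report_repr, w_gate):
    """Gated interpolation of memory slots toward the current report.

    memory:      (S, D) stored description information, one slot per row
    report_repr: (D,)   encoded report from the current training step
    w_gate:      (D, D) hypothetical gate parameters
    """
    gate = sigmoid(memory @ w_gate @ report_repr)  # (S,) per-slot gate
    return (1 - gate)[:, None] * memory + gate[:, None] * report_repr

memory = np.random.rand(5, 32)
report = np.random.rand(32)
new_memory = update_guidance_memory(memory, report, np.random.rand(32, 32) * 0.1)
print(new_memory.shape)  # (5, 32)
```

Because each gate lies in (0, 1), every slot moves only partway toward the new report representation, so earlier entity descriptions decay gradually instead of being overwritten.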
Figure
图
Fig. 1. An example containing two chest X-ray images and corresponding reports
图 1. 一个包含两张胸部X光片及其对应报告的示例
Fig. 2. The overall architecture of DACG. The Visual Feature Extractor, Encoder, and Decoder are shown in gray dashed boxes. The Dual Attention Module, the GM Generator, and the Context-driven Normalization Layer (CNL) are represented in solid gray boxes with blue dashed lines.
图 2. DACG 的整体架构。视觉特征提取器、编码器和解码器以灰色虚线框表示;双重注意力模块(Dual Attention Module)、GM生成器(GM Generator)和上下文驱动归一化层(Context-driven Normalization Layer, CNL)以带蓝色虚线的实心灰色框表示。
Fig. 3. The illustration of the CNL
图 3. CNL(上下文驱动归一化层)的示意图
Fig. 4. The average length of reports generated by BASE, BASE+CGM, BASE+DAM, and DACG on the IU X-ray test set, as well as the ground truth.
图 4. 在IU X-ray测试集上,BASE、BASE+CGM、BASE+DAM和DACG生成报告的平均长度,以及真实值(ground truth)。
Fig. 5. Effect of varying 𝐻 on the BLEU-4 score.
图 5. 不同𝐻值对BLEU-4分数的影响。
Fig. 6. Reports generated by DACG, BASE+DAM, and BASE+CGM on three samples, as well as ground truth report examples. To better highlight the differences among the reports, different medical terms are highlighted in different colors.
图 6. DACG、BASE+DAM 和 BASE+CGM 在三个样本上生成的报告,以及真实报告示例。为了更好地突出报告中的差异,用不同颜色标注了不同的医学术语。
Table
表
Table 1. The statistics of the IU X-ray dataset.
表 1. IU X-ray 数据集的统计信息。
Table 2. Comparison of the proposed model with those of previous studies on the IU X-ray dataset. BL and RG are the abbreviations of BLEU and ROUGE. * indicates that the result is directly cited from the original paper.
表 2. 在IU X-ray数据集上,提出的模型与之前研究的比较。BL和RG分别是BLEU和ROUGE的缩写。* 表示结果直接引用自原始论文。
Table 3. Comparison of the proposed model with those of previous studies on the MIMIC-CXR dataset. BL and RG are the abbreviations of BLEU and ROUGE. P is precision, and R is recall. * indicates that the result is directly cited from the original paper.
表 3. 在MIMIC-CXR数据集上,提出的模型与之前研究的比较。BL和RG分别是BLEU和ROUGE的缩写,P表示精确率(Precision),R表示召回率(Recall)。* 表示结果直接引用自原始论文。
Table 4. The experimental results of ablation studies on the IU X-ray dataset. The best values are highlighted in bold. BL and RG are the abbreviations of BLEU and ROUGE.
表 4. IU X-ray 数据集上消融研究的实验结果。最佳值以粗体标出。BL 和 RG 分别是 BLEU 和 ROUGE 的缩写。
Table 5. The experimental results of DACG models with different values of 𝐻 on the IU X-ray dataset across various metrics. The best values are highlighted in bold. BL and RG are the abbreviations of BLEU and ROUGE.
表 5. DACG模型在IU X-ray数据集上使用不同𝐻值时各项指标的实验结果。最佳值以粗体标出。BL 和 RG 分别是 BLEU 和 ROUGE 的缩写。