Title
题目
PViT-AIR: Puzzling vision transformer-based affine image registration for multi histopathology and faxitron images of breast tissue
PViT-AIR:基于拼图式视觉Transformer的乳腺组织多幅组织病理学图像与Faxitron图像的仿射配准
01
文献速递介绍
乳腺癌是全球公共卫生的一个重大问题,也是女性中最常见的癌症,导致显著的发病率和死亡率(美国癌症学会,2023)。研究估计,八分之一的女性在其一生中将患上侵袭性乳腺癌(Breastcancer.org,2023),并预计在2022年将有287,850例新病例和43,250例死亡(Giaquinto等,2022)。乳腺癌的诊断通常采用核心针活检,一旦确认诊断,根据肿瘤病理可考虑不同的治疗方案(国家乳腺癌基金会,2022)。其中一种治疗选择是手术联合术前全身治疗(新辅助治疗),研究表明,这种方案可改善治疗效果,尤其是在HER2阳性、三阴性(雌激素受体/孕激素受体/HER2均阴性)以及/或淋巴结阳性的病例中(Britt等,2020)。手术后,切除的标本将进行病理学检查,以确定肿瘤的大小、分级、分期以及残留癌组织的边缘状态。这些信息用于指导后续治疗决策,包括是否需要额外的手术或全身治疗。
在许多机构,包括我们医院,病理处理涉及对切除手术组织(乳房切除术或保乳手术)的原始切片进行影像学处理(通过Faxitron X射线摄影)。对于较大的标本,通常只提交一部分组织进行组织学处理和显微镜评估。选择适当的切片进行组织学检查是一个关键步骤,它有助于确定残留肿瘤的存在、大小和边缘状态,以及癌症是否表现出治疗效应。这些特征对于规划下一步治疗至关重要。手动选择适当切片的过程是一项繁重且资源密集的任务,严重依赖病理学家的经验和判断,这可能引入潜在的变异性。切除的组织中如果有治疗后的癌症,可能不再显示明显的肿块,在这种情况下,残留肿瘤可能会被遗漏。这最终会导致不准确的诊断和不恰当的治疗计划(Yousif等,2022)。此外,特别是在新辅助化疗后,准确识别癌症的范围和位置可能会很困难,因为化疗可以改变肿瘤微环境,给病理图像的视觉评估带来挑战。这可能导致处理的延迟,并需要额外的后续切片(Sha等,2019;Lester,2010)。这些限制凸显了开发自动化方法来识别和选择适合组织学检查的切片的必要性。通过自动化在Faxitron影像上定位残留肿瘤或肿瘤床,有可能提高效率和准确性,从而缩短周转时间并提高诊断准确性(Acs等,2020)。然而,目前的自动化方法尚未能够区分Faxitron影像上的残留肿瘤与反应性基质变化,这突显了这一领域进一步研究的需求。
生成标注训练数据集对于开发和训练精确的深度学习算法以实现Faxitron影像上的自动癌症检测至关重要。这个过程需要通过图像配准,准确地将肿瘤的范围从组织病理图像映射到相应的Faxitron影像。然而,由于Faxitron和组织病理图像的内容和分辨率差异,准确的图像配准非常具有挑战性。Faxitron影像显示的是整个大切片的X射线影像,而组织病理图像来自经过福尔马林固定和石蜡包埋的组织块中5微米厚的切片。这样的差异使得精确的图像配准变得困难,可能导致配准错误和数据的误解(Gurcan等,2009)。此外,还有一些因素进一步复杂化了Faxitron和组织病理图像的对准,包括组织在固定和处理过程中的变形、两种图像类型中的伪影、组织的不同取向,以及由于Faxitron影像上的组织小段位置估计不准确而导致的图像之间的不精确对应(Madabhushi和Lee,2016)。
Abstract
摘要
Breast cancer is a significant global public health concern, with various treatment options available based on tumor characteristics. Pathological examination of excision specimens after surgery provides essential information for treatment decisions. However, the manual selection of representative sections for histological examination is laborious and subjective, leading to potential sampling errors and variability, especially in carcinomas that have been previously treated with chemotherapy. Furthermore, the accurate identification of residual tumors presents significant challenges, emphasizing the need for systematic or assisted methods to address this issue. In order to enable the development of deep-learning algorithms for automated cancer detection on radiology images, it is crucial to perform radiology-pathology registration, which ensures the generation of accurately labeled ground truth data. The alignment of radiology and histopathology images plays a critical role in establishing reliable cancer labels for training deep-learning algorithms on radiology images. However, aligning these images is challenging due to their content and resolution differences, tissue deformation, artifacts, and imprecise correspondence. We present a novel deep learning-based pipeline for the affine registration of faxitron images, the x-ray representations of macrosections of ex-vivo breast tissue, and their corresponding histopathology images of tissue segments. The proposed model combines convolutional neural networks and vision transformers, allowing it to effectively capture both local and global information from the entire tissue macrosection as well as its segments. This integrated approach enables simultaneous registration and stitching of image segments, facilitating segment-to-macrosection registration through a puzzling-based mechanism.
To address the limitations of multi-modal ground truth data, we tackle the problem by training the model using synthetic mono-modal data in a weakly supervised manner. The trained model demonstrated successful performance in multi-modal registration, yielding registration results with an average landmark error of 1.51 mm (±2.40) and stitching distance of 1.15 mm (±0.94). The results indicate that the model performs significantly better than existing baselines, including both deep learning-based and iterative models, and it is also approximately 200 times faster than the iterative approach. This work bridges the gap in the current research and clinical workflow and has the potential to improve efficiency and accuracy in breast cancer evaluation and streamline pathology workflow.
乳腺癌是全球公共卫生的重要问题,根据肿瘤特征提供多种治疗选择。手术后对切除标本的病理检查为治疗决策提供了重要信息。然而,手动选择具有代表性的切片进行组织学检查既繁琐又主观,容易导致取样误差和变异性,尤其是在接受过化疗的癌症患者中。此外,准确识别残留肿瘤也面临显著挑战,这突显了需要系统化或辅助方法来解决这一问题。为了开发用于自动化癌症检测的深度学习算法,进行放射学和病理学图像配准至关重要,这有助于生成准确标注的真实数据。放射学和病理学图像的对齐在为深度学习算法提供可靠的癌症标签方面起着关键作用。然而,由于图像内容和分辨率的差异、组织变形、伪影以及不精确的对应关系,图像配准面临着诸多挑战。本文提出了一种基于深度学习的新的仿射配准管道,用于对Faxitron图像(即切除的乳腺组织大切片的X射线图像)与其对应的病理学图像进行配准。所提出的模型结合了卷积神经网络和视觉转换器,能够有效地从整个组织大切片及其各个小段中捕获局部和全局信息。这种集成方法通过基于拼图的机制实现了图像段的同时配准和拼接,从而促进了小段到大切片的配准。为了解决多模态真实数据的限制,我们通过在弱监督方式下使用合成单模态数据训练模型来解决这一问题。训练后的模型在多模态配准任务中表现出色,平均标志误差为1.51毫米(±2.40),拼接距离为1.15毫米(±0.94)。结果表明,所提模型的表现显著优于现有的基准模型,包括基于深度学习和迭代方法的模型,其速度也比迭代方法快约200倍。该研究弥补了当前研究和临床工作流程中的空白,有望提高乳腺癌评估的效率和准确性,并简化病理学工作流程。
Method
方法
This study, which received approval from the Institutional Review Board at Stanford University, involves the analysis of data obtained from 100 women who underwent neoadjuvant chemotherapy and then surgical excision. The excised breast specimens were inked for orientation and sectioned at uniform distances following the standard clinical protocol of the pathology laboratory. The distance between macrosections was determined by the size of the excised tissue and ranged from 3 mm for lumpectomy specimens up to approximately 1 cm for mastectomy specimens. The macrosections were imaged using a faxitron that generated x-ray radiographs with an image size of 3440 × 3440 pixels, providing a comprehensive ex-vivo representation of the excised tissue. Depending on the specimen size, the entire tissue excision or specific areas of interest were submitted for histological examination. Then, digital hematoxylin and eosin (H&E) images with an image size of 4600 × 6000 pixels were acquired for the tissue segments with their approximate location annotated as labeled box regions of interest (Box-ROIs) on the corresponding faxitron image by the pathologists. All faxitron and histopathology images are in RGB format. Moreover, a breast subspecialty pathologist reviewed the histopathology H&E slides and annotated the extent of invasive breast carcinoma (IBC) and ductal carcinoma in situ (DCIS). Note that some macrosections underwent histological analysis of partial areas, leading to a limited number of histopathology images compared to the faxitron images.
本研究获得了斯坦福大学伦理审查委员会的批准,分析了来自100名接受新辅助化疗并随后进行手术切除的女性患者的数据。切除的乳腺标本经墨水标记以便定向,并按照病理实验室的标准临床协议进行了均匀间距的切片。宏观切片之间的距离由切除组织的大小决定,肿块切除术(保乳手术)标本的切片间距为3毫米,而乳房全切除术标本的切片间距约为1厘米。这些宏观切片使用Faxitron X射线成像仪进行成像,生成大小为3440 × 3440像素的X射线影像,提供切除组织的全面离体(ex-vivo)影像。根据标本的大小,整个切除组织或特定感兴趣区域被提交进行组织学检查。随后,对这些组织小段采集了大小为4600 × 6000像素的数字化苏木精-伊红(H&E)图像,其大致位置由病理学家在相应的Faxitron影像上以标记框感兴趣区域(Box-ROIs)的形式标注。所有Faxitron和组织病理图像均为RGB格式。此外,一名乳腺亚专科病理学家对组织病理H&E切片进行了审查,并标注了浸润性乳腺癌(IBC)和导管原位癌(DCIS)的范围。需要注意的是,某些宏观切片仅对部分区域进行了组织学分析,导致组织病理图像的数量少于Faxitron图像。
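The Method section fixes the image geometries (3440 × 3440 faxitron radiographs, 4600 × 6000 H&E segments, pathologist-drawn Box-ROIs on the faxitron). The sketch below illustrates how such inputs might be cropped and brought to a common size before registration; the Box-ROI coordinates, function names, and the 256 × 256 target size are illustrative assumptions, not the paper's actual preprocessing pipeline.

```python
import numpy as np

def crop_box_roi(faxitron, box):
    """Crop a pathologist-annotated Box-ROI (x, y, w, h) from a faxitron image."""
    x, y, w, h = box
    return faxitron[y:y + h, x:x + w]

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize; a stand-in for any interpolation library."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols]

# Toy arrays with the image sizes stated in the text.
faxitron = np.zeros((3440, 3440, 3), dtype=np.uint8)   # x-ray of a macrosection
he_segment = np.ones((4600, 6000, 3), dtype=np.uint8)  # H&E tissue segment

roi = crop_box_roi(faxitron, (100, 200, 860, 860))     # hypothetical Box-ROI
moving = resize_nearest(he_segment, 256, 256)          # assumed network input size
fixed = resize_nearest(roi, 256, 256)
```

The actual pipeline (Fig. 1) also includes steps such as gross rotation correction that are omitted here.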
Conclusion
结论
Accurate registration plays a crucial role in aligning histopathology images with corresponding faxitron radiographs, enabling the mapping of breast cancer extent and subtype. This alignment has significant potential for enhancing the interpretation of breast faxitron images and providing labeled data for developing and validating breast cancer detection models on faxitron images using machine learning techniques. We introduced PViT-AIR, the first approach to utilize deep learning techniques for the 2D registration of faxitron and histopathology images of breast tissue obtained from cancer patients who underwent surgery. Our proposed model operates as a puzzle solver, efficiently registering multiple histopathology image segments with their corresponding macrosection faxitron image in a single inference step. By integrating convolutional neural networks and vision transformers, our model can extract local and global information from the entire tissue macrosection and its segments, allowing for simultaneous registration and stitching of image segments. We utilized a weakly-supervised training approach, employing a synthetic mono-modal training dataset, to enable the model to perform faxitron and breast histopathology image registration without relying on multi-modal ground-truth data. The experimental results demonstrated the promising performance of our model on multi-modal test data, demonstrating superior accuracy and speed compared to existing state-of-the-art registration techniques, such as deep learning and iterative methods. The proposed approach streamlines the multi-segment registration process, enabling real-time interactive registration. The precise registration achieved in this study enables the accurate mapping of ground truth information, such as the size and focality of residual breast cancer, from histopathology images to their corresponding faxitron images.
This advancement enhances performance and improves the accessibility of labeled data, thereby promoting the development and validation of machine-learning approaches for localizing residual tumor on faxitron images, thus streamlining the pathology workflow.
精确配准在将组织病理学图像与相应的Faxitron放射影像对齐中起着至关重要的作用,从而实现乳腺癌范围和亚型的映射。这种对齐在增强对乳腺Faxitron图像的解读,以及为基于机器学习技术开发和验证Faxitron图像上的乳腺癌检测模型提供标注数据方面具有重要潜力。我们提出了PViT-AIR,这是首个利用深度学习技术对乳腺癌手术患者的Faxitron与组织病理学图像进行二维配准的方法。我们所提出的模型以拼图求解器的方式运行,可在单次推理中高效地将多个组织病理学图像小段与其对应的宏观切片Faxitron图像配准。通过结合卷积神经网络和视觉Transformer,我们的模型能够从整个组织宏观切片及其小段中提取局部和全局信息,从而实现图像小段的同时配准与拼接。
我们采用弱监督训练方法,利用合成单模态训练数据集,使模型能够在无需依赖多模态真实标注数据的情况下完成Faxitron与乳腺组织病理学图像的配准。实验结果表明,模型在多模态测试数据上的表现令人满意,展示了相比现有的深度学习和迭代方法等先进配准技术的优越精度与速度。所提出的方法简化了多片段配准流程,实现了实时交互式配准。研究中实现的精确配准能够将组织病理学图像中的真实信息(如残余乳腺癌的大小和局灶性)准确映射到相应的Faxitron图像上。这一进展不仅提升了性能,还改善了标注数据的获取,进一步促进了用于在Faxitron图像上定位残余肿瘤的机器学习方法的开发与验证,从而简化了病理学工作流程。
Results
结果
To evaluate the performance of our proposed PViT-AIR model in multi-modal image registration, a comparative analysis was conducted against existing methods. The evaluated methods included an iterative technique implemented in SimpleITK (Lowekamp et al., 2013; Yaniv et al., 2018), CNNGeometric (Rocco et al., 2017), a deep learning network for estimating affine transformations on natural images, and BreastRegNet, our previous CNN-based registration network (Golestani et al., 2024). The performance of each method was assessed and compared in terms of several evaluation metrics.
It is important to note that all models were evaluated on samples with varying numbers of moving images, ranging from 1 to 6 segments. The reported results encompass registration outcomes for these diverse cases. The baseline approaches performed segment-to-segment registration, with the registered images subsequently stitched post-registration. In contrast, the proposed method processed all inputs simultaneously, producing a final registered and stitched image.
All registration methods, including our PViT-AIR model and baselines, use the same preprocessed images as input for the registration, ensuring consistent preprocessing effects across all approaches. Specifically, we used the manually assigned gross rotation in lieu of the automatically identified one to ensure that we perform a careful evaluation of the four registration models alone with limited influence from the preprocessing steps. To assess the impact of gross rotation preprocessing on the performance of the PViT-AIR model, we conducted two additional experiments. One focused on examining the effect of deviating from the manually assigned gross rotation, and another one focused on using deep learning networks to identify gross rotation for a fully automated pipeline, referred to as PViT-AIR+, with evaluation shown in Section 4.3.
Furthermore, results from manual registration of faxitron and histopathology images were included for comprehensive comparison (Section 4.4). Finally, we performed ablation studies to evaluate the utility of various components in the PViT-AIR model.
为了评估我们提出的PViT-AIR模型在多模态图像配准中的性能,进行了与现有方法的比较分析。评估的方法包括在SimpleITK中实现的迭代技术(Lowekamp等,2013;Yaniv等,2018)、CNNGeometric(Rocco等,2017),这是一个用于估计自然图像仿射变换的深度学习网络,以及BreastRegNet,我们之前基于CNN的配准网络(Golestani等,2024)。每种方法的性能通过多个评估指标进行了评估和比较。
需要注意的是,所有模型都在具有不同数量的移动图像(从1个到6个切片)的样本上进行了评估。报告的结果涵盖了这些不同情况的配准结果。基准方法执行了切片到切片的配准,配准后的图像随后进行了拼接。与此不同,提出的方法同时处理所有输入,生成最终的配准和拼接图像。
所有配准方法,包括我们的PViT-AIR模型和基准方法,都使用相同的预处理图像作为配准输入,确保了所有方法之间预处理效果的一致性。具体而言,我们使用了手动指定的粗略旋转,而不是自动识别的旋转,以确保仅对四种配准模型本身进行仔细评估,尽量减少预处理步骤的影响。为了评估粗略旋转预处理对PViT-AIR模型性能的影响,我们进行了两个附加实验:一个重点研究偏离手动指定粗略旋转的效果,另一个则使用深度学习网络识别粗略旋转,以实现完全自动化的管道(称为PViT-AIR+),评估结果见第4.3节。此外,还包含了Faxitron与组织病理图像的手动配准结果,以进行全面比较(第4.4节)。最后,我们进行了消融实验,评估PViT-AIR模型中各个组件的作用。
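The results above compare methods by mean landmark error (MLE, in mm), a stitching distance measure (SDM), and the Dice coefficient. The sketch below shows how the first and last of these are commonly computed, assuming landmark correspondences in pixel coordinates and a known pixel spacing; the paper's exact metric definitions (and the SDM) are not reproduced here.

```python
import numpy as np

def mean_landmark_error(pts_fixed, pts_warped, spacing_mm):
    """Mean Euclidean distance in mm between corresponding landmarks on the
    fixed image and the warped moving image; points are (N, 2) pixel arrays."""
    d = np.linalg.norm((pts_fixed - pts_warped) * spacing_mm, axis=1)
    return d.mean()

def dice(mask_a, mask_b):
    """Dice overlap between two boolean tissue masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * inter / (mask_a.sum() + mask_b.sum())

# Two landmarks: one off by (3, 4) px, one perfect; 0.5 mm/px spacing (assumed).
pts_fixed = np.array([[10.0, 10.0], [50.0, 80.0]])
pts_warped = np.array([[13.0, 14.0], [50.0, 80.0]])
mle = mean_landmark_error(pts_fixed, pts_warped, spacing_mm=0.5)  # (2.5 + 0) / 2 = 1.25 mm

mask_a = np.zeros((4, 4), dtype=bool); mask_a[:2] = True   # 8 px
mask_b = np.zeros((4, 4), dtype=bool); mask_b[1:3] = True  # 8 px, 4 shared
overlap = dice(mask_a, mask_b)  # 2 * 4 / 16 = 0.5
```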
Figure
图
Fig. 1. Data preprocessing pipeline for histopathology and faxitron images.
图1. 用于组织病理学和Faxitron图像的数据预处理流水线。
Fig. 2. PViT-AIR network architecture for estimating affine transformation parameters (𝜽) of input data with multiple moving images and generating the composite image.
图2. PViT-AIR网络架构,用于估计输入数据的仿射变换参数(𝜽),该输入数据包含多个移动图像,并生成合成图像。
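Fig. 2 has the network estimate one set of affine parameters 𝜽 per moving image and then form a composite image in the fixed frame. A minimal sketch of that last step follows; the nearest-neighbour warp and max-compositing are generic stand-ins for the network's actual differentiable resampler and stitching, and all names here are illustrative.

```python
import numpy as np

def warp_affine(moving, theta, out_shape):
    """Inverse-warp a 2-D image with a 2x3 affine matrix theta that maps
    output (fixed-frame) coordinates to input (moving-frame) coordinates,
    sampling with nearest neighbour and zero fill outside the image."""
    H, W = out_shape
    ys, xs = np.mgrid[0:H, 0:W]
    src = theta @ np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])
    sx, sy = np.round(src).astype(int)
    out = np.zeros(out_shape, dtype=moving.dtype)
    inside = (sx >= 0) & (sx < moving.shape[1]) & (sy >= 0) & (sy < moving.shape[0])
    out.ravel()[inside] = moving[sy[inside], sx[inside]]
    return out

def composite(fixed_shape, segments, thetas):
    """Warp each moving segment into the fixed frame with its own theta and
    merge by pixel-wise maximum -- a simple stand-in for stitching."""
    canvas = np.zeros(fixed_shape)
    for seg, th in zip(segments, thetas):
        canvas = np.maximum(canvas, warp_affine(seg, th, fixed_shape))
    return canvas

# Identity theta leaves a segment unchanged; a translation shifts it.
segment = np.arange(1.0, 17.0).reshape(4, 4)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
shifted = warp_affine(segment, np.array([[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]), (4, 4))
```

In PViT-AIR all segments are handled in a single inference step, so the loop over segments here is only a conceptual illustration of the compositing.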
Fig. 3. Qualitative comparison on multiple strategies to register histopathology images corresponding to the superior and inferior halves of a breast tissue slice with the faxitron image of the entire section. Our proposed PViT-AIR approach achieves precise alignment of image borders and seamless stitching of histopathology images without any overlap or gaps (green arrow). In contrast, the iterative approach (SimpleITK) and CNNGeometric fail to close the gap between histopathology images, while BreastRegNet results in tissue overlap (blue arrow). PViT-AIR+ represents the fully automated version.
图3. 多种配准策略的定性比较:将乳腺组织切片上半部分和下半部分的组织病理图像与整个切片的Faxitron影像进行配准。我们提出的PViT-AIR方法能够精确对齐图像边缘,并无缝拼接组织病理图像,没有任何重叠或间隙(绿色箭头)。相比之下,迭代方法(SimpleITK)和CNNGeometric未能消除组织病理图像之间的间隙,而BreastRegNet则导致组织重叠(蓝色箭头)。PViT-AIR+表示完全自动化的版本。
Fig. 4. Visual assessment of the PViT-AIR registered histopathology and faxitron images for a typical case (average landmark error: 1.12 mm, average stitching distance: 0.69 mm). PViT-AIR generates robust alignments and closes the gap between histopathology images in all macrosections available for this study.
图4. 对一个典型案例的PViT-AIR配准组织病理图像与Faxitron图像的视觉评估(平均标志点误差:1.12毫米,平均拼接距离:0.69毫米)。PViT-AIR生成了稳健的对齐,并在本研究所有可用的宏观切片中消除了组织病理图像之间的间隙。
Fig. 5. PViT-AIR approach is significantly better than the alternative approaches in terms of mean landmark error (MLE), stitching distance measure (SDM), and Dice coefficient. SS: statistically significant (𝑝-value ≤ 0.01), NS: not significant (𝑝-value > 0.01).
图5. 在平均标志点误差(MLE)、拼接距离度量(SDM)和Dice系数方面,PViT-AIR方法显著优于其他方法。 SS: 统计学上显著(𝑝值 ≤ 0.01),NS: 不显著(𝑝值 > 0.01)。
Fig. 6. Comparative analysis of registration accuracy for the proposed PViT-AIR model under varied initial gross rotation between input images, where a gross rotation of zero represents manually preprocessed and corrected images rather than the final registered images.
图6. 所提出的PViT-AIR模型在输入图像之间不同初始粗略旋转条件下的配准准确性比较分析,其中粗略旋转为零表示经过手动预处理和校正的图像,而非最终配准后的图像。
Table
表
Table 1. Quantitative registration results between the proposed method and other approaches.
表1 所提方法与其他方法的定量配准结果对比。