PViT-AIR: Puzzling vision transformer-based affine image registration for multi histopathology and faxitron images of breast tissue
Breast cancer is a significant global public health concern, with various treatment options available based on tumor characteristics. Pathological examination of excision specimens after surgery provides essential information for treatment decisions. However, the manual selection of representative sections for histological examination is laborious and subjective, leading to potential sampling errors and variability, especially in carcinomas that have been previously treated with chemotherapy. Furthermore, the accurate identification of residual tumors presents significant challenges, emphasizing the need for systematic or assisted methods to address this issue. To enable the development of deep-learning algorithms for automated cancer detection on radiology images, it is crucial to perform radiology-pathology registration, which ensures the generation of accurately labeled ground truth data. The alignment of radiology and histopathology images plays a critical role in establishing reliable cancer labels for training deep-learning algorithms on radiology images. However, aligning these images is challenging due to their content and resolution differences, tissue deformation, artifacts, and imprecise correspondence. We present a novel deep learning-based pipeline for the affine registration of faxitron images, the x-ray representations of macrosections of ex-vivo breast tissue, and their corresponding histopathology images of tissue segments. The proposed model combines convolutional neural networks and vision transformers, allowing it to effectively capture both local and global information from the entire tissue macrosection as well as its segments. This integrated approach enables simultaneous registration and stitching of image segments, facilitating segment-to-macrosection registration through a puzzling-based mechanism.
To address the limited availability of multi-modal ground truth data, we train the model on synthetic mono-modal data in a weakly supervised manner. The trained model performed successfully in multi-modal registration, yielding an average landmark error of 1.51 mm (±2.40) and a stitching distance of 1.15 mm (±0.94). The results indicate that the model performs significantly better than existing baselines, including both deep learning-based and iterative models, and that it is approximately 200 times faster than the iterative approach. This work bridges a gap in the current research and clinical workflow and has the potential to improve efficiency and accuracy in breast cancer evaluation and streamline the pathology workflow.
This study, which received approval from the Institutional Review Board at Stanford University, involves the analysis of data obtained from 100 women who underwent neoadjuvant chemotherapy and then surgical excision. The excised breast specimens were inked for orientation and sectioned at uniform distances following the standard clinical protocol of the pathology laboratory. The distance between macrosections was determined by the size of the excised tissue and ranged from 3 mm for lumpectomy specimens up to approximately 1 cm for mastectomy specimens. The macrosections were imaged using a faxitron that generated x-ray radiographs with an image size of 3440 × 3440 pixels, providing a comprehensive ex-vivo representation of the excised tissue. Depending on the specimen size, the entire tissue excision or specific areas of interest were submitted for histological examination. Then, digital hematoxylin and eosin (H&E) images with an image size of 4600 × 6000 pixels were acquired for the tissue segments, with their approximate location annotated as labeled box regions of interest (Box-ROIs) on the corresponding faxitron image by the pathologists. All faxitron and histopathology images are in RGB format. Moreover, a breast subspecialty pathologist reviewed the histopathology H&E slides and annotated the extent of invasive breast carcinoma (IBC) and ductal carcinoma in situ (DCIS). Note that some macrosections underwent histological analysis of partial areas only, leading to a limited number of histopathology images compared to the faxitron images.
Accurate registration plays a crucial role in aligning histopathology images with corresponding faxitron radiographs, enabling the mapping of breast cancer extent and subtype. This alignment has significant potential for enhancing the interpretation of breast faxitron images and providing labeled data for developing and validating breast cancer detection models on faxitron images using machine learning techniques. We introduced PViT-AIR, the first approach to utilize deep learning techniques for the 2D registration of faxitron and histopathology images of breast tissue obtained from cancer patients who underwent surgery. Our proposed model operates as a puzzle solver, efficiently registering multiple histopathology image segments with their corresponding macrosection faxitron image in a single inference step. By integrating convolutional neural networks and vision transformers, our model can extract local and global information from the entire tissue macrosection and its segments, allowing for simultaneous registration and stitching of image segments. We utilized a weakly-supervised training approach, employing a synthetic mono-modal training dataset, to enable the model to perform faxitron and breast histopathology image registration without relying on multi-modal ground-truth data. The experimental results showed promising performance of our model on multi-modal test data, with superior accuracy and speed compared to existing state-of-the-art registration techniques, including deep learning and iterative methods. The proposed approach streamlines the multi-segment registration process, enabling real-time interactive registration. The precise registration achieved in this study enables the accurate mapping of ground truth information, such as the size and focality of residual breast cancer, from histopathology images to their corresponding faxitron images.
This advancement enhances performance and improves the accessibility of labeled data, thereby promoting the development and validation of machine-learning approaches for localizing residual tumor on faxitron images and streamlining the pathology workflow.
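As a minimal sketch of how an estimated affine registration can carry annotations from a histopathology image onto the faxitron image, the snippet below applies a 2×3 affine matrix to annotated points. The matrix values, the outline coordinates, and the name `apply_affine` are illustrative assumptions, not values or code from this study.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# Affine parameters stored as a 2x3 matrix [[a11, a12, tx], [a21, a22, ty]].
Affine = Sequence[Sequence[float]]


def apply_affine(theta: Affine, points: List[Point]) -> List[Point]:
    """Map (x, y) points through a 2x3 affine matrix."""
    (a11, a12, tx), (a21, a22, ty) = theta
    return [(a11 * x + a12 * y + tx, a21 * x + a22 * y + ty) for x, y in points]


# Hypothetical transform: uniform scale by 0.5, then a shift of (100, 40) pixels.
theta = [[0.5, 0.0, 100.0],
         [0.0, 0.5, 40.0]]

# Hypothetical tumor outline annotated on a histopathology image (pixel coordinates).
outline = [(0.0, 0.0), (200.0, 0.0), (200.0, 100.0)]

# The same outline expressed in faxitron image coordinates.
mapped = apply_affine(theta, outline)
print(mapped)
```

Because an affine map is linear plus a translation, mapping the vertices of a polygonal annotation is enough to map the whole polygon.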
To evaluate the performance of our proposed PViT-AIR model in multi-modal image registration, a comparative analysis was conducted against existing methods. The evaluated methods included an iterative technique implemented in SimpleITK (Lowekamp et al., 2013; Yaniv et al., 2018), CNNGeometric (Rocco et al., 2017), a deep learning network for estimating affine transformations on natural images, and BreastRegNet, our previous CNN-based registration network (Golestani et al., 2024). The performance of each method was assessed and compared in terms of several evaluation metrics. It is important to note that all models were evaluated on samples with varying numbers of moving images, ranging from 1 to 6 segments. The reported results encompass registration outcomes for these diverse cases. The baseline approaches performed segment-to-segment registration, with the registered images subsequently stitched post-registration. In contrast, the proposed method processed all inputs simultaneously, producing a final registered and stitched image. All registration methods, including our PViT-AIR model and baselines, use the same preprocessed images as input for the registration, ensuring consistent preprocessing effects across all approaches. Specifically, we used the manually assigned gross rotation in lieu of the automatically identified one to ensure that we perform a careful evaluation of the four registration models alone with limited influence from the preprocessing steps. To assess the impact of gross rotation preprocessing on the performance of the PViT-AIR model, we conducted two additional experiments. One focused on examining the effect of deviating from the manually assigned gross rotation, and another one focused on using deep learning networks to identify gross rotation for a fully automated pipeline, referred to as PViT-AIR+, with evaluation shown in Section 4.3.
Furthermore, results from manual registration of faxitron and histopathology images were included for comprehensive comparison (Section 4.4). Finally, we performed ablation studies to evaluate the utility of various components in the PViT-AIR model.
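Two of the metrics used in this comparison, landmark error and Dice coefficient, can be illustrated with the following sketch. It rests on assumptions: the function names, the pixel spacing `mm_per_pixel`, and the toy landmarks and masks are hypothetical, and the exact definitions used in the study may differ.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float]
Pixel = Tuple[int, int]


def mean_landmark_error(fixed: List[Point], moved: List[Point],
                        mm_per_pixel: float) -> float:
    """Average Euclidean distance between paired landmarks, in millimetres."""
    if len(fixed) != len(moved) or not fixed:
        raise ValueError("landmark lists must be non-empty and of equal length")
    total = sum(math.hypot(fx - mx, fy - my)
                for (fx, fy), (mx, my) in zip(fixed, moved))
    return mm_per_pixel * total / len(fixed)


def dice_coefficient(mask_a: Set[Pixel], mask_b: Set[Pixel]) -> float:
    """Dice overlap of two binary masks given as sets of foreground pixels."""
    if not mask_a and not mask_b:
        return 1.0  # two empty masks are treated as a perfect match
    return 2.0 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


# Hypothetical landmarks on the fixed (faxitron) and registered (histopathology) images.
fixed_pts = [(0.0, 0.0), (10.0, 0.0)]
moved_pts = [(3.0, 4.0), (10.0, 0.0)]
mle_mm = mean_landmark_error(fixed_pts, moved_pts, mm_per_pixel=0.1)

# Hypothetical tissue masks sharing two of their three foreground pixels.
mask_fixed = {(0, 0), (0, 1), (1, 0)}
mask_moved = {(0, 1), (1, 0), (1, 1)}
dice = dice_coefficient(mask_fixed, mask_moved)
print(mle_mm, dice)
```

Lower landmark error and higher Dice both indicate better alignment, which is why the two are usually read together.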
Fig. 1. Data preprocessing pipeline for histopathology and faxitron images.
Fig. 2. PViT-AIR network architecture for estimating affine transformation parameters (𝜽) of input data with multiple moving images and generating the composite image.
Fig. 3. Qualitative comparison of multiple strategies to register histopathology images corresponding to the superior and inferior halves of a breast tissue slice with the faxitron image of the entire section. Our proposed PViT-AIR approach achieves precise alignment of image borders and seamless stitching of histopathology images without any overlap or gaps (green arrow). In contrast, the iterative approach (SimpleITK) and CNNGeometric fail to close the gap between histopathology images, while BreastRegNet results in tissue overlap (blue arrow). PViT-AIR+ represents the fully automated version.
Fig. 4. Visual assessment of the PViT-AIR registered histopathology and faxitron images for a typical case (average landmark error: 1.12 mm, average stitching distance: 0.69 mm). PViT-AIR generates robust alignments and closes the gap between histopathology images in all macrosections available for this study.
Fig. 5. The PViT-AIR approach is significantly better than the alternative approaches in terms of mean landmark error (MLE), stitching distance measure (SDM), and Dice coefficient. SS: statistically significant (𝑝-value ≤ 0.01), NS: not significant (𝑝-value > 0.01).
Fig. 6. Comparative analysis of registration accuracy for the proposed PViT-AIR model under varied initial gross rotation between input images, where a gross rotation of zero represents manually preprocessed and corrected images rather than the final registered images.
Table 1. Quantitative comparison of registration results for the proposed method and other approaches.