PtbNet: Based on Local Few-Shot Classes and Small Objects to accurately detect
This section describes the components of our method and the design of each part. As shown in Fig. 2, the study has two aspects. The first is the processing of DCRs: we enlarge the data scale through geometric augmentation and generative augmentation, with an emphasis on explaining the data generation process. The second is the detector: we redesigned the structures of BiFPN and SPD-Conv and built PtbNet, a new multi-lesion detector for PTB based on RetinaNet, to achieve unified detection of few-shot and non-few-shot classes.
Early diagnosis and treatment of PTB are crucial in preventing deaths caused by infectious diseases worldwide. In this study, we analyzed the potential problems of the PTB dataset in depth and effectively increased the size and diversity of the DCRs with PTB. We reduced the impact of PTB lesion class imbalance on detector performance through image-level data augmentation and by embedding a diffusion probabilistic model with category conditions. In the baseline pre-evaluation experiments, the results demonstrated that both data augmentation methods improve the detection accuracy for PTB. Importantly, we improved the adaptability of the BiFPN and SPD-Conv components to input images of arbitrary resolution and proposed PtbNet, a novel multi-lesion detector for PTB that can also be used to detect other similar lesions. In the comprehensive assessment experiment, one phenomenon needs explanation: PtbNet achieves higher precision but lower recall. According to [44], there is a trade-off between precision and recall, which can be seen in Table VI. The lower recall arises because our model is more accurate on the samples it predicts as positive but tends to miss a portion of the true positive samples. However, this also reduces the false positives produced by PtbNet during detection, which has more important implications for clinical diagnosis.
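To make the precision and recall discussion concrete, the following is a minimal sketch of how the two metrics follow from detection counts. The function name and all counts are hypothetical and chosen only for illustration; they are not results from this study.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive,
    and false-negative detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts (assumption, not data from the paper).
# A conservative detector reports few boxes: few false positives,
# but more lesions are missed.
conservative = precision_recall(tp=60, fp=5, fn=40)

# A permissive detector reports many boxes: fewer lesions are missed,
# but more false positives appear.
permissive = precision_recall(tp=85, fp=45, fn=15)

print("conservative: precision=%.3f recall=%.3f" % conservative)
print("permissive:   precision=%.3f recall=%.3f" % permissive)
```

In this toy setting the conservative detector has the higher precision and the lower recall, which is the same pattern described above for PtbNet.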
Fig. 1. Visualization of training samples for similar phenotypes. Orange rectangles indicate lesion areas. Red arrows indicate locations of intensive similarity.
Fig. 2. The overall architecture of our model PtbNet. A: Processing of DCRs with PTB. B: Generative data augmentation with DDPM for few-shot classes. C: The core building blocks of PtbNet. D: Three-dimensional principles of PSPD-Conv when κ=2. E: Schematic diagram of BiFPN+.
Fig. 3. Illustration of the DDPM with integrated category conditions. The yellow arrow indicates the forward process; the blue arrow indicates the reverse process. The green diamond indicates the category conditions added to the forward and reverse processes.
Fig. 4. Two-dimensional interpretation of PSPD by zero padding in four cases of feature maps. Note: Even Number (EN); Odd Number (ON).
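The four even/odd cases in Fig. 4 can be illustrated with a minimal sketch of a zero-padded space-to-depth step on a single-channel feature map. This is an assumption-laden illustration, not the authors' implementation: the function name `padded_space_to_depth` is hypothetical, and padding on the bottom and right edges is an assumed placement that may differ from the one used in PSPD.

```python
def padded_space_to_depth(fmap, scale=2):
    """Zero-pad a 2D feature map (list of lists) so that height and width
    are multiples of `scale`, then split it into scale*scale sub-maps by
    strided sampling. Padding position (bottom/right) is an assumption."""
    h, w = len(fmap), len(fmap[0])
    pad_h = (-h) % scale  # rows needed to reach a multiple of scale
    pad_w = (-w) % scale  # columns needed to reach a multiple of scale

    # Pad each row on the right, then append zero rows at the bottom.
    padded = [list(row) + [0] * pad_w for row in fmap]
    padded += [[0] * (w + pad_w) for _ in range(pad_h)]

    # Sub-map (dy, dx) keeps every scale-th row and column starting
    # at offset (dy, dx); no value of the padded map is discarded.
    return [
        [row[dx::scale] for row in padded[dy::scale]]
        for dy in range(scale)
        for dx in range(scale)
    ]


# Odd height and odd width: a 3x3 map is padded to 4x4 first.
odd_map = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
sub_maps = padded_space_to_depth(odd_map, scale=2)
for sub_map in sub_maps:
    print(sub_map)
```

With `scale=2`, every input yields four sub-maps of equal size, so the even/even, even/odd, odd/even, and odd/odd cases are all handled by the same code path.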
Fig. 5. Statistical analysis before and after image-level data augmentation. A: Distribution of the number of instances per category before and after image-level data augmentation. B: Distribution of DCR resolution and lesion area resolution in image-level data augmentation. Blue dots represent DCRs; yellow rectangular dots represent lesion areas.
Fig. 6. Real samples of few-shot classes and corresponding generated samples in DCRs.
Fig. 7. Representation of the generated patch on enhanced DCRs. The dotted box shows where the patch is located, and the solid box shows the lesion area.
Fig. 8. Following the idea of 10-fold cross-validation, the data are divided into a training set and a test set at a ratio of 7:3.
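One possible reading of the Fig. 8 protocol is ten independent random 7:3 splits of the same data. The sketch below follows that reading, which is an assumption; the function name, the seed, and the integer sample IDs are hypothetical placeholders.

```python
import random


def repeated_splits(items, n_repeats=10, train_fraction=0.7, seed=0):
    """Return n_repeats (train, test) pairs, each a fresh random split of
    `items` at the given ratio. Assumes independent reshuffling per repeat."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_train = round(len(items) * train_fraction)
    splits = []
    for _ in range(n_repeats):
        shuffled = list(items)
        rng.shuffle(shuffled)
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits


# Hypothetical sample IDs standing in for DCR images.
splits = repeated_splits(list(range(100)))
for train_ids, test_ids in splits[:2]:
    print(len(train_ids), len(test_ids))
```

Unlike strict 10-fold cross-validation, the test sets of different repeats may overlap under this reading, since each repeat reshuffles the full dataset.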
Fig. 9. Visualization results of the top-5 performing methods on DCRs. Green boxes and black text indicate ground-truth labels; yellow boxes indicate correct predictions; red boxes and comments indicate incorrect predictions.
Fig. 10. Feature activation maps obtained by combining different components or feature fusion networks, for the non-few-shot and few-shot classes. Red areas indicate where the model pays more attention.
Fig. 11. Loss vs. Epoch for different combinations during training.
Fig. 12. Impact of the number of generated DCRs on the performance of PtbNet.
TABLE I Labels and their actual meanings in the PTB dataset
TABLE II Number of small, medium, and large objects after image-level data augmentation and the number of DCRs in which they are located
TABLE III Size of the dataset and number of instances of each class before and after generative augmentation
TABLE IV Evaluation of the original, basic, and large datasets using the baseline with pre-trained ResNet-50 and ResNet-101 by the COCO API. indicates that a pre-trained model is used (R50 = ResNet-50; R101 = ResNet-101)
TABLE V Evaluation of few-shot classes on the original dataset, the basic dataset, and the large dataset, respectively, using the baseline method with ResNet-50+FPN
TABLE VI Comparison of our method with other methods. indicates that pre-trained weights of the backbone network are used.
TABLE VII Detection rates and error rates for the top-5 methods
TABLE VIII Effects of PSPD-Conv and BiFPN+ in ablation experiments. ✔ represents integrated components. represents a pre-trained model. (R50 = ResNet-50)