X-ray Image Prohibited Item Detection Algorithm Based on X-ray-RTDETR

doi:10.12068/j.issn.1005-3026.2025.20230341

Abstract:

To address the low detection precision caused by inconsistent image sizes, high background noise, and large scale variations in X-ray images of prohibited items, RT-DETR-R18 is optimized and an X-ray image prohibited item detection algorithm named X-ray-RTDETR is proposed. First, the algorithm uses CSPRepResNet embedded with efficient multi-scale attention as the backbone network to strengthen feature extraction. Second, a simplified fast spatial pyramid pooling module is introduced after the three feature maps output by the backbone network to improve the robustness and generalization ability of the model. Finally, the SPoolFormer encoder is applied to the high-level feature maps, which carry richer semantic concepts, for intra-scale feature interaction. Experimental results show that X-ray-RTDETR reaches a detection precision of 74.6% on the PIDray test set, 8.5 percentage points higher than RT-DETR-R18, while reducing the number of parameters and the number of floating-point operations (n_FLOP) by 1.67×10⁶ and 2.24×10⁹, respectively. Comparison with state-of-the-art object detection algorithms of the same scale shows that X-ray-RTDETR not only achieves higher detection precision but also has fewer parameters and a lower n_FLOP, while its inference speed reaches 85.47 frames per second on an RTX2070 Max-Q GPU.

Key words: prohibited item detection, multi-scale attention, feature extraction, pyramid pooling, SPoolFormer encoder
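
The abstract names an SPoolFormer encoder, and the ablation table lists the corresponding change as replacing MHSA with SoftPool. The paper's exact encoder is not given in this section, so the following is only a minimal sketch of the standard SoftPool operation (each window is reduced to a softmax-weighted sum of its activations), assuming that this is the pooling meant. The function names `soft_pool_window` and `soft_pool_2d`, the 2×2 window, and the toy feature map are all illustrative assumptions, not the authors' implementation.

```python
import math

def soft_pool_window(values):
    """SoftPool over one window: softmax-weighted sum of the activations."""
    m = max(values)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(v - m) for v in values]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

def soft_pool_2d(x, k=2, stride=2):
    """Apply SoftPool to a 2-D list of floats with a k x k window and no padding."""
    rows, cols = len(x), len(x[0])
    out = []
    for i in range(0, rows - k + 1, stride):
        out_row = []
        for j in range(0, cols - k + 1, stride):
            window = [x[i + di][j + dj] for di in range(k) for dj in range(k)]
            out_row.append(soft_pool_window(window))
        out.append(out_row)
    return out

# Hypothetical 4 x 4 feature map, used only to exercise the functions.
feature_map = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, math.log(3.0), 1.0, 1.0],
    [2.0, 0.5, -1.0, 0.0],
    [0.1, 0.3, 0.0, 3.0],
]
pooled = soft_pool_2d(feature_map)
print(pooled)
```

Unlike max pooling, every activation in the window contributes to the output, and unlike average pooling, larger activations receive larger weights, so the result for each window lies between its mean and its maximum.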

CLC number:

TP 391.4

Li-zhen LI, Shu-hua MA, Ze-xu GUO, Xiao-chen CHE. X-ray Image Prohibited Item Detection Algorithm Based on X-ray-RTDETR[J]. Journal of Northeastern University (Natural Science), 2025, 46(6): 8-15.

Figures/Tables: 8


Model	Baseline and modifications	AP₅₀ (easy) / %	AP₅₀ (hard) / %	AP₅₀ (hidden) / %	AP₅₀ (all) / %	AP (easy) / %	AP (hard) / %	AP (hidden) / %	AP (all) / %	Params / 10⁶	n_FLOP / 10⁹
A	RT-DETR-R18	86.2	86.9	61.3	78.1	76.8	72.7	48.9	66.1	20.09	29.03
B	A+(R18→CSPRepResNet)	89.2	89.0	73.8	84.0	80.5	75.1	60.6	72.1	18.33	25.60
C	B+(ESE→EMA)	90.0	90.2	75.6	85.3	81.1	75.9	61.2	72.8	17.95	25.76
D	C+(Conv(1×1)→SimSPPF)	91.0	90.5	76.6	86.0	82.4	76.5	62.6	73.8	18.69	26.79
E	D+(MHSA→SoftPool)	91.4	91.2	77.3	86.6	82.9	77.4	63.4	74.6	18.42	26.79
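
The ablation rows above can be checked against the figures quoted in the abstract. The sketch below copies the overall AP, parameter count, and n_FLOP columns from the table and computes the gain of each successive modification; the variable names are illustrative, and the rounding precision is an assumption chosen to match the table's decimals.

```python
# Rows copied from the ablation table:
# (model, AP over all subsets / %, parameters / 1e6, n_FLOP / 1e9)
ablation = [
    ("A", 66.1, 20.09, 29.03),
    ("B", 72.1, 18.33, 25.60),
    ("C", 72.8, 17.95, 25.76),
    ("D", 73.8, 18.69, 26.79),
    ("E", 74.6, 18.42, 26.79),
]

# AP gain (in percentage points) contributed by each successive modification.
step_gains = {
    f"{prev[0]}->{cur[0]}": round(cur[1] - prev[1], 1)
    for prev, cur in zip(ablation, ablation[1:])
}

baseline, final = ablation[0], ablation[-1]
total_ap_gain = round(final[1] - baseline[1], 1)   # percentage points
params_saved = round(baseline[2] - final[2], 2)    # in units of 1e6
flops_saved = round(baseline[3] - final[3], 2)     # in units of 1e9

print(step_gains)
print(total_ap_gain, params_saved, flops_saved)
```

The totals reproduce the abstract's 8.5-point AP improvement and the reductions of 1.67×10⁶ parameters and 2.24×10⁹ n_FLOP. The largest single step is the backbone swap from A to B, which accounts for 6.0 of the 8.5 points.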

Model	AP₅₀ (easy) / %	AP₅₀ (hard) / %	AP₅₀ (hidden) / %	AP₅₀ (all) / %	AP (easy) / %	AP (hard) / %	AP (hidden) / %	AP (all) / %	Params / 10⁶	n_FLOP / 10⁹	Inference speed / (frame·s⁻¹)
YOLOv5-m	83.4	86.0	61.4	76.9	69.2	64.8	44.5	59.5	20.92	27.15	54.95
YOLOv7-l	85.3	87.6	72.6	81.8	75.8	71.2	57.6	68.2	36.54	59.13	58.14
YOLOv8-m	88.0	88.1	73.7	83.3	79.8	75.8	61.7	72.4	25.85	44.44	59.17
PP-YOLOE-Plus-m	90.0	89.1	70.7	83.3	81.0	75.4	57.8	71.4	23.52	27.83	56.18
Gold-YOLO-m	87.2	89.6	72.2	83.0	77.3	73.5	57.1	69.3	41.28	49.12	79.36
YOLOv6-m 3.0	90.1	90.8	75.2	85.4	81.0	76.7	61.8	73.2	34.81	48.18	90.09
X-ray-RTDETR	91.4	91.2	77.3	86.6	82.9	77.4	63.4	74.6	18.42	26.79	85.47
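
The abstract's comparison claim (higher precision with fewer parameters and a lower n_FLOP) can be read directly off the table above. As a small sketch, the snippet below copies the overall AP, parameter, n_FLOP, and speed columns and finds the best model on each metric; the tuple layout and variable names are illustrative assumptions.

```python
# Rows copied from the comparison table:
# (model, AP over all subsets / %, parameters / 1e6, n_FLOP / 1e9, speed / frame per s)
models = [
    ("YOLOv5-m", 59.5, 20.92, 27.15, 54.95),
    ("YOLOv7-l", 68.2, 36.54, 59.13, 58.14),
    ("YOLOv8-m", 72.4, 25.85, 44.44, 59.17),
    ("PP-YOLOE-Plus-m", 71.4, 23.52, 27.83, 56.18),
    ("Gold-YOLO-m", 69.3, 41.28, 49.12, 79.36),
    ("YOLOv6-m 3.0", 73.2, 34.81, 48.18, 90.09),
    ("X-ray-RTDETR", 74.6, 18.42, 26.79, 85.47),
]

best_ap = max(models, key=lambda m: m[1])[0]        # highest overall AP
fewest_params = min(models, key=lambda m: m[2])[0]  # smallest parameter count
lowest_flops = min(models, key=lambda m: m[3])[0]   # smallest n_FLOP
# Model names ordered from fastest to slowest inference.
speed_ranking = [m[0] for m in sorted(models, key=lambda m: m[4], reverse=True)]

print(best_ap, fewest_params, lowest_flops)
print(speed_ranking[:2])
```

On these numbers X-ray-RTDETR leads on AP, parameter count, and n_FLOP. On inference speed it ranks second, behind YOLOv6-m 3.0 (90.09 versus 85.47 frames per second).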
