X-ray Image Prohibited Item Detection Algorithm Based on X-ray-RTDETR

doi:10.12068/j.issn.1005-3026.2025.20230341

Abstract

To address the low detection precision caused by the inconsistent sizes, heavy background noise, and large scale variations of prohibited items in X-ray images, this paper optimizes RT-DETR-R18 and proposes an X-ray image prohibited item detection algorithm named X-ray-RTDETR. First, the algorithm uses CSPRepResNet embedded with efficient multi-scale attention (EMA) as the backbone network to enhance feature extraction. Second, a simplified fast spatial pyramid pooling (SimSPPF) module is introduced after the three feature maps output by the backbone network to improve the robustness and generalization ability of the model. Finally, the SPoolFormer encoder is applied to the high-level feature maps, which carry richer semantic concepts, for intra-scale feature interaction. Experimental results show that X-ray-RTDETR achieves an AP of 74.6% on the PIDray test set, 8.5 percentage points higher than RT-DETR-R18, while reducing the number of parameters and n_FLOP by 1.67×10⁶ and 2.24×10⁹, respectively. Comparison with state-of-the-art object detection algorithms of the same scale shows that X-ray-RTDETR achieves higher detection accuracy with fewer parameters and lower n_FLOP. Its inference speed reaches 85.47 frames per second on an RTX2070 Max-Q GPU.
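
The abstract does not specify the internals of the SimSPPF module. The sketch below assumes the SimSPPF design popularized by YOLOv6: a 1×1 convolution, three sequential 5×5 max-pooling layers with stride 1, channel-wise concatenation of the four tensors, and a final 1×1 convolution, with ReLU rather than SiLU as the activation. It keeps only the pooling part, on a single-channel grid in pure Python, to show why the sequential form is cheap: two chained 5×5 pools see a 9×9 window and three see 13×13, reproducing the parallel 5/9/13 pooling of the YOLO-style SPP block. The function names are hypothetical and the convolutions are omitted.

```python
from typing import List

Grid = List[List[float]]


def max_pool_same(x: Grid, k: int) -> Grid:
    """k x k max pooling with stride 1; out-of-bounds cells are ignored
    (equivalent to -inf padding of k // 2), so height and width are kept."""
    h, w = len(x), len(x[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            best = float("-inf")
            for ii in range(max(0, i - r), min(h, i + r + 1)):
                for jj in range(max(0, j - r), min(w, j + r + 1)):
                    best = max(best, x[ii][jj])
            row.append(best)
        out.append(row)
    return out


def simsppf_pool_branch(x: Grid, k: int = 5) -> List[Grid]:
    """Hypothetical pooling core of a SimSPPF-style block: three sequential
    k x k max-pools. The real block concatenates [x, y1, y2, y3] along the
    channel axis between two 1x1 Conv-BN-ReLU layers; here they are a list."""
    y1 = max_pool_same(x, k)
    y2 = max_pool_same(y1, k)
    y3 = max_pool_same(y2, k)
    return [x, y1, y2, y3]


if __name__ == "__main__":
    # A single spike shows the growing receptive field of each branch.
    spike = [[0.0] * 15 for _ in range(15)]
    spike[7][7] = 1.0
    for level, y in enumerate(simsppf_pool_branch(spike)):
        covered = sum(v > 0 for row in y for v in row)
        print(f"branch {level}: spike visible at {covered} positions")
```

Run as a script, it prints 1, 25, 81 and 169 covered positions, i.e. 1×1, 5×5, 9×9 and 13×13 windows, while each step only ever evaluates a 5×5 window.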

Key words: prohibited item detection, multi-scale attention, feature extraction, pyramid pooling, SPoolFormer encoder

CLC Number: TP 391.4

Li-zhen LI, Shu-hua MA, Ze-xu GUO, Xiao-chen CHE. X-ray Image Prohibited Item Detection Algorithm Based on X-ray-RTDETR[J]. Journal of Northeastern University(Natural Science), 2025, 46(6): 8-15.

Figures/Tables 8

References 28

Ablation study on the PIDray test set (Easy, Hard and Hidden are the PIDray test subsets)
Model	Baseline and modifications	AP₅₀ Easy (%)	AP₅₀ Hard (%)	AP₅₀ Hidden (%)	AP₅₀ Overall (%)	AP Easy (%)	AP Hard (%)	AP Hidden (%)	AP Overall (%)	Params (×10⁶)	n_FLOP (×10⁹)
A	RT-DETR-R18	86.2	86.9	61.3	78.1	76.8	72.7	48.9	66.1	20.09	29.03
B	A+(R18→CSPRepResNet)	89.2	89.0	73.8	84.0	80.5	75.1	60.6	72.1	18.33	25.60
C	B+(ESE→EMA)	90.0	90.2	75.6	85.3	81.1	75.9	61.2	72.8	17.95	25.76
D	C+(Conv(1×1)→SimSPPF)	91.0	90.5	76.6	86.0	82.4	76.5	62.6	73.8	18.69	26.79
E	D+(MHSA→SoftPool)	91.4	91.2	77.3	86.6	82.9	77.4	63.4	74.6	18.42	26.79
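
Row E replaces multi-head self-attention (MHSA) in the encoder with SoftPool. Judging from this row and the encoder's name, SPoolFormer appears to be a PoolFormer-style (MetaFormer) block whose token mixer is SoftPool instead of attention. The sketch below assumes the PoolFormer token-mixer form Pool(X) - X with SoftPool in place of average pooling, a 3×3 window, stride 1, and padded cells excluded from the window; these choices, and the function names, are assumptions rather than the authors' specification. SoftPool itself weights each activation in the window by its softmax, so the output lies between the window mean and the window maximum.

```python
import math
from typing import List

Grid = List[List[float]]


def soft_pool_same(x: Grid, k: int = 3) -> Grid:
    """SoftPool with stride 1: sum(exp(a_i) * a_i) / sum(exp(a_j)) over the
    in-bounds k x k window, so the output keeps the input's height and width."""
    h, w = len(x), len(x[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [x[ii][jj]
                      for ii in range(max(0, i - r), min(h, i + r + 1))
                      for jj in range(max(0, j - r), min(w, j + r + 1))]
            m = max(window)  # shift for numerical stability; cancels in the ratio
            weights = [math.exp(v - m) for v in window]
            total = sum(weights)
            row.append(sum(wt * v for wt, v in zip(weights, window)) / total)
        out.append(row)
    return out


def softpool_token_mixer(x: Grid, k: int = 3) -> Grid:
    """Hypothetical SPoolFormer token mixer, PoolFormer style: Pool(X) - X.
    Subtracting X mirrors PoolFormer, where the block's residual adds X back."""
    pooled = soft_pool_same(x, k)
    return [[p - v for p, v in zip(prow, xrow)] for prow, xrow in zip(pooled, x)]
```

Unlike MHSA, this mixer has no learnable parameters and its cost grows linearly with the number of tokens, which is consistent with the small parameter drop from row D to row E at unchanged n_FLOP.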

Comparison with other object detection algorithms on the PIDray test set
Model	AP₅₀ Easy (%)	AP₅₀ Hard (%)	AP₅₀ Hidden (%)	AP₅₀ Overall (%)	AP Easy (%)	AP Hard (%)	AP Hidden (%)	AP Overall (%)	Params (×10⁶)	n_FLOP (×10⁹)	Inference speed (frames/s)
YOLOv5-m	83.4	86.0	61.4	76.9	69.2	64.8	44.5	59.5	20.92	27.15	54.95
YOLOv7-l	85.3	87.6	72.6	81.8	75.8	71.2	57.6	68.2	36.54	59.13	58.14
YOLOv8-m	88.0	88.1	73.7	83.3	79.8	75.8	61.7	72.4	25.85	44.44	59.17
PP-YOLOE-Plus-m	90.0	89.1	70.7	83.3	81.0	75.4	57.8	71.4	23.52	27.83	56.18
Gold-YOLO-m	87.2	89.6	72.2	83.0	77.3	73.5	57.1	69.3	41.28	49.12	79.36
YOLOv6-m 3.0	90.1	90.8	75.2	85.4	81.0	76.7	61.8	73.2	34.81	48.18	90.09
X-ray-RTDETR	91.4	91.2	77.3	86.6	82.9	77.4	63.4	74.6	18.42	26.79	85.47
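
The headline numbers in the abstract can be recomputed from the tables: rows A and E of the ablation give the gain over RT-DETR-R18, and the comparison table gives the margin over YOLOv6-m 3.0, the strongest competitor by overall AP. The "8.5%" gain is an absolute AP difference (percentage points); relative to the baseline it is roughly 12.9%. The per-image latency line assumes frames per second were measured one image at a time, which the source does not state.

```python
from typing import Dict

# Values copied from the tables above: AP in %, parameters in units of 10^6,
# n_FLOP in units of 10^9, speed in frames per second.
RT_DETR_R18 = {"ap": 66.1, "params": 20.09, "flop": 29.03}
X_RAY_RTDETR = {"ap": 74.6, "params": 18.42, "flop": 26.79, "fps": 85.47}
YOLOV6_M = {"ap": 73.2, "params": 34.81, "flop": 48.18, "fps": 90.09}


def summarize() -> Dict[str, float]:
    """Recompute the abstract's headline deltas from the table values."""
    ap_gain = X_RAY_RTDETR["ap"] - RT_DETR_R18["ap"]
    return {
        # absolute AP gain in percentage points ("8.5%" in the abstract)
        "ap_gain_pts": round(ap_gain, 2),
        # the same gain relative to the baseline AP, in %
        "ap_gain_rel_pct": round(ap_gain / RT_DETR_R18["ap"] * 100, 1),
        "params_saved_1e6": round(RT_DETR_R18["params"] - X_RAY_RTDETR["params"], 2),
        "flop_saved_1e9": round(RT_DETR_R18["flop"] - X_RAY_RTDETR["flop"], 2),
        # margin over the strongest competitor in the comparison table
        "ap_margin_vs_yolov6m_pts": round(X_RAY_RTDETR["ap"] - YOLOV6_M["ap"], 2),
        "params_ratio_vs_yolov6m": round(X_RAY_RTDETR["params"] / YOLOV6_M["params"], 3),
        # per-image latency in ms, assuming batch size 1 (not stated in the source)
        "latency_ms": round(1000 / X_RAY_RTDETR["fps"], 2),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

The output confirms the 8.5-point AP gain and the 1.67×10⁶ and 2.24×10⁹ reductions, and shows that X-ray-RTDETR beats YOLOv6-m 3.0 by 1.4 AP points with about 53% of its parameters, at roughly 11.7 ms per image.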
