| [1] | Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C] // 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 580-587. |
| [2] | Girshick R. Fast R-CNN[C] // 2015 IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 1440-1448. |
| [3] | Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. |
| [4] | He K M, Gkioxari G, Dollár P, et al. Mask R-CNN[C] // 2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2980-2988. |
| [5] | Gaus Y F A, Bhowmik N, Breckon T P. On the use of deep learning for the detection of firearms in X-ray baggage security imagery[C] // 2019 IEEE International Symposium on Technologies for Homeland Security (HST). Woburn: IEEE, 2019: 1-7. |
| [6] | Ma C J, Zhuo L, Li J F, et al. Prohibited object detection in X-ray images with dynamic deformable convolution and adaptive IoU[C] // 2022 IEEE International Conference on Image Processing (ICIP). Bordeaux: IEEE, 2022: 3001-3005. |
| [7] | Liao H Y, Huang B, Gao H X. Feature-aware prohibited items detection for X-ray images[C] // 2023 IEEE International Conference on Image Processing (ICIP). Kuala Lumpur: IEEE, 2023: 1040-1044. |
| [8] | Liu W, Anguelov D, Erhan D, et al. SSD: single shot multibox detector[C] // 2016 European Conference on Computer Vision (ECCV). Berlin: Springer, 2016: 21-37. |
| [9] | Redmon J, Farhadi A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08) [2023-10-19]. |
| [10] | Bochkovskiy A, Wang C Y, Liao H M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23) [2023-10-19]. |
| [11] | Li C Y, Li L L, Jiang H L, et al. YOLOv6: a single-stage object detection framework for industrial applications[EB/OL]. (2022-09-07) [2023-10-19]. |
| [12] | Li C Y, Li L L, Geng Y F, et al. YOLOv6 v3.0: a full-scale reloading[EB/OL]. (2023-01-13) [2023-10-19]. |
| [13] | Wang C Y, Bochkovskiy A, Liao H M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C] // 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver: IEEE, 2023: 7464-7475. |
| [14] | Wei Y J, Dai C, Chen M S, et al. Prohibited items detection in X-ray images in YOLO network[C] // 2021 26th International Conference on Automation and Computing (ICAC). Portsmouth: IEEE, 2021: 1-6. |
| [15] | Wang Z S, Zhang H Y, Lin Z B, et al. Prohibited items detection in baggage security based on improved YOLOv5[C] // 2022 IEEE 2nd International Conference on Software Engineering and Artificial Intelligence (SEAI). Xiamen, 2022: 20-25. |
| [16] | Liu W, Sun D G, Wang Y, et al. ABTD-Net: autonomous baggage threat detection networks for X-ray images[C] // 2023 IEEE International Conference on Multimedia and Expo (ICME). Brisbane: IEEE, 2023: 1229-1234. |
| [17] | Carion N, Massa F, Synnaeve G, et al. End-to-end object detection with Transformers[C] // 2020 European Conference on Computer Vision (ECCV). Cham: Springer, 2020: 213-229. |
| [18] | Zhu X Z, Su W J, Lu L W, et al. Deformable DETR: deformable Transformers for end-to-end object detection[EB/OL]. (2021-03-18) [2023-10-19]. |
| [19] | Meng D P, Chen X K, Fan Z J, et al. Conditional DETR for fast training convergence[C] // 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal: IEEE, 2021: 3651-3660. |
| [20] | Li F, Zhang H, Liu S L, et al. DN-DETR: accelerate DETR training by introducing query denoising[C] // 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans: IEEE, 2022: 13609-13617. |
| [21] | Zhang H, Li F, Liu S L, et al. DINO: DETR with improved denoising anchor boxes for end-to-end object detection[C] // The 11th International Conference on Learning Representations. Kigali, 2023: 1-19. |
| [22] | Zhao Y A, Lyu W Y, Xu S L, et al. DETRs beat YOLOs on real-time object detection[C] // 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle: IEEE, 2024: 16965-16974. |
| [23] | Ouyang D L, He S, Zhang G Z, et al. Efficient multi-scale attention module with cross-spatial learning[C] // 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Rhodes Island, Greece: IEEE, 2023: 1-5. |
| [24] | Xu S L, Wang X X, Lyu W Y, et al. PP-YOLOE: an evolved version of YOLO[EB/OL]. (2022-12-12) [2023-10-19]. |
| [25] | He K M, Zhang X Y, Ren S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916. |
| [26] | Yu W H, Luo M, Zhou P, et al. MetaFormer is actually what you need for vision[C] // 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans: IEEE, 2022: 10809-10819. |
| [27] | Stergiou A, Poppe R, Kalliatakis G. Refining activation downsampling with SoftPool[C] // 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal: IEEE, 2021: 10337-10346. |
| [28] | Wang B Y, Zhang L B, Wen L Y, et al. Towards real-world prohibited item detection: a large-scale X-ray benchmark[C] // 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal: IEEE, 2021: 5392-5401. |