Gesture Recognition in the Complex Environment Based on Gan-St-YOLOv5

doi:10.12068/j.issn.1005-3026.2023.07.006

Abstract: In human-computer interaction based on gesture recognition in the complex environment of intelligent industrial production, gestures are affected by local occlusion, strong illumination, and small distant targets. These conditions reduce the gesture features available during target detection and recognition and can even cause classification errors. Improving the accuracy of gesture recognition in complex environments has therefore become an urgent problem in human-computer interaction tasks, and a Gan-St-YOLOv5 model is proposed to address it. On the basis of YOLOv5, GAN and Swin Transformer modules are integrated into the SENet channel attention mechanism, and the Confluence detection box selection algorithm is used to improve detection accuracy. To verify the model, it is compared with the YOLOv5 model. The mAP_0.5 of Gan-St-YOLOv5 reaches 96.1% on the fully visible test set, 92.3% on the intense illumination test set, 86.6% on the partial occlusion test set, and 96.4% on the remote small target test set. All of these results are superior to those of the YOLOv5 target detection algorithm, so the model achieves higher accuracy with only a small loss of efficiency.
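
For readers unfamiliar with the mAP_0.5 metric quoted in the abstract: a detection is conventionally counted as correct when its box overlaps a ground-truth box of the same class with an intersection over union (IoU) of at least 0.5. The following is a minimal sketch of that matching criterion only, assuming boxes are given as (x1, y1, x2, y2) corner coordinates. The names `iou` and `is_true_positive` are illustrative and are not taken from the paper, and the full mAP computation (ranking by confidence, precision-recall integration, averaging over classes) is not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Assumed criterion: a prediction matches when IoU >= threshold."""
    return iou(pred_box, gt_box) >= threshold
```

Under this criterion, a predicted box that covers only half of a ground-truth box of the same size (IoU of one third) would not count as a correct detection.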

Key words: Gan-St-YOLOv5; gesture recognition; local occlusion; strong illumination; remote small target

CLC Number: V211.5

HAO Bo, YIN Xing-chao, YAN Jun-wei, ZHANG Li. Gesture Recognition in the Complex Environment Based on Gan-St-YOLOv5[J]. Journal of Northeastern University (Natural Science), 2023, 44(7): 953-963.
