Journal of Northeastern University (Natural Science), 2025, Vol. 46, Issue 10: 44-50. DOI: 10.12068/j.issn.1005-3026.2025.20240079

• Information and Control •

3D Gesture Estimation Algorithm Based on Geometric Attention Mechanism

Hui ZOU, Li-huang SHE, Ye-han CHEN, Yi YUE

  1. School of Computer Science & Engineering, Northeastern University, Shenyang 110169, China. Corresponding author: SHE Li-huang, E-mail: shelihuang@ise.neu.edu.cn
  • Received: 2024-04-08 Online: 2025-10-15 Published: 2026-01-13
  • About the authors: ZOU Hui (b. 2000), female, from Baishan, Jilin, master's student at Northeastern University;
    SHE Li-huang (b. 1980), male, from Putian, Fujian, lecturer at Northeastern University, Ph.D.
  • Funding:
    Basic Scientific Research Project for Higher Education Institutions of the Education Department of Liaoning Province (LJKZ0011); Science and Technology Plan Project of Liaoning Province (2021JH1/10400011)

Abstract:

A gesture recognition network was designed on the Transformer encoder-decoder architecture, and an optimized offset attention mechanism was introduced on top of self-attention to extract hand features. To better capture local features of the hand structure, a neighborhood aggregation strategy was also designed. Because of the three-dimensional (3D) complexity of the hand, different regions differ in smoothness; ignoring this property during hand pose estimation loses key local information about the hand structure. To address this problem, the hand structure was geometrically decomposed: sharp-variation and gentle-variation components represent its sharp and flat regions, respectively, and the attention mechanism weights the features of the two components differently. Experiments on the MSRA, ICVL, and NYU datasets show that the accuracy of the algorithm is comparable to that of state-of-the-art (SOTA) algorithms.

Key words: gesture recognition, 3D point cloud, attention mechanism, Transformer model, deep learning
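
The abstract names two building blocks without giving their details: a geometric split of the hand point cloud into sharp and flat regions, and an offset attention layer. The NumPy sketch below illustrates one possible reading of each and is not the authors' implementation. The roughness measure (distance from a point to the centroid of its k nearest neighbours), the plain scaled-softmax attention, the ReLU projection, and all function and parameter names (`split_sharp_gentle`, `offset_attention`, `wq`, `wk`, `wv`, `w_out`) are assumptions made for illustration.

```python
import numpy as np


def knn_indices(points, k):
    """Return the indices of the k nearest neighbours of every point (self excluded)."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)           # (N, N) squared distances
    np.fill_diagonal(dist2, np.inf)              # a point is not its own neighbour
    return np.argsort(dist2, axis=1)[:, :k]      # (N, k)


def split_sharp_gentle(points, k=8, num_keep=None):
    """Pick the roughest and the smoothest points of a cloud.

    Assumed roughness measure: distance between a point and the centroid
    of its k nearest neighbours (large where the surface bends sharply).
    Returns (sharp_indices, gentle_indices), each of length num_keep.
    """
    if num_keep is None:
        num_keep = len(points) // 4              # illustrative default
    neighbours = points[knn_indices(points, k)]  # (N, k, 3)
    centroids = neighbours.mean(axis=1)          # (N, 3)
    roughness = np.linalg.norm(points - centroids, axis=1)
    order = np.argsort(roughness)                # smoothest first
    return order[-num_keep:], order[:num_keep]


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def offset_attention(feats, wq, wk, wv, w_out):
    """Offset attention sketch: transform (input - attended) and add it back.

    feats: (N, C); wq, wk: (C, d); wv, w_out: (C, C).
    """
    q, k, v = feats @ wq, feats @ wk, feats @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)  # (N, N)
    attended = attn @ v                                    # (N, C)
    offset = feats - attended                              # the "offset" term
    return feats + np.maximum(offset @ w_out, 0.0)         # ReLU projection + residual
```

In a design along these lines, the features gathered at the sharp and gentle indices could each be attended to separately before being fused with the per-point features. How the published network weights the two components is not stated in the abstract.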

CLC Number: