Journal of Northeastern University (Natural Science) ›› 2025, Vol. 46 ›› Issue (12): 1-8. DOI: 10.12068/j.issn.1005-3026.2025.12.20240094

• Information & Control •

Risk-Oriented Crowd Navigation Strategy Based on Deep Reinforcement Learning

Yang JIANG1, Tian-xiang ZHAO2, Ruo-huai SUN2,3, Lei WANG4,5

  1. School of Robot Science & Engineering, Northeastern University, Shenyang 110169, China
  2. School of Information Science & Engineering, Northeastern University, Shenyang 110819, China
  3. SIASUN Robot & Automation Co., Ltd., Shenyang 110168, China
  4. China Coal Science and Engineering (Liaoning) Embodied Intelligent Technology Co., Ltd., Shenyang 110168, China
  5. China Coal Technology & Engineering Group Robotics Co., Ltd., Shenyang 110168, China
  • Received: 2024-04-19 Online: 2025-12-15 Published: 2026-02-09
  • Contact: Yang JIANG
  • Funding: Liaoning Provincial Major Science and Technology Program (2023020703-JH26/101)

Abstract:

To address the robot freezing problem and the poor dynamic obstacle avoidance that traditional navigation methods exhibit around dynamic obstacles, a navigation method based on deep reinforcement learning is proposed. Its core consists of a risk perception module and a path selection module. The risk perception module computes, in real time, the probability of collision between the robot and nearby dynamic obstacles, so that the robot prioritizes avoiding the highest-risk obstacles. Concurrently, the path selection module evaluates the “traversability” of the regions around the robot in real time and guides the robot through safer regions. Compared with a deep reinforcement learning method that lacks these two modules, the proposed method achieves the highest navigation success rate in every simulated test environment, with an improvement of up to 11%.

Key words: deep reinforcement learning, crowd navigation, dynamic obstacle avoidance, robot freezing, risk perception, path selection
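
The abstract does not state how the collision probability or the per-region score is computed, so the following is only an illustrative sketch of the two ideas, not the authors' implementation. It assumes constant relative velocity over a short horizon, uses a heuristic risk score in [0, 1] as a stand-in for a collision probability, and scores angular sectors around the robot by how few obstacles they contain. All names and parameters (`collision_risk`, `rank_by_risk`, `sector_passability`, `combined_radius`, `horizon`, `n_sectors`, `sensing_range`) are hypothetical.

```python
import math


def collision_risk(rel_pos, rel_vel, combined_radius, horizon=5.0):
    """Heuristic risk score in [0, 1] for one obstacle (assumed model, not from the paper)."""
    px, py = rel_pos  # obstacle position relative to the robot
    vx, vy = rel_vel  # obstacle velocity relative to the robot
    speed_sq = vx * vx + vy * vy
    # Time of closest approach under constant relative velocity, clamped to [0, horizon].
    if speed_sq == 0.0:
        t_star = 0.0
    else:
        t_star = min(max(-(px * vx + py * vy) / speed_sq, 0.0), horizon)
    # Centre-to-centre distance at that time.
    d_min = math.hypot(px + vx * t_star, py + vy * t_star)
    if d_min >= combined_radius:
        return 0.0
    # Deeper predicted overlap and earlier contact both raise the score.
    overlap = 1.0 - d_min / combined_radius
    urgency = 1.0 - t_star / horizon
    return overlap * urgency


def rank_by_risk(obstacles, combined_radius=0.6, horizon=5.0):
    """Return obstacle names ordered from highest to lowest risk score."""
    scored = [
        (collision_risk(pos, vel, combined_radius, horizon), name)
        for name, pos, vel in obstacles
    ]
    return [name for _, name in sorted(scored, key=lambda item: item[0], reverse=True)]


def sector_passability(obstacle_positions, n_sectors=8, sensing_range=4.0):
    """Score each angular sector around the robot; 1.0 means no obstacle in range."""
    counts = [0] * n_sectors
    width = 2.0 * math.pi / n_sectors
    for x, y in obstacle_positions:
        if math.hypot(x, y) > sensing_range:
            continue  # ignore obstacles outside the assumed sensing range
        angle = math.atan2(y, x) % (2.0 * math.pi)
        idx = min(int(angle / width), n_sectors - 1)  # guard against rounding at 2*pi
        counts[idx] += 1
    return [1.0 / (1 + c) for c in counts]
```

In a learning-based navigator, scores like these would typically be fed to the policy as extra observations or used to weight obstacles, rather than to pick actions directly; which of these the proposed method does is not specified in the abstract.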

CLC Number: