[1] 赖俊, 魏竞毅, 陈希亮. 分层强化学习综述[J]. 计算机工程与应用, 2021, 57(3): 72-79. (Lai Jun, Wei Jing-yi, Chen Xi-liang. Overview of hierarchical reinforcement learning[J]. Computer Engineering and Applications, 2021, 57(3): 72-79.)
[2] Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with deep reinforcement learning[C]//Proceedings of the Workshops at the 26th Neural Information Processing Systems. Lake Tahoe: NIPS, 2013: 201-220.
[3] Chen T, Shahar G, Tom Z, et al. A deep hierarchical approach to lifelong learning in Minecraft[C]//Thirty-First AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2017: 1553-1561.
[4] Nachum O, Gu S X, Lee H, et al. Data-efficient hierarchical reinforcement learning[J]. Advances in Neural Information Processing Systems, 2018, 31: 3307-3317.
[5] Vezhnevets A S, Osindero S, Schaul T, et al. FeUdal networks for hierarchical reinforcement learning[C]//Proceedings of Machine Learning Research. San Diego: PMLR, 2017: 3540-3549.
[6] Harb J, Bacon P L, Klissarov M, et al. When waiting is not an option: learning options with a deliberation cost[C]//Thirty-Second AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2018: 3165-3172.
[7] Konidaris G, Barto A. Skill discovery in continuous reinforcement learning domains using skill chaining[J]. Advances in Neural Information Processing Systems, 2009, 22: 1015-1023.
[8] Baumli K, Warde-Farley D, Hansen S, et al. Relative variational intrinsic control[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver: AAAI, 2021: 6732-6740.
[9] Florensa C, Duan Y, Abbeel P. Stochastic neural networks for hierarchical reinforcement learning[C]//Proceedings of the 5th International Conference on Learning Representations. Toulon: ICLR, 2017: 1422-1439.
[10] Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks[C]//Proceedings of Machine Learning Research. Sydney: PMLR, 2017: 1126-1135.
[11] Campos V, Trott A, Xiong C M, et al. Explore, discover and learn: unsupervised discovery of state-covering skills[C]//Proceedings of Machine Learning Research. San Diego: PMLR, 2020: 1317-1327.
[12] Pathak D, Agrawal P, Efros A A, et al. Curiosity-driven exploration by self-supervised prediction[C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE, 2017: 488-489.
[13] Bhatti A, Singh S, Garg A, et al. Modeling affect-based intrinsic rewards for exploration and learning[J]. IEEE Transactions on Cognitive and Developmental Systems, 2021, 13: 590-602.
[14] Bellemare M, Srinivasan S, Ostrovski G, et al. Unifying count-based exploration and intrinsic motivation[J]. Advances in Neural Information Processing Systems, 2016, 29: 1471-1479.
[15] Tang H, Houthooft R, Foote D, et al. #Exploration: a study of count-based exploration for deep reinforcement learning[J]. Advances in Neural Information Processing Systems, 2017, 30: 2753-2762.
[16] Mohamed S, Jimenez R D. Variational information maximisation for intrinsically motivated reinforcement learning[J]. Advances in Neural Information Processing Systems, 2015, 28: 2125-2133.
[17] Galloudec Q, Dellandra E. Cell-free latent go-explore[C]//Proceedings of the Workshops at the 35th Neural Information Processing Systems. Virtual: NIPS, 2021: 475-487.
[18] Schaul T, Quan J, Antonoglou I, et al. Prioritized experience replay[C]//Proceedings of the 4th International Conference on Learning Representations. San Juan: ICLR, 2016: 322-355.