[1] 赖俊, 魏竞毅, 陈希亮. 分层强化学习综述[J]. 计算机工程与应用, 2021, 57(3): 72-79. (Lai Jun, Wei Jing-yi, Chen Xi-liang. Overview of hierarchical reinforcement learning[J]. Computer Engineering and Applications, 2021, 57(3): 72-79.)
[2] Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with deep reinforcement learning[C]//Proceedings of the Workshops at the 26th Neural Information Processing Systems. Lake Tahoe: NIPS, 2013: 201-220.
[3] Chen T, Shahar G, Tom Z, et al. A deep hierarchical approach to lifelong learning in Minecraft[C]//Thirty-First AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2017: 1553-1561.
[4] Nachum O, Gu S X, Lee H, et al. Data-efficient hierarchical reinforcement learning[J]. Advances in Neural Information Processing Systems, 2018, 31: 3307-3317.
[5] Vezhnevets A S, Osindero S, Schaul T, et al. FeUdal networks for hierarchical reinforcement learning[C]//Proceedings of Machine Learning Research. San Diego: PMLR, 2017: 3540-3549.
[6] Harb J, Bacon P L, Klissarov M, et al. When waiting is not an option: learning options with a deliberation cost[C]//Thirty-Second AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2018: 3165-3172.
[7] Konidaris G, Barto A. Skill discovery in continuous reinforcement learning domains using skill chaining[J]. Advances in Neural Information Processing Systems, 2009, 22: 1015-1023.
[8] Baumli K, Warde-Farley D, Hansen S, et al. Relative variational intrinsic control[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver: AAAI, 2021: 6732-6740.
[9] Florensa C, Duan Y, Abbeel P. Stochastic neural networks for hierarchical reinforcement learning[C]//Proceedings of the 5th International Conference on Learning Representations. Toulon: ICLR, 2017: 1422-1439.
[10] Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks[C]//Proceedings of Machine Learning Research. Sydney: PMLR, 2017: 1126-1135.
[11] Campos V, Trott A, Xiong C M, et al. Explore, discover and learn: unsupervised discovery of state-covering skills[C]//Proceedings of Machine Learning Research. San Diego: PMLR, 2020: 1317-1327.
[12] Pathak D, Agrawal P, Efros A A, et al. Curiosity-driven exploration by self-supervised prediction[C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE, 2017: 488-489.
[13] Bhatti A, Singh S, Garg A, et al. Modeling affect-based intrinsic rewards for exploration and learning[J]. IEEE Transactions on Cognitive and Developmental Systems, 2021, 13: 590-602.
[14] Bellemare M, Srinivasan S, Ostrovski G, et al. Unifying count-based exploration and intrinsic motivation[J]. Advances in Neural Information Processing Systems, 2016, 29: 1471-1479.
[15] Tang H, Houthooft R, Foote D, et al. #Exploration: a study of count-based exploration for deep reinforcement learning[J]. Advances in Neural Information Processing Systems, 2017, 30: 2753-2762.
[16] Mohamed S, Jimenez R D. Variational information maximisation for intrinsically motivated reinforcement learning[J]. Advances in Neural Information Processing Systems, 2015, 28: 2125-2133.
[17] Galloudec Q, Dellandra E. Cell-free latent go-explore[C]//Proceedings of the Workshops at the 35th Neural Information Processing Systems. Virtual: NIPS, 2021: 475-487.
[18] Schaul T, Quan J, Antonoglou I, et al. Prioritized experience replay[C]//Proceedings of the 4th International Conference on Learning Representations. San Juan: ICLR, 2016: 322-355.