Control Algorithm of Three-Dimensional Game Based on Reinforcement Learning

doi:10.12068/j.issn.1005-3026.2021.04.004

Journal of Northeastern University (Natural Science), 2021, Vol. 42, Issue 4: 478-483. DOI: 10.12068/j.issn.1005-3026.2021.04.004

MENG Lu, SHEN Ning, QI Yin-qiao, ZHANG Hao-yuan

School of Information Science & Engineering, Northeastern University, Shenyang 110819, China.

Revised: 2020-05-04 Accepted: 2020-05-04 Published: 2021-04-15
Contact: MENG Lu
About author: MENG Lu (1982-), male, from Shenyang, Liaoning; associate professor at Northeastern University.
Supported by:
National Key Research and Development Program of China (2018YFB2003502); National Natural Science Foundation of China (62073061); Fundamental Research Funds for the Central Universities (N2004020).

Abstract

Based on reinforcement learning, an agent was designed for the three-dimensional first-person shooter game DOOM. The agent can move, shoot enemies, and collect items in the game environment. The proposed algorithm combines Faster RCNN, a deep-learning object detection algorithm, with the Deep Q-Networks (DQN) reinforcement learning algorithm. This greatly reduces the search space of DQN and thereby improves the training efficiency of the proposed algorithm. Experiments were carried out in two scenarios (Defend_the_center and Health_gathering) of the virtual game platform ViZDoom, and the proposed algorithm was compared with state-of-the-art agent algorithms for three-dimensional shooting games. The results show that the proposed algorithm achieves better training results with fewer iterations.

Key words: reinforcement learning; deep learning; object detection; Faster RCNN; Deep Q-Networks(DQN)
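The abstract does not say how the detector output is passed to DQN, so the following is only a minimal sketch of one plausible reading: detections are encoded as a short feature vector, and a Q-function is learned over that vector instead of over raw pixels. Everything named here is an assumption made for illustration, including the action set `ACTIONS`, the helper `encode_detections`, the `(x_min, y_min, x_max, y_max, score)` box format, and the class `LinearQ`, which is a linear stand-in for the deep Q-network rather than the authors' model.

```python
import random

# Hypothetical action set for a Defend_the_center-style scenario (assumption).
ACTIONS = ("TURN_LEFT", "TURN_RIGHT", "ATTACK")


def encode_detections(boxes, frame_width, max_objects=3):
    """Turn detector output into a small fixed-length state vector.

    `boxes` holds (x_min, y_min, x_max, y_max, score) tuples, the kind of
    output an object detector produces. Each kept box contributes three
    numbers: horizontal offset from the screen centre, relative width, score.
    """
    top = sorted(boxes, key=lambda b: b[4], reverse=True)[:max_objects]
    state = [0.0] * (3 * max_objects)
    half = frame_width / 2
    for i, (x0, _y0, x1, _y1, score) in enumerate(top):
        state[3 * i] = ((x0 + x1) / 2 - half) / half
        state[3 * i + 1] = (x1 - x0) / frame_width
        state[3 * i + 2] = score
    return state


class LinearQ:
    """Linear stand-in for a Q-network: Q(s, a) = w[a] . s + b[a]."""

    def __init__(self, state_dim, n_actions, lr=0.1, gamma=0.99):
        self.w = [[0.0] * state_dim for _ in range(n_actions)]
        self.b = [0.0] * n_actions
        self.lr, self.gamma = lr, gamma

    def q_values(self, state):
        return [sum(wi * si for wi, si in zip(row, state)) + bias
                for row, bias in zip(self.w, self.b)]

    def act(self, state, epsilon, rng=random):
        """Epsilon-greedy action selection."""
        if rng.random() < epsilon:
            return rng.randrange(len(self.b))
        q = self.q_values(state)
        return max(range(len(q)), key=q.__getitem__)

    def update(self, s, a, r, s_next, done):
        """One Q-learning step towards r + gamma * max_a' Q(s_next, a')."""
        target = r if done else r + self.gamma * max(self.q_values(s_next))
        td_error = target - self.q_values(s)[a]
        self.w[a] = [wi + self.lr * td_error * si
                     for wi, si in zip(self.w[a], s)]
        self.b[a] += self.lr * td_error
        return td_error


# Usage: one detected enemy left of centre in a 320-pixel-wide frame.
state = encode_detections([(100, 80, 140, 160, 0.9)], frame_width=320)
agent = LinearQ(state_dim=len(state), n_actions=len(ACTIONS))
action = agent.act(state, epsilon=0.1)
```

With this encoding the state has 9 entries, compared with tens of thousands of pixel values in a raw frame, which illustrates why a detector front end could shrink the space the Q-function has to cover.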

CLC number: TP391.41

MENG Lu, SHEN Ning, QI Yin-qiao, ZHANG Hao-yuan. Control Algorithm of Three-Dimensional Game Based on Reinforcement Learning[J]. Journal of Northeastern University (Natural Science), 2021, 42(4): 478-483.

