Journal of Northeastern University (Natural Science) ›› 2023, Vol. 44 ›› Issue (10): 1369-1376. DOI: 10.12068/j.issn.1005-3026.2023.10.001

• Information and Control •

Automatic Lane Change Decision Model Based on Dueling Double Deep Q-network

ZHANG Xue-feng, WANG Zhao-yi

  1. School of Sciences, Northeastern University, Shenyang 110819, China
  • Published: 2023-10-27
  • Corresponding author: ZHANG Xue-feng
  • About author: ZHANG Xue-feng (1966-), male, born in Liaoyang, Liaoning, associate professor at Northeastern University.
  • Supported by:
    National Key Research and Development Program of China (2020YFB1710003).

Automatic Lane Change Decision Model Based on Dueling Double Deep Q-network

ZHANG Xue-feng, WANG Zhao-yi   

  1. School of Sciences, Northeastern University, Shenyang 110819, China.
  • Published:2023-10-27
  • Contact: ZHANG Xue-feng
  • About author: ZHANG Xue-feng (1966-), male, born in Liaoyang, Liaoning, associate professor at Northeastern University.
  • Supported by:
    National Key Research and Development Program of China (2020YFB1710003).

Abstract: Automatic lane changing requires a vehicle to drive at the fastest possible speed while ensuring that no collision occurs; rule-based control is neither robust to unexpected situations nor able to react to vehicles in non-adjacent lanes. To address these problems, an automatic lane change decision model based on a dueling double deep Q-network (D3QN) reinforcement learning model is proposed. The algorithm processes the surrounding-vehicle information fed back by the Internet of Vehicles and obtains an action through the policy; after the action is executed, the neural network is trained according to the reward function, and the automatic lane change strategy is finally realized through the trained network and reinforcement learning. Simulation experiments in a three-lane environment built with Python and in the vehicle simulation software CarMaker achieve a good control effect, verifying the feasibility and effectiveness of the proposed algorithm.

Keywords: lane change; autonomous driving; reinforcement learning; deep learning; deep reinforcement learning

Abstract: Automatic lane changing requires a vehicle to drive at the fastest possible speed while ensuring that no collision occurs. However, rule-based control is neither robust to unexpected situations nor able to react to vehicles in non-adjacent lanes. To solve these problems, an automatic lane change decision model based on a dueling double deep Q-network (D3QN) reinforcement learning model is proposed. The algorithm processes the surrounding-vehicle information fed back by the Internet of Vehicles and then obtains actions through the policy. After the actions are executed, the neural network is trained according to the given reward function, and the automatic lane change strategy is finally realized through the trained network and reinforcement learning. Simulation experiments are carried out in a three-lane environment built with Python and in the vehicle simulation software CarMaker. The results show that the proposed algorithm achieves a good control effect, verifying its feasibility and effectiveness.
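The two ingredients of D3QN named in the abstract can be sketched compactly: a *dueling* head that splits the network into a state-value stream V(s) and an advantage stream A(s, a), combined as Q(s, a) = V(s) + A(s, a) − mean_a A(s, a); and a *double* DQN target in which the online network selects the next action while a separate target network evaluates it. The following NumPy sketch is illustrative only — the layer sizes, the 6-dimensional state (ego plus surrounding-vehicle features), and the 3 actions (keep lane, change left, change right) are assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

class DuelingQNet:
    """Minimal dueling Q-network: one shared hidden layer, then separate
    value (V) and advantage (A) streams combined as
    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, n_state, n_action, n_hidden=32):
        self.W1 = rng.normal(0.0, 0.1, (n_state, n_hidden))
        self.Wv = rng.normal(0.0, 0.1, (n_hidden, 1))         # value stream
        self.Wa = rng.normal(0.0, 0.1, (n_hidden, n_action))  # advantage stream

    def q_values(self, s):
        h = np.tanh(s @ self.W1)
        v = h @ self.Wv                       # shape (batch, 1)
        a = h @ self.Wa                       # shape (batch, n_action)
        # subtracting the mean advantage keeps V and A identifiable
        return v + a - a.mean(axis=1, keepdims=True)

def double_dqn_target(online, target, r, s_next, gamma=0.99):
    """Double-DQN target: the online net picks the greedy next action,
    the target net evaluates it, reducing overestimation bias."""
    a_star = online.q_values(s_next).argmax(axis=1)
    q_next = target.q_values(s_next)[np.arange(len(r)), a_star]
    return r + gamma * q_next

# Illustrative setup: 6-dim state, 3 lane-change actions, batch of 4.
online, target = DuelingQNet(6, 3), DuelingQNet(6, 3)
s_next = rng.normal(size=(4, 6))
r = np.ones(4)
y = double_dqn_target(online, target, r, s_next)
print(y.shape)
```

In training, `y` would serve as the regression target for the online network's Q(s, a) on the stored transitions, with the target network's weights periodically copied from the online network.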

Key words: lane change; driverless vehicles; reinforcement learning; deep learning; deep reinforcement learning
