Policy Iteration Algorithm for Nonzero-Sum Games with Unknown Models
YANG Ming1, LUO Yan-hong1, WANG Yi-he2
1. School of Information Science & Engineering, Northeastern University, Shenyang 110819, China; 2. Economic Technology Institute, Nation State Liaoning Province Power Co., Ltd., Shenyang 110000, China.
YANG Ming, LUO Yan-hong, WANG Yi-he. Policy Iteration Algorithm for Nonzero-Sum Games with Unknown Models[J]. Journal of Northeastern University (Natural Science), 2015, 36(3): 318-322.
[1] Vamvoudakis K G, Lewis F L. Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations[J]. Automatica, 2011, 47(8): 1556-1569.
[2] Zhang Hua-guang, Zhang Xin, Luo Yan-hong, et al. An overview of research on adaptive dynamic programming[J]. Acta Automatica Sinica, 2013, 39(4): 303-311. (in Chinese)
[3] Liu De-rong, Li Hong-liang, Wang Ding. Data-based self-learning optimal control: research progress and prospects[J]. Acta Automatica Sinica, 2013, 39(11): 1858-1870. (in Chinese)
[4] Abu-Khalaf M, Lewis F L, Huang J. Neurodynamic programming and zero-sum games for constrained control systems[J]. IEEE Transactions on Neural Networks, 2008, 19(7): 1243-1252.
[5] Al-Tamimi A, Abu-Khalaf M, Lewis F L. Adaptive critic designs for discrete-time zero-sum games with application to H infinity control[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2007, 37(1): 240-247.
[6] Zhang H, Wei Q, Liu D. An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games[J]. Automatica, 2011, 47(1): 207-214.
[7] Vrabie D, Lewis F. Integral reinforcement learning for online computation of feedback Nash strategies of nonzero-sum differential games[C]// 2010 49th IEEE Conference on Decision and Control (CDC). Atlanta, 2010: 3066-3071.
[8] Zhang H, Cui L, Luo Y. Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP[J]. IEEE Transactions on Cybernetics, 2013, 43(1): 206-216.
[9] Jiang Y, Jiang Z P. Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics[J]. Automatica, 2012, 48(10): 2699-2704.
[10] Li H, Liu D, Wang D. Integral policy iteration for zero-sum games with completely unknown nonlinear dynamics[C]// Neural Information Processing, 20th International Conference, ICONIP 2013. Berlin Heidelberg: Springer, 2013: 225-232.
[11] Gajic Z, Li T Y. Simulation results for two new algorithms for solving coupled algebraic Riccati equations[C]// Third International Symposium on Differential Games. Nice, 1988.