Policy Iteration Algorithm for Nonzero-Sum Games with Unknown Models

doi:10.12068/j.issn.1005-3026.2015.03.004

Journal of Northeastern University Natural Science, 2015, Vol. 36, Issue (3): 318-322.

Information & Control

YANG Ming¹, LUO Yan-hong¹, WANG Yi-he²

1. School of Information Science & Engineering, Northeastern University, Shenyang 110819, China; 2. Economic Technology Institute, Nation State Liaoning Province Power Co., Ltd., Shenyang 110000, China.

Received: 2014-01-08; Revised: 2014-01-08; Online: 2015-03-15; Published: 2014-11-07
Contact: YANG Ming

Abstract

An online integral policy iteration algorithm is proposed for solving two-player nonzero-sum differential games with completely unknown nonlinear continuous-time dynamics. Instead of identifying a system model, exploration signals are added to the control and disturbance policies, which yields a model-free approximate dynamic programming (ADP) scheme for the nonzero-sum game. The algorithm updates the value functions and the control and disturbance policies simultaneously, and produces converged policy weight parameters. For implementation, four neural networks approximate the two game value functions, the control policy, and the disturbance policy, and the least squares method is used to estimate the unknown network weights. A simulation example demonstrates the effectiveness of the scheme.
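The least squares step mentioned in the abstract can be illustrated with a much simpler problem than the one the paper treats. The sketch below is not the authors' algorithm: it assumes a single-player scalar linear system with a fixed feedback gain, a one-term critic basis phi(x) = x^2, and no exploration signals or policy update. It only shows how a critic weight can be fitted from trajectory data through the integral Bellman relation w * (phi(x(t)) - phi(x(t+T))) = integral of the running cost over [t, t+T]. All names and parameter values (`collect_interval_data`, `fit_critic_weight`, `a`, `b`, `k`, `q`, `r`) are hypothetical.

```python
import numpy as np

def collect_interval_data(a, b, k, q, r, x0, T=0.1, n_intervals=20, substeps=200):
    """Simulate dx/dt = a*x + b*u with u = -k*x (assumed toy system).

    For each interval of length T, record the basis difference
    phi(x(t)) - phi(x(t+T)) with phi(x) = x**2, and the integrated
    running cost q*x**2 + r*u**2 over that interval.
    """
    dt = T / substeps
    x = x0
    dphi, cost = [], []
    for _ in range(n_intervals):
        x_start, c = x, 0.0
        for _ in range(substeps):
            u = -k * x
            run = q * x**2 + r * u**2
            x_next = x + dt * (a * x + b * u)      # forward Euler step
            u_next = -k * x_next
            run_next = q * x_next**2 + r * u_next**2
            c += 0.5 * dt * (run + run_next)       # trapezoidal rule
            x = x_next
        dphi.append([x_start**2 - x**2])
        cost.append(c)
    return np.array(dphi), np.array(cost)

def fit_critic_weight(dphi, cost):
    """Least squares fit of w in dphi @ w ~= cost.

    Only the recorded data are used here; a and b do not enter the fit.
    """
    w, *_ = np.linalg.lstsq(dphi, cost, rcond=None)
    return w

# Hypothetical parameters: closed-loop pole a - b*k = -2.
dphi, cost = collect_interval_data(a=1.0, b=1.0, k=3.0, q=1.0, r=1.0, x0=1.0)
w = fit_critic_weight(dphi, cost)
print(w)
```

For this toy setup the value of the fixed policy is V(x) = p x^2 with p = (q + r k^2) / (2 (b k - a)) = 2.5, so the fitted weight should land close to 2.5, up to discretization error. The fit itself never reads `a` or `b`, which is the sense in which such a policy evaluation step is data-driven; the paper's scheme additionally handles two players, nonlinear dynamics, and policy updates, none of which is shown here.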

Key words: adaptive dynamic programming, nonzero-sum games, policy iteration, neural networks, optimal control

CLC Number: TP183

YANG Ming, LUO Yan-hong, WANG Yi-he. Policy Iteration Algorithm for Nonzero-Sum Games with Unknown Models[J]. Journal of Northeastern University Natural Science, 2015, 36(3): 318-322.


