Journal of Northeastern University Natural Science ›› 2015, Vol. 36 ›› Issue (3): 318-322.DOI: 10.12068/j.issn.1005-3026.2015.03.004

• Information & Control •

Policy Iteration Algorithm for Nonzero-Sum Games with Unknown Models

YANG Ming1, LUO Yan-hong1, WANG Yi-he2   

  1. School of Information Science & Engineering, Northeastern University, Shenyang 110819, China; 2. Economic and Technological Research Institute, State Grid Liaoning Electric Power Co., Ltd., Shenyang 110000, China.
  • Received: 2014-01-08 Revised: 2014-01-08 Online: 2015-03-15 Published: 2014-11-07
  • Contact: YANG Ming

Abstract: An online integral policy iteration algorithm is proposed to solve two-player nonzero-sum differential games for nonlinear continuous-time systems with completely unknown dynamics. Instead of requiring model information, exploration signals are added to the control and disturbance policies, which yields a model-free approximate dynamic programming (ADP) approach for solving the nonzero-sum games. The proposed algorithm updates the value functions and the control and disturbance policies simultaneously, and the policy weight parameters converge. To implement the algorithm, four neural networks are used to approximate the two game value functions, the control policy, and the disturbance policy, and the least-squares method is used to estimate the unknown neural network parameters. A simulation example demonstrates the effectiveness of the developed scheme.
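The policy-iteration structure summarized in the abstract can be illustrated with a minimal sketch. The example below is not the paper's algorithm: it uses a hypothetical scalar linear-quadratic game, a single quadratic basis function per value function in place of the four neural networks, and on-policy data without exploration signals. The dynamics (a, b1, b2) appear only to simulate data, and the improvement step still uses b_i, so this sketch is not fully model-free as the paper's scheme is; it only shows the integral Bellman least-squares evaluation followed by policy improvement.

```python
import numpy as np

# Sketch assumptions: scalar linear dynamics x_dot = a*x + b1*u1 + b2*u2,
# quadratic cost rates r_i = q_i*x^2 + r_i1*u1^2 + r_i2*u2^2 for players i = 1, 2.
a, b1, b2 = 1.0, 1.0, 2.0
q1, q2, r11, r12, r21, r22 = 2.0, 1.0, 1.0, 1.0, 1.0, 2.0

dt, T = 0.001, 0.05          # integration step and Bellman interval length
k1, k2 = 2.0, 1.0            # initial stabilizing feedback gains, u_i = -k_i * x

def simulate_interval(x0, k1, k2):
    """Integrate one Bellman interval; return next state and accumulated costs."""
    x, c1, c2 = x0, 0.0, 0.0
    for _ in range(int(T / dt)):
        u1, u2 = -k1 * x, -k2 * x
        c1 += (q1 * x**2 + r11 * u1**2 + r12 * u2**2) * dt
        c2 += (q2 * x**2 + r21 * u1**2 + r22 * u2**2) * dt
        x += (a * x + b1 * u1 + b2 * u2) * dt
    return x, c1, c2

for _ in range(20):
    # Policy evaluation: with V_i(x) = p_i * x^2, the integral Bellman equation
    #   p_i * (x(t)^2 - x(t+T)^2) = integral of cost rate over [t, t+T]
    # is solved for p_i by least squares over many data intervals.
    Phi, y1, y2 = [], [], []
    x = 1.0
    for _ in range(40):
        x_next, c1, c2 = simulate_interval(x, k1, k2)
        Phi.append([x**2 - x_next**2])
        y1.append(c1)
        y2.append(c2)
        x = x_next if abs(x_next) > 1e-3 else 1.0   # reset to keep data informative
    Phi = np.array(Phi)
    p1 = np.linalg.lstsq(Phi, np.array(y1), rcond=None)[0][0]
    p2 = np.linalg.lstsq(Phi, np.array(y2), rcond=None)[0][0]

    # Policy improvement: u_i = -(b_i / r_ii) * p_i * x.  This uses b_i, so the
    # sketch is only partially model-free; the paper instead injects exploration
    # signals and estimates the policy weights directly from data.
    k1, k2 = (b1 / r11) * p1, (b2 / r22) * p2

print(f"value weights p1={p1:.3f}, p2={p2:.3f}; gains k1={k1:.3f}, k2={k2:.3f}")
```

With these illustrative numbers the gains settle near k1 = k2 = 1, the fixed point of the coupled scalar equations; the value and policy parameters are updated together in each iteration, mirroring the simultaneous update described in the abstract.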

Key words: adaptive dynamic programming, nonzero-sum games, policy iteration, neural networks, optimal control
