Journal of Northeastern University(Natural Science) ›› 2025, Vol. 46 ›› Issue (2): 9-17.DOI: 10.12068/j.issn.1005-3026.2025.20230252

• Information & Control •

Resource Adaptation Scheme for Beam-Hopping Satellite System Based on MASAC Maximum Entropy Reinforcement Learning

Yi-xuan WANG, Jun LIU

  1. School of Computer Science & Engineering, Northeastern University, Shenyang 110169, China.
  • Received: 2023-08-29 Online: 2025-02-15 Published: 2025-05-20
  • Contact: Yi-xuan WANG

Abstract:

To address the mismatch between space-to-ground resource supply and demand caused by the diversified traffic requirements of communication terminals in beam-hopping satellite systems, as well as the limited energy of machine-type devices in uplink transmission, a resource adaptation scheme is proposed based on multi-agent soft actor-critic (MASAC) maximum entropy reinforcement learning. First, a two-stage transmission system model is constructed to investigate the synergistic effect of beam hopping and non-orthogonal multiple access (NOMA) on the space-to-ground resource mismatch problem. An energy harvesting mechanism is then introduced to balance terminal device energy harvesting against signal transmission. On this basis, a multi-objective optimization problem is formulated for beam-hopping pattern selection, time slot allocation, and rate and power control, jointly considering the uplink and downlink transmission processes. MASAC maximum entropy reinforcement learning is employed to solve this problem, yielding an optimal joint control strategy. Experimental results show that the proposed scheme effectively allocates resources to match space-to-ground supply and demand and meets the signal transmission requirements of energy-constrained machine-type terminals, outperforming the benchmark algorithms.
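The abstract does not give the learning update itself, but SAC-style methods (of which MASAC is the multi-agent extension) are characterized by an entropy-regularized critic target. The sketch below illustrates that target with clipped double-Q; the function name, toy numbers, and the entropy coefficient `alpha=0.2` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def soft_td_target(reward, gamma, q1_next, q2_next, log_prob_next, alpha):
    """Entropy-regularized TD target used by SAC-style critics:
    y = r + gamma * (min(Q1, Q2)(s', a') - alpha * log_pi(a'|s')).
    The -alpha * log_pi term is the maximum-entropy bonus."""
    min_q = np.minimum(q1_next, q2_next)  # clipped double-Q to curb overestimation
    return reward + gamma * (min_q - alpha * log_prob_next)

# Toy transition batch for a single agent (all values hypothetical).
r = np.array([1.0, 0.5])            # per-step rewards
q1 = np.array([2.0, 1.5])           # first target critic's Q(s', a')
q2 = np.array([1.8, 1.6])           # second target critic's Q(s', a')
logp = np.array([-1.2, -0.9])       # log pi(a'|s') from the actor
y = soft_td_target(r, gamma=0.99, q1_next=q1, q2_next=q2,
                   log_prob_next=logp, alpha=0.2)
# y[0] = 1.0 + 0.99 * (1.8 - 0.2 * (-1.2)) = 3.0196
```

In a multi-agent setting such as MASAC, each agent would typically compute this target with a centralized critic that observes all agents' states and actions, while each actor conditions only on its own observation.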

Key words: beam-hopping satellite, non-orthogonal multiple access (NOMA), energy harvesting, resource allocation, deep reinforcement learning
