Journal of Northeastern University (Natural Science), 2025, Vol. 46, Issue (2): 9-17. DOI: 10.12068/j.issn.1005-3026.2025.20230252

• Information and Control •

Resource Adaptation Scheme for Beam-Hopping Satellite System Based on MASAC Maximum Entropy Reinforcement Learning

Yi-xuan WANG (王译萱), Jun LIU (刘军)

  1. School of Computer Science & Engineering, Northeastern University, Shenyang 110169, China
  • Received: 2023-08-29 Online: 2025-02-15 Published: 2025-05-20
  • Contact: Yi-xuan WANG
  • About the authors: Yi-xuan WANG (born 1999), female, from Zhengzhou, Henan, master's student at Northeastern University.
    Jun LIU (born 1969), male, from Shenyang, Liaoning, professor at Northeastern University.
  • Funding:
    National Natural Science Foundation of China (61671141)

Abstract:

In beam-hopping satellite systems, the diversified traffic requirements of communication terminals cause a mismatch between space-to-ground resource supply and demand, and machine-type devices have limited energy resources for uplink transmission. To address both challenges, a resource adaptation scheme based on multi-agent soft actor-critic (MASAC) maximum entropy reinforcement learning is proposed. First, a two-stage transmission system model is constructed, and the synergistic effect of beam hopping and non-orthogonal multiple access (NOMA) is investigated on the basis of the space-to-ground resource mismatch problem. In addition, an energy harvesting and collection mechanism is introduced to optimize the relationship between energy harvesting and signal transmission at terminal devices. On this basis, the uplink and downlink transmission processes are integrated, and a multi-objective optimization problem is established covering beam-hopping pattern selection, time slot allocation, and rate and power control. The problem is solved with the MASAC algorithm to obtain an optimal joint control strategy. Experimental results show that the proposed scheme allocates resources effectively to match space-to-ground resource supply and demand, and meets the signal transmission requirements of energy-constrained machine-type terminals. Compared with the benchmark algorithms, the proposed algorithm exhibits superior performance.
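As background for the term "maximum entropy reinforcement learning", the objective that soft actor-critic methods optimize can be sketched as below. This is the standard single-agent form from the general SAC literature, not the article's own multi-agent formulation, which is not given on this page. The symbols (policy $\pi$, state $s_t$, action $a_t$, reward $r$, discount factor $\gamma$, temperature $\alpha$) are assumptions introduced here for illustration.

```latex
% Standard maximum entropy RL objective (assumed notation, not from the article):
% the policy maximizes expected discounted reward plus an entropy bonus.
J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{\infty} \gamma^{t}
         \Big( r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ \log \pi(a \mid s) \big]
```

The temperature $\alpha$ weights the entropy term against the reward: a larger value favors more exploratory policies, and $\alpha = 0$ recovers the conventional expected-return objective.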

Key words: beam-hopping satellite, non-orthogonal multiple access (NOMA), energy harvesting, resource allocation, deep reinforcement learning

CLC number: