Application of Reinforcement Learning Based on Hybrid Model in Optimal Control of Flotation Process

doi:10.12068/j.issn.1005-3026.2024.10.003

Abstract

Traditional optimal control methods struggle to make accurate and rapid decisions when the state of the flotation process changes, which causes large fluctuations in the concentrate grade and tailings grade and leads to unstable product quality. In addition, the concentrate grade is difficult to measure online, which limits the practicality of these methods. To address these problems, a hybrid model is used to model the flotation process, and a reinforcement learning algorithm based on safety augmented value estimation from demonstrations (SAVED) is used to control the size distribution of flotation overflow bubbles, thereby indirectly controlling the concentrate grade and tailings grade. The effectiveness of the proposed algorithm is verified through simulation experiments. Compared with control based on manual experience and with data-driven models, SAVED combined with the hybrid model achieves better control performance while satisfying the safety constraints.

Key words: flotation process, reinforcement learning, hybrid model, safety constraints, optimal control

CLC Number: TP 273

Run-da JIA, Dong-hao ZHANG, Jun ZHENG, Kang LI. Application of Reinforcement Learning Based on Hybrid Model in Optimal Control of Flotation Process[J]. Journal of Northeastern University(Natural Science), 2024, 45(10): 1386-1393.

Figures/Tables 16

Fig.1 Internal structure diagram of flotation cell

Fig.2 Hybrid model structure diagram of flotation

Fig.3 Schematic diagram of material transfer

Fig.4 PEM network structure diagram

Fig.5 Swish activation function

Fig.6 SAVED algorithm flowchart

Table 1 Algorithm of SAVED

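SAVED algorithm based on the hybrid model

1. Initialization: initialize the data set $D$ from the flotation hardware-in-the-loop simulation platform; enter the model parameters of the platform into the phenomenological mechanistic flotation model $f_c$; feed the states and actions in $D$ into the mechanistic model to generate the error data set $D_1$;

2. While $k < K$, repeat:

3. Train the PEM on the error data $D_1$;

4. While $t < P$ ($P$ is the task horizon), repeat:

5. While the number of sampled actions $a_t$ is less than the manually set hyperparameter $N$, repeat:

6. Randomly generate sequences within the action range;

7. Evaluate the action sequences by reward and constraints;

8. Select action sequences to update the cross-entropy method (CEM) distribution;

9. If $t$ exceeds the manually set hyperparameter $z$, do:

10. Add the action to the action-sequence evaluation;

11. Execute the first action $a_t^*$ of the action sequence $a_{t:t+T}^*$ covering $t$ to $t+T$;

12. Record the result: $D_1 \leftarrow D_1 \cup \{(s_t, a_t^*, s_{t+1})\}$.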
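Steps 1 and 3 of Table 1 can be illustrated with a small sketch. It assumes that the hybrid prediction is the mechanistic prediction plus a learned correction, which is one plausible reading of "error data" and is not confirmed by this page. The functions `f_c`, `true_plant`, `fit_residual_model`, and `hybrid_predict` are hypothetical toy stand-ins, and a least-squares fit replaces the PEM to keep the example short.

```python
import numpy as np

def f_c(state, action):
    """Toy stand-in for the mechanistic model (hypothetical, not the paper's model)."""
    return 0.9 * state + 0.1 * action

def true_plant(state, action):
    """Toy stand-in for the simulation platform; includes a term f_c misses."""
    return 0.9 * state + 0.1 * action + 0.05 * state * action

def build_error_data(states, actions, next_states):
    """Step 1: residual between recorded next states and mechanistic predictions."""
    return next_states - f_c(states, actions)

def fit_residual_model(states, actions, errors):
    """Step 3 stand-in: least squares on simple features instead of a PEM."""
    features = np.column_stack(
        [states, actions, states * actions, np.ones_like(states)]
    )
    weights, *_ = np.linalg.lstsq(features, errors, rcond=None)
    return weights

def hybrid_predict(state, action, weights):
    """Assumed hybrid form: mechanistic prediction plus learned error correction."""
    features = np.array([state, action, state * action, 1.0])
    return f_c(state, action) + features @ weights

# Synthetic data set D: states, actions, and the resulting next states.
rng = np.random.default_rng(0)
states = rng.uniform(0.0, 1.0, size=200)
actions = rng.uniform(0.0, 1.0, size=200)
next_states = true_plant(states, actions)

errors = build_error_data(states, actions, next_states)   # error data D_1
weights = fit_residual_model(states, actions, errors)

print("mechanistic error:", abs(f_c(0.5, 0.5) - true_plant(0.5, 0.5)))
print("hybrid error:", abs(hybrid_predict(0.5, 0.5, weights) - true_plant(0.5, 0.5)))
```

In this toy setting the correction removes the mismatch almost entirely because the missing term lies in the span of the chosen features; a real process would not be this clean.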

Table 2 CEM iterative update algorithm

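CEM iterative update algorithm

1. Initialize parameters: initialize the parameter set from prior knowledge of the flotation process;

2. Generate samples: draw random samples using the parameter set;

3. Evaluate samples: for the generated samples, use the objective function of Eq. (17) to evaluate sample performance from $x_n$ to $x_{t+n}$;

4. Select elite samples: based on the evaluation results, select a given proportion of the best samples from the current set;

5. Fit new parameters: update the parameter set using the selected elite samples;

6. Iterate: repeat steps 2 to 5 until the stopping condition is met.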
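The six steps above can be sketched as a generic cross-entropy method loop. This is a minimal illustration under stated assumptions: a diagonal Gaussian sampling distribution, a toy quadratic objective in place of Eq. (17), and hypothetical names and defaults (`cem_minimize`, `n_samples`, `elite_frac`, `n_iters`, `tol`) that do not come from the paper.

```python
import numpy as np

def cem_minimize(objective, mean, std, n_samples=64, elite_frac=0.2,
                 n_iters=30, tol=1e-3, seed=0):
    """Minimize `objective` with a diagonal-Gaussian cross-entropy method."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)   # step 1: initial parameter set
    std = np.asarray(std, dtype=float)
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        # Step 2: draw samples from the current distribution.
        samples = rng.normal(mean, std, size=(n_samples, mean.size))
        # Step 3: evaluate each sample (lower cost is better here).
        costs = np.array([objective(x) for x in samples])
        # Step 4: keep the best fraction of samples.
        elite = samples[np.argsort(costs)[:n_elite]]
        # Step 5: refit the distribution to the elite samples.
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6     # small floor avoids a zero spread
        # Step 6: stop once the distribution has collapsed.
        if np.all(std < tol):
            break
    return mean

# Toy objective standing in for Eq. (17): squared distance to a target.
target = np.array([0.3, -0.2])
best = cem_minimize(lambda x: float(np.sum((x - target) ** 2)),
                    mean=[0.0, 0.0], std=[1.0, 1.0])
print(best)
```

In a planning setting each sample would be an action sequence and the objective would combine predicted reward with constraint penalties; that part is omitted here.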

Fig.7 Schematic diagram of SAVED single step planning

Fig.8 Bubble viewer

Table 3 Parameter list of PEM

Parameter	Value
Number of PEM models	5
Number of layers per model	5
Number of neurons	512
Batch size	64
Number of iterations	200
β parameter	0.8
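To make the sizes in Table 3 concrete, the following sketch builds an untrained ensemble of five networks with Swish activations and reports the ensemble mean and spread. The layout is an assumed reading of the table (five weight layers per model, hidden width 512); the input and output dimensions, the initialization, and all function names are hypothetical. The Swish coefficient is left at 1.0 because this page does not say whether the β in Table 3 refers to the activation.

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x)."""
    return x / (1.0 + np.exp(-beta * x))

def init_member(rng, in_dim, out_dim, hidden=512, n_layers=5):
    """Random weights for one ensemble member (assumed layout, untrained)."""
    sizes = [in_dim] + [hidden] * (n_layers - 1) + [out_dim]
    return [(rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(member, x):
    """Forward pass: Swish on hidden layers, linear output layer."""
    for w, b in member[:-1]:
        x = swish(x @ w + b)
    w, b = member[-1]
    return x @ w + b

def ensemble_predict(members, x):
    """Mean prediction and member disagreement (standard deviation)."""
    preds = np.stack([forward(m, x) for m in members])
    return preds.mean(axis=0), preds.std(axis=0)

rng = np.random.default_rng(0)
members = [init_member(rng, in_dim=3, out_dim=1) for _ in range(5)]
x = rng.uniform(size=(4, 3))            # four hypothetical state-action inputs
mean, spread = ensemble_predict(members, x)
print(mean.shape, spread.shape)
```

The spread across members is one common way to express model uncertainty in ensemble dynamics models; whether the paper uses it this way is not stated on this page.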

Fig.9 PEM training loss

Fig.10 Comparison of actual values and model prediction values

Table 4 Comparison of prediction accuracy

Error type	Mechanistic model	PEM model	Hybrid model
RMSE	0.18	0.34	0.0020
MAE	0.10	0.28	0.0015
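For reference, the two error measures in Table 4 follow their standard definitions, sketched below. The sample arrays are made up for illustration and are unrelated to the reported values.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y_true - y_pred)^2))."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: mean(|y_true - y_pred|)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

# Made-up values purely to exercise the functions.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred))
```

RMSE weights large deviations more heavily than MAE, so a gap between the two indicates that a few large errors dominate.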

Fig.11 Control diagram illustrating the overflow bubble size set at 0.09 cm

Fig.12 Control diagram for liquid level height with an overflow bubble size set at 0.09 cm

