Journal of Northeastern University (Natural Science), 2019, Vol. 40, Issue (1): 11-15. DOI: 10.12068/j.issn.1005-3026.2019.01.003

• Information and Control •

Random Forest Based Quality Analysis and Prediction Method for Hot-Rolled Strip

JI Ying-jun1, YONG Xiao-yue1, LIU Ying-lin2, LIU Shi-xin1

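  1. School of Information Science & Engineering, Northeastern University, Shenyang 110819, China; 2. Big Data Department, Shanghai Baosight Software Co., Ltd., Shanghai 201203, China.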
  • Received: 2018-04-19  Revised: 2018-04-19  Publication date: 2019-01-15  Release date: 2019-01-28
  • Corresponding author: JI Ying-jun
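  • About the authors: JI Ying-jun (1989-), male, from Shenyang, Liaoning, is a doctoral candidate at Northeastern University; LIU Shi-xin (1968-), male, from Diaobingshan, Liaoning, is a professor and doctoral supervisor at Northeastern University.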
  • Supported by:
    National Key R&D Program of China (2017YFB0306401); National Natural Science Foundation of China (61573089).

  • Contact: LIU Shi-xin

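Abstract: This study takes actual hot-rolled strip production data from an iron and steel enterprise as its object of analysis. It uses an improved random forest algorithm to analyze the implicit relationships between process parameters and product quality, extracts the key process parameters that affect product quality, and builds a defect prediction model for hot-rolled strip. Experimental results show that balancing the imbalanced data set improves sample prediction accuracy; that combining CART with C4.5 improves prediction accuracy further than either method alone; and that, given the high- and low-correlation characteristics of the features, using mutual information as the evaluation criterion for feature selection improves the classification performance of the random forest algorithm. Under these three improvement strategies, the recognition rate of hot-rolled strip defects is markedly improved.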
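The abstract lists its strategies without implementation details. The sketch below illustrates two of them under stated assumptions: class balancing by random undersampling of the majority class (the abstract does not say which balancing method was used), and ranking discrete features by their mutual information with the defect label. The function names `undersample`, `mutual_information` and `rank_features` are hypothetical, and the ranking ignores redundancy between features, which the paper's high/low-correlation criterion may take into account.

```python
import math
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Balance classes by random undersampling (an assumed scheme): every class
    keeps as many randomly chosen samples as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


def mutual_information(xs, ys):
    """I(X;Y) in bits, estimated from counts of two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(rows, labels):
    """Return (feature indices sorted by decreasing MI with the label, per-feature scores)."""
    n_features = len(rows[0])
    scores = [mutual_information([r[j] for r in rows], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data: feature 0 matches the defect label exactly, feature 1 is constant.
rows = [(0, 5)] * 4 + [(1, 5)] * 2
labels = [0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = undersample(rows, labels)
order, scores = rank_features(balanced_rows, balanced_labels)
print(order, scores)  # prints [0, 1] [1.0, 0.0]
```

Undersampling is used here only because it is the simplest balancing scheme to show; oversampling the minority class would be an equally plausible reading of "balancing".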

Keywords: hot-rolled strip, defect prediction, data-driven, feature extraction, random forest

Abstract: The process data of hot-rolled strips from an iron and steel enterprise were analyzed with an improved random forest algorithm to uncover the inherent relationship between process parameters and product quality. After the critical features were extracted, a defect prediction model was built. Experiments show that balancing the imbalanced data sets improves prediction accuracy, and that combining CART and C4.5 improves accuracy further compared with either method alone. Furthermore, because features may be highly or weakly correlated with the response variable, mutual information was introduced as the evaluation criterion for feature selection, which improves the classification performance of the random forest algorithm. With the three strategies combined, the recognition rate of hot-rolled strip defects is markedly improved.
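The abstract does not specify how CART and C4.5 are combined. One plausible reading, assumed here, is a forest whose trees use different split criteria: Gini impurity decrease (as in CART) or gain ratio (as in C4.5). The sketch scores candidate splits under both criteria. `tree_criteria` simply alternates the two and is hypothetical; splits are multiway on discrete feature values for brevity, whereas CART proper uses binary splits; and the bootstrapping and random feature subsets that a full random forest also needs are omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_score(xs, ys, criterion):
    """Score a multiway split of labels ys on discrete feature values xs.
    'cart' -> Gini impurity decrease; 'c45' -> gain ratio. Higher is better."""
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    weights = [len(g) / n for g in groups.values()]
    if criterion == "cart":
        child = sum(w * gini(g) for w, g in zip(weights, groups.values()))
        return gini(ys) - child
    if criterion == "c45":
        child = sum(w * entropy(g) for w, g in zip(weights, groups.values()))
        gain = entropy(ys) - child
        split_info = -sum(w * math.log2(w) for w in weights)
        return gain / split_info if split_info > 0 else 0.0
    raise ValueError(f"unknown criterion: {criterion}")


def tree_criteria(n_trees):
    """Hypothetical assignment: alternate CART-style and C4.5-style trees."""
    return ["cart" if i % 2 == 0 else "c45" for i in range(n_trees)]


def best_feature(rows, labels, criterion):
    """Index of the feature whose split scores highest under the given criterion."""
    n_features = len(rows[0])
    scores = [split_score([r[j] for r in rows], labels, criterion) for j in range(n_features)]
    return max(range(n_features), key=lambda j: scores[j])


# Toy data: feature 0 separates the classes, feature 1 carries no information.
rows = [(0, 1), (0, 0), (1, 1), (1, 0)]
labels = [0, 0, 1, 1]
for criterion in tree_criteria(2):
    print(criterion, best_feature(rows, labels, criterion))  # cart 0, then c45 0
```

Mixing criteria in this way is one source of diversity between trees; gain ratio penalizes features with many distinct values, while the Gini decrease does not.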

Key words: hot-rolled strip, defect prediction, data-driven, feature selection, random forests
