Journal of Northeastern University (Natural Science) ›› 2021, Vol. 42 ›› Issue (12): 1688-1695. DOI: 10.12068/j.issn.1005-3026.2021.12.003

• Information and Control •

Feature Selection Algorithm Based on LightGBM

LI Zhan-shan, YAO Xin, LIU Zhao-geng, ZHANG Jia-chen

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China.
  • Revised: 2021-04-09 Accepted: 2021-04-09 Published: 2021-12-17
  • Corresponding author: LI Zhan-shan
  • Contact: ZHANG Jia-chen
  • About the authors: LI Zhan-shan (1966-), male, born in Gongzhuling, Jilin, professor and doctoral supervisor at Jilin University; ZHANG Jia-chen (1969-), male, born in Dehui, Jilin, professor at Jilin University.
  • Supported by:
    National Natural Science Foundation of China (61802056); Natural Science Foundation of Jilin Province (20180101043JC); Industrial Technology Research and Development Project of the Jilin Province Development and Reform Commission (2019C053-9).

Abstract: To address the shortcomings of two classes of feature selection algorithms, filter methods and wrapper methods based on evolutionary learning, a new wrapper feature selection algorithm, LGBFS (LightGBM feature selection), is proposed. First, LightGBM is used to build an iterative boosting tree model on the original features and to measure feature importance. Next, features are selected with the proposed LR sequential forward search strategy, LRSFFS. Finally, LGBFS is compared with nine other algorithms on 21 standard datasets. LGBFS achieves the best classification accuracy on 16 of the 21 datasets, and the best dimensionality reduction rate and the best CPU running time on 18 of them. A time complexity analysis and significance tests were also carried out. The tests show that LGBFS differs significantly from six comparison algorithms, which also indicates that LGBFS balances the computational efficiency and the classification accuracy of feature subsets.

Key words: feature selection; LightGBM; boosting tree; wrapper method; sequential search
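The abstract names two stages, importance ranking with LightGBM and an LR sequential forward search, but does not give the details of LRSFFS. The following is therefore only a minimal sketch of the general idea, assuming a generic plus-L-minus-R search over an importance-ranked feature list; it is not the authors' algorithm. The function names (`lightgbm_importances`, `rank_by_importance`, `plus_l_minus_r_search`), the parameters `l` and `r`, and the `evaluate` callback are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def lightgbm_importances(X, y) -> List[float]:
    """Fit a LightGBM classifier and return its per-feature importances.

    Assumption: the third-party `lightgbm` package is installed. The import
    is kept local so the rest of this sketch runs without it.
    """
    import lightgbm as lgb

    model = lgb.LGBMClassifier(n_estimators=100)
    model.fit(X, y)
    return list(model.feature_importances_)


def rank_by_importance(importances: Sequence[float]) -> List[int]:
    """Return feature indices sorted from most to least important."""
    return sorted(range(len(importances)),
                  key=lambda i: importances[i], reverse=True)


def plus_l_minus_r_search(
    ranked: Sequence[int],
    evaluate: Callable[[List[int]], float],
    l: int = 2,
    r: int = 1,
) -> Tuple[List[int], float]:
    """Hypothetical plus-L-minus-R forward search over a ranked feature list.

    Each round adds the next `l` features in rank order, then drops up to
    `r` features, but only while a removal does not lower the score.
    `evaluate` is the wrapper step, e.g. cross-validated accuracy of a
    classifier trained on the given feature subset.
    """
    if not (l > r >= 0):
        raise ValueError("require l > r >= 0")
    selected: List[int] = []
    best_subset: List[int] = []
    best_score = float("-inf")
    pos = 0
    while pos < len(ranked):
        # Forward step: take the next l features by importance.
        batch = list(ranked[pos:pos + l])
        selected.extend(batch)
        pos += len(batch)
        # Backward step: remove up to r features if that does not hurt.
        for _ in range(r):
            if len(selected) <= 1:
                break
            current = evaluate(selected)
            trials = [(evaluate([f for f in selected if f != g]), g)
                      for g in selected]
            score, candidate = max(trials, key=lambda t: t[0])
            if score < current:
                break
            selected.remove(candidate)
        # Remember the best subset seen so far.
        score = evaluate(selected)
        if score > best_score:
            best_score, best_subset = score, list(selected)
    return best_subset, best_score
```

In practice `evaluate` would wrap a real classifier, and `ranked` would come from `rank_by_importance(lightgbm_importances(X, y))`. Any scoring function over a list of feature indices works for experimentation.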

CLC Number: