Journal of Northeastern University (Natural Science) ›› 2020, Vol. 41 ›› Issue (11): 1550-1556. DOI: 10.12068/j.issn.1005-3026.2020.11.005

• Information and Control •

A Feature Selection Method Based on New Redundancy Measurement

LI Zhan-shan, LYU Ai-na

  1. School of Computer Science and Technology, Jilin University, Changchun 130012, China.
  • Received: 2019-12-16 Revised: 2019-12-16 Online: 2020-11-15 Published: 2020-11-16
  • Corresponding author: LI Zhan-shan
  • About the author: LI Zhan-shan (b. 1966), male, born in Gongzhuling, Jilin; professor and doctoral supervisor at Jilin University.
  • Supported by:
    Natural Science Foundation of Jilin Province (20180101043JC); Industrial Technology Research and Development Project of the Jilin Province Development and Reform Commission (2019C053-9).

A Feature Selection Method Based on New Redundancy Measurement

LI Zhan-shan, LYU Ai-na   

  1. School of Computer Science and Technology, Jilin University, Changchun 130012, China.
  • Received:2019-12-16 Revised:2019-12-16 Online:2020-11-15 Published:2020-11-16
  • Contact: LI Zhan-shan
  • About author: LI Zhan-shan (b. 1966), male, born in Gongzhuling, Jilin; professor and doctoral supervisor at Jilin University.
  • Supported by:
    Natural Science Foundation of Jilin Province (20180101043JC); Industrial Technology Research and Development Project of the Jilin Province Development and Reform Commission (2019C053-9).

Abstract: Existing filter feature selection models evaluate feature subsets with a greedy strategy combined with mutual information, and thus easily fall into local optima. Considering the effect of label information on redundancy, an improved MIFS-U method is used to measure redundancy conditioned on the given labels, and a decomposition-based multi-objective optimization framework, combined with a differential evolution operator augmented with polynomial mutation, performs a global search that avoids getting trapped in local optima. An l1 regularization term is introduced to ensure the sparsity of the feature subset, and a new feature selection algorithm, MOEA/D-DEFS, is proposed. In the experiments, a kNN classifier (k = 5) is used to verify the learning effect, tested on multiple datasets from different fields. The results show that treating feature selection as a multi-objective problem and searching with a global strategy provides better performance in terms of feature-subset size and classification accuracy.
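The label-aware redundancy idea above can be made concrete. A minimal sketch, not the authors' implementation: in the MIFS-U family, a candidate feature is scored by its mutual information with the class labels, minus a redundancy penalty against each already-selected feature s, weighted by I(s; C)/H(s) so that only the label-relevant part of s counts as redundancy. The function names (`entropy`, `mutual_information`, `mifs_u_score`) are illustrative, and the MI estimates assume discrete features.

```python
import math
from collections import Counter

def entropy(x):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(x)
    return -sum(c / n * math.log2(c / n) for c in Counter(x).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def mifs_u_score(candidate, selected, labels):
    """MIFS-U-style criterion: relevance to the labels minus
    label-weighted redundancy against already-selected features.
    The weight I(s;C)/H(s) measures how much of each selected
    feature's information is actually about the class labels C."""
    relevance = mutual_information(candidate, labels)
    redundancy = sum(
        (mutual_information(s, labels) / entropy(s))
        * mutual_information(candidate, s)
        for s in selected
        if entropy(s) > 0
    )
    return relevance - redundancy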

Keywords: feature selection, mutual information, multi-objective evolutionary algorithm, l1 regularization, redundancy

Abstract: Current filter feature selection models evaluate feature subsets with a greedy strategy combined with mutual information, and thus easily fall into local optima. Considering the effect of label information on redundancy, an improved MIFS-U method is used to measure redundancy conditioned on the given labels. A decomposition-based multi-objective optimization framework, combined with a differential evolution operator augmented with polynomial mutation, performs a global search that avoids getting trapped in local optima. An l1 regularization term is introduced to ensure the sparsity of the feature subset, and a new feature selection algorithm, MOEA/D-DEFS, is proposed. In the experimental stage, a kNN classifier (k = 5) is used to verify the learning effect, tested on multiple datasets from different fields. The results show that treating feature selection as a multi-objective problem and using a global search strategy provides better performance in terms of feature-subset size and classification accuracy.
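The multi-objective formulation sketched in the abstract pairs a classification objective with a sparsity objective. The following is an illustrative sketch, not the paper's code: a candidate solution is a binary mask over features, and the two objectives a MOEA/D-style search would minimize are the leave-one-out error of a 5-NN classifier restricted to the masked features, and the l1 norm of the mask (the subset size). The helper names (`knn_predict`, `objectives`) and the leave-one-out protocol are assumptions for this sketch.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, mask, k=5):
    """Majority vote among the k nearest training points,
    measuring distance only over features where mask[i] == 1."""
    dists = sorted(
        (math.dist([p for p, m in zip(row, mask) if m],
                   [q for q, m in zip(x, mask) if m]), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def objectives(X, y, mask, k=5):
    """The two objectives to minimize for one candidate mask:
    (leave-one-out k-NN error rate, l1 norm of the binary mask)."""
    errors = sum(
        knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], mask, k) != y[i]
        for i in range(len(X))
    )
    return errors / len(X), sum(mask)
```

On a toy dataset whose first feature separates the classes and whose second is noise, the mask [1, 0] attains zero error at l1 cost 1; a decomposition-based search trades these two objectives off across its weight vectors.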

Key words: feature selection, mutual information, multi-objective evolutionary algorithm, l1 regularization, redundancy
