Journal of Northeastern University (Natural Science) ›› 2022, Vol. 43 ›› Issue (7): 921-929. DOI: 10.12068/j.issn.1005-3026.2022.07.002

• Information and Control •

Filtering Feature Selection Algorithm Based on Entropy Weight Method

LI Zhan-shan, YANG Yun-kai, ZHANG Jia-chen

  1. College of Software, Jilin University, Changchun 130012, China
  • Published: 2022-08-02
  • Corresponding author: LI Zhan-shan
  • Contact: ZHANG Jia-chen
  • About the authors: LI Zhan-shan (born 1966), male, from Gongzhuling, Jilin; professor and doctoral supervisor at Jilin University. ZHANG Jia-chen (born 1969), male, from Dehui, Jilin; professor at Jilin University.
  • Supported by:
    Natural Science Foundation of Jilin Province (20180101043JC); Jilin Provincial Development and Reform Commission Fund (2019C053-9).

Abstract: Filter feature selection algorithms based on mutual information are often limited to mutual information as their only metric. To avoid the limitations of relying on this single criterion, the distance-based algorithm RReliefF is introduced alongside mutual information to obtain a better filter criterion. RReliefF is applied to classification tasks to measure the relevance between features and labels, while the maximal information coefficient (MIC) is used to measure both the redundancy between features and the relevance between features and labels. Finally, the entropy weight method is applied to assign objective weights to MIC and RReliefF, and on this basis a filtering feature selection algorithm based on the entropy weight method (FFSBEWM) is proposed. Comparative experiments on 13 data sets show that the feature subsets selected by FFSBEWM achieve higher average classification accuracy and higher best-case classification accuracy than those selected by the comparison algorithms.

Key words: feature selection; entropy weight method; mutual information; filtering criteria; information theory
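The abstract names the entropy weight method as the step that assigns objective weights to MIC and RReliefF but gives no formulas. The following minimal sketch shows the textbook form of entropy weighting applied to two precomputed score columns. It is an illustration under stated assumptions, not the authors' implementation: the function names `min_max`, `entropy_weights` and `combined_relevance` are hypothetical, the MIC and RReliefF scores are assumed to be computed elsewhere, and the redundancy term that the paper measures with MIC is left out.

```python
import numpy as np


def min_max(scores):
    """Rescale each column (criterion) to [0, 1]; constant columns map to 0."""
    scores = np.asarray(scores, dtype=float)
    low = scores.min(axis=0)
    span = scores.max(axis=0) - low
    return (scores - low) / np.where(span > 0, span, 1.0)


def entropy_weights(scores):
    """Textbook entropy weight method.

    scores: array of shape (n_features, n_criteria), with n_features >= 2.
    Returns one weight per criterion; the weights sum to 1.
    """
    norm = min_max(scores)
    n, m = norm.shape
    col_sum = norm.sum(axis=0)
    # Share of each feature within a criterion. A constant column carries
    # no information, so it is treated as a uniform distribution.
    p = np.where(col_sum > 0, norm / np.where(col_sum > 0, col_sum, 1.0), 1.0 / n)
    # Shannon entropy per criterion, with 0 * log(0) taken as 0.
    plogp = p * np.log(np.where(p > 0, p, 1.0))
    entropy = -plogp.sum(axis=0) / np.log(n)
    # Lower entropy means more dispersion, hence a larger weight.
    divergence = np.clip(1.0 - entropy, 0.0, None)
    total = divergence.sum()
    if total < 1e-12:
        return np.full(m, 1.0 / m)  # every criterion is constant: equal weights
    return divergence / total


def combined_relevance(mic_scores, relief_scores):
    """Assumed combination rule: entropy-weighted sum of normalised scores."""
    scores = np.column_stack([mic_scores, relief_scores])
    return min_max(scores) @ entropy_weights(scores)


# Hypothetical per-feature relevance scores, for illustration only.
mic = [0.9, 0.1, 0.5, 0.5]
relief = [0.30, 0.28, 0.31, 0.29]
ranking = np.argsort(-combined_relevance(mic, relief))  # best feature first
```

The weighted sum in `combined_relevance` is only one plausible way to merge the two criteria. The point of the sketch is that the weights come from the dispersion of the scores themselves, so no weight has to be chosen by hand.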

CLC Number: