Journal of Northeastern University (Natural Science) ›› 2019, Vol. 40 ›› Issue (7): 932-936. DOI: 10.12068/j.issn.1005-3026.2019.07.004

• Information and Control •

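An Improved k-Nearest Neighbor Classifier for Unbalanced Datasets

LIU Peng1,2, DU Jia-zhi3, LYU Wei-gang2,4, DOU Ming-wu1

  1. Computing Center, Ocean University of China, Qingdao 266100, Shandong, China; 2. School of Information, Ocean University of China, Qingdao 266100, Shandong, China; 3. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China; 4. Department of Educational Technology, Ocean University of China, Qingdao 266100, Shandong, China
  • Received: 2018-07-13 Revised: 2018-07-13 Online: 2019-07-15 Published: 2019-07-16
  • Corresponding author: LIU Peng
  • About the author: LIU Peng (1981-), male, from Qingdao, Shandong; lecturer at Ocean University of China and doctoral candidate.
  • Supported by:
    Natural Science Foundation of Shandong Province (ZR2017MF051); Humanities and Social Sciences Foundation of the Ministry of Education (18YJCZH103).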

A Modified KNN Classifier for Unbalanced Dataset

LIU Peng1,2, DU Jia-zhi3, LYU Wei-gang2,4, DOU Ming-wu1   

  1. Computing Center, Ocean University of China, Qingdao 266100, China; 2. School of Information, Ocean University of China, Qingdao 266100, China; 3. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China; 4. Department of Educational Technology, Ocean University of China, Qingdao 266100, China.
  • Received: 2018-07-13 Revised: 2018-07-13 Online: 2019-07-15 Published: 2019-07-16
  • Contact: DOU Ming-wu

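Abstract: Electrocardiogram (ECG) data in cardiac arrhythmia datasets often suffer from an imbalance in the number of samples across arrhythmia types. To address this problem, a new pattern classification method is proposed: the modified kernel difference-weighted k-nearest neighbor classifier (MKDF-WKNN). It introduces a correction factor that suppresses the weights of classes with many samples and increases the weights of classes with few samples. The UCI cardiac arrhythmia dataset is used to classify the ECG data. Experimental results show that, compared with other KNN-based algorithms such as KNN, DS-WKNN, DF-WKNN and KDF-WKNN, the proposed algorithm classifies the unbalanced arrhythmia dataset more effectively.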

Keywords: cardiac arrhythmia, electrocardiogram, pattern classification, k-nearest neighbor algorithm, unbalanced dataset

Abstract: Existing arrhythmia datasets of electrocardiogram (ECG) data often suffer from an unbalanced number of training samples, because sample counts differ markedly across arrhythmia types. A novel KNN-based classification algorithm, the modified kernel difference-weighted KNN classifier (MKDF-WKNN), is proposed. It introduces a correction factor that restrains the weights of categories with more samples and increases the weights of categories with fewer samples. Experiments were carried out on the UCI arrhythmia dataset to classify the ECG data. The results show that, on unbalanced datasets, the proposed algorithm achieves better classification accuracy than other KNN-based algorithms such as KNN, DS-WKNN, DF-WKNN and KDF-WKNN.
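The reweighting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' MKDF-WKNN: the abstract gives no formula for the correction factor, so the sketch assumes an inverse class frequency factor, and it replaces the kernel difference weights with plain inverse-distance weights. The function name `class_corrected_knn` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def class_corrected_knn(train_X, train_y, query, k=3):
    """Weighted k-NN vote with a per-class correction factor.

    ASSUMPTION: the correction factor is the inverse class frequency,
    n_samples / (n_classes * class_count). The abstract does not state
    the actual formula used by MKDF-WKNN.
    ASSUMPTION: neighbor weights are inverse Euclidean distances, a
    stand-in for the kernel difference weights of KDF-WKNN.
    """
    counts = Counter(train_y)
    n_samples, n_classes = len(train_y), len(counts)
    # Factor is below 1 for large classes and above 1 for small classes.
    correction = {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }

    # Select the k training samples closest to the query.
    neighbors = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]

    # Accumulate corrected, distance-weighted votes per class.
    scores = defaultdict(float)
    for distance, label in neighbors:
        scores[label] += correction[label] / (distance + 1e-9)
    return max(scores, key=scores.get)


# Toy data: 8 samples of majority class "N", 2 of minority class "A".
train_X = [(1, 0), (-1, 0), (5, 0), (6, 0), (7, 0), (8, 0), (9, 0), (10, 0),
           (0, 1), (0, 20)]
train_y = ["N"] * 8 + ["A"] * 2
print(class_corrected_knn(train_X, train_y, query=(0, 0), k=3))
```

In this toy case the three nearest neighbors are two "N" samples and one "A" sample, all at distance 1. An uncorrected vote would pick "N", while the assumed correction factor (0.625 for "N", 2.5 for "A") tips the decision to the minority class.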

Key words: cardiac arrhythmias, electrocardiogram, pattern classification, KNN algorithm, unbalanced dataset

CLC Number: