Journal of Northeastern University (Natural Science), 2005, Vol. 26, Issue (8): 733-735.

• Research Paper •

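Approach based on domain knowledge to text categorization

Zhu, Jing-Bo; Chen, Wen-Liang

  1. School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110004, China
  • Received: 2013-06-24 Revised: 2013-06-24 Published: 2005-08-15 Online: 2013-06-24
  • Corresponding author: Zhu, J.-B.
  • Supported by:
    National Natural Science Foundation of China (Grant No. 60203019)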



Abstract: A knowledge-based text categorization method is proposed. It introduces domain knowledge, uses domain features as text features to strengthen text representation, and treats text categorization as an aggregation computation process. For text indexing, a feature re-selection and re-weighting technique is used. A learning algorithm based on mutual information is proposed to learn the domain-feature aggregation functions automatically from a labeled training corpus. Comparative experiments show that, overall, the domain-knowledge-based method outperforms the conventional naive Bayes classifier built on the bag-of-words model. They also show that using domain knowledge effectively improves the classification of similar and antithetical (opposing) topics.

Keywords: domain knowledge, text categorization, aggregation computation, machine learning, naive Bayes model
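The abstract gives neither the aggregation formulas nor the exact mutual-information criterion. The following is only a rough sketch of the general idea under three stated assumptions, not the authors' method:

1. A hypothetical word-to-domain-feature lexicon `DOMAIN_LEXICON` stands in for the domain knowledge base.
2. Pointwise mutual information, PMI(f, c) = log(P(c | f) / P(c)), is used as the learned weight of domain feature f for category c. It is estimated from document-level co-occurrence with add-alpha smoothing.
3. The "aggregation computation" is assumed to be a count-weighted sum of those weights.

All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical domain lexicon (assumption for illustration): maps surface words
# to domain features. The paper's actual domain knowledge base is not shown here.
DOMAIN_LEXICON = {
    "goal": "SPORTS", "match": "SPORTS", "coach": "SPORTS",
    "stock": "FINANCE", "market": "FINANCE", "bank": "FINANCE",
    "vaccine": "MEDICINE", "patient": "MEDICINE", "doctor": "MEDICINE",
}


def domain_features(tokens):
    """Map tokens to a multiset of domain features; unmapped words are dropped."""
    return Counter(DOMAIN_LEXICON[t] for t in tokens if t in DOMAIN_LEXICON)


def learn_pmi_weights(docs, labels, alpha=1.0):
    """Learn PMI(f, c) = log(P(c | f) / P(c)) from document-level co-occurrence.

    P(c | f) uses add-alpha (Laplace) smoothing over the classes, so a
    (feature, class) pair never seen together gets a finite, negative weight.
    """
    n = len(docs)
    classes = sorted(set(labels))
    class_count = Counter(labels)
    feats_per_doc = [set(domain_features(d)) for d in docs]
    feat_count = Counter(f for fs in feats_per_doc for f in fs)
    joint = Counter((f, c) for fs, c in zip(feats_per_doc, labels) for f in fs)

    weights = {}
    for f in feat_count:
        for c in classes:
            p_c_given_f = (joint[(f, c)] + alpha) / (feat_count[f] + alpha * len(classes))
            p_c = class_count[c] / n
            weights[(f, c)] = math.log(p_c_given_f / p_c)
    return weights, classes


def classify(tokens, weights, classes):
    """Assumed aggregation: score(c) = sum over f of count(f) * PMI(f, c)."""
    feats = domain_features(tokens)
    scores = {
        c: sum(cnt * weights.get((f, c), 0.0) for f, cnt in feats.items())
        for c in classes
    }
    # Ties (e.g. a document with no known domain features) go to the first class.
    return max(scores, key=scores.get)


# Toy usage with invented training data.
docs = [["goal", "match"], ["coach", "goal"], ["stock", "market"],
        ["bank", "stock"], ["patient", "doctor"], ["vaccine", "patient"]]
labels = ["sports", "sports", "finance", "finance", "health", "health"]
weights, classes = learn_pmi_weights(docs, labels)
print(classify(["goal", "coach", "bank"], weights, classes))  # sports
```

Several words map onto one domain feature, so related words such as "goal" and "coach" pool their evidence into a single weighted feature. This is one plausible reading of "domain features as text features," not a confirmed description of the original system.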
