Journal of Northeastern University (Natural Science) ›› 2010, Vol. 31 ›› Issue (6): 782-785.

• Articles •

Public blog clustering algorithm based on revision by comments

Guo, Peng-Wei (1); Gao, Ke-Ning (1); Zhang, Bin (1)

  1. School of Information Science and Engineering, Northeastern University, Shenyang 110004, China
  • Received: 2013-06-20 Revised: 2013-06-20 Online: 2010-06-15 Published: 2013-06-20
  • Contact: Zhang, B.
  • Supported by:
    National Natural Science Foundation of China (60773218)

Abstract: Blog clustering is an effective way to process blog information. This paper proposes a blog page clustering algorithm based on revision by comments. The hierarchical structure of the information contained in a blog is analyzed first, and a blog attribute model is then built from the general attributes of blog pages. Blog pages are clustered on the basis of this model, and the comments on each post are then used to revise the initial clustering result. Clustering results are evaluated with the commonly used entropy and purity measures. Two experimental schemes were designed according to how the comments are used: in one, the comments take part in clustering directly; in the other, the comments serve as a revision step after clustering. A comparison of the experimental results shows that, in most cases, using comments as a post-clustering revision gives better clustering than using them directly in clustering.

Keywords: blog, clustering, blog comments, revision, clustering algorithm
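
The abstract names entropy and purity as the evaluation measures but does not give their formulas. As a rough illustration only, the sketch below uses the common textbook definitions: purity is the share of items that carry the majority label of their cluster, and entropy is the size-weighted average of each cluster's label entropy (base 2, unnormalized). The function names `purity` and `entropy`, the helper `_label_counts`, and the toy data are assumptions made for this example, not taken from the paper, whose exact definitions may differ (for instance in normalization).

```python
from collections import Counter
from math import log2


def _label_counts(clusters, labels):
    """Group reference labels by cluster id (hypothetical helper)."""
    if len(clusters) != len(labels) or not labels:
        raise ValueError("clusters and labels must be non-empty and equal in length")
    counts = {}
    for cluster_id, label in zip(clusters, labels):
        counts.setdefault(cluster_id, Counter())[label] += 1
    return counts


def purity(clusters, labels):
    """Share of items whose label is the majority label of their cluster.

    Higher is better; 1.0 means every cluster holds a single label.
    """
    counts = _label_counts(clusters, labels)
    majority_total = sum(max(c.values()) for c in counts.values())
    return majority_total / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of per-cluster label entropy (base 2).

    Lower is better; 0.0 means every cluster holds a single label.
    Assumption: no normalization by the number of classes.
    """
    counts = _label_counts(clusters, labels)
    total = len(labels)
    result = 0.0
    for c in counts.values():
        size = sum(c.values())
        cluster_entropy = -sum((k / size) * log2(k / size) for k in c.values())
        result += (size / total) * cluster_entropy
    return result


if __name__ == "__main__":
    # Toy data: six blog pages, two clusters, hand-assigned reference labels.
    cluster_ids = [0, 0, 0, 1, 1, 1]
    true_labels = ["tech", "tech", "travel", "travel", "travel", "travel"]
    print("purity :", purity(cluster_ids, true_labels))
    print("entropy:", entropy(cluster_ids, true_labels))
```

Under these definitions, comparing the two experimental schemes would amount to computing both values for each scheme's cluster assignment against the same reference labels and preferring the one with higher purity and lower entropy.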
