Journal of Northeastern University (Natural Science), 2008, Vol. 29, Issue (1): 53-56.

• Articles •

User interests clustering based on PLSA

Chen, Dong-Ling (1); Wang, Da-Ling (1); Yu, Ge (1); Yu, Fang (1)

  1. School of Information Science and Engineering, Northeastern University, Shenyang 110004, China
  • Received: 2013-06-22 Revised: 2013-06-22 Online: 2008-01-15 Published: 2013-06-22
  • Contact: Chen, D.-L.
  • Supported by:
    National Natural Science Foundation of China (60573090; 60673139)

Abstract: To accurately mine users' latent interests during personalized search and to cluster them accordingly, this paper proposes combining the Zipf-distribution property of the latent semantic space with PLSA (probabilistic latent semantic analysis) to capture the semantics of the full text. Specifically, the Zipf distribution principle is first used to find the latent semantic space of the documents; user interests are then clustered in this space, and a user profile is built in the form of a user interest hierarchy tree. Experimental results show that the proposed clustering algorithm clearly outperforms clustering based on the conventional VSM (vector space model). In addition, personalized recommendation experiments on the well-known CTI data set indicate that user profiles built on the latent semantic space agree with users' real interests.

Keywords: user profile, PLSA, latent semantic space, Zipf distribution, user interest hierarchy tree
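
As a rough illustration of the PLSA step named in the abstract, the sketch below fits a standard PLSA model by EM on a small document-word count matrix and then assigns each document to its most probable latent topic. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function names `plsa` and `cluster_by_topic`, the toy counts, and the argmax assignment rule are all hypothetical, and the Zipf-based construction of the latent semantic space and the user interest hierarchy tree are not reproduced because the abstract does not specify them.

```python
import random


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit PLSA by EM (illustrative sketch, not the paper's implementation).

    counts[d][w] is the number of times word w occurs in document d.
    Returns (p_z_d, p_w_z): P(z|d) for each document, P(w|z) for each topic.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        if total <= 0:  # empty row: fall back to a uniform distribution
            return [1.0 / len(row)] * len(row)
        return [x / total for x in row]

    # Random strictly positive initialization of both distributions.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # M-step accumulators: expected counts per topic.
                for z in range(n_topics):
                    new_z_d[d][z] += n * post[z]
                    new_w_z[z][w] += n * post[z]
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


def cluster_by_topic(p_z_d):
    """Assign each document to the topic with the largest P(z|d)."""
    return [max(range(len(row)), key=row.__getitem__) for row in p_z_d]


if __name__ == "__main__":
    # Hypothetical toy data: two groups of documents with disjoint vocabularies.
    toy_counts = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
    doc_topics, _ = plsa(toy_counts, n_topics=2)
    print(cluster_by_topic(doc_topics))
```

Clustering in the topic space P(z|d) rather than in the raw term space is what distinguishes this kind of approach from a VSM baseline: documents that share no exact terms can still land in the same cluster if their words load on the same latent topic.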
