Journal of Northeastern University (Natural Science) ›› 2016, Vol. 37 ›› Issue (8): 1095-1099. DOI: 10.12068/j.issn.1005-3026.2016.08.007

• Information and Control •

A Ranking Algorithm of Keyword Search on Probabilistic XML Data

ZHAO Yue1,2, YUAN Ye1, WANG Guo-ren1

  1. School of Computer Science & Engineering, Northeastern University, Shenyang 110819, China; 2. School of Information Engineering, Shenyang University, Shenyang 110044, China
  • Received: 2014-08-24 Revised: 2014-08-24 Online: 2016-08-15 Published: 2016-08-12
  • Contact: ZHAO Yue
  • About the authors: ZHAO Yue (born 1984), female, from Shenyang, Liaoning; doctoral candidate at Northeastern University and lecturer at Shenyang University. WANG Guo-ren (born 1966), male, from Chongyang, Hubei; professor and doctoral supervisor at Northeastern University.
  • Supported by:
    National Natural Science Foundation of China (6100024, 61332006, U1401256); National Basic Research Program of China (2011CB302200-G); Fundamental Research Funds for the Central Universities (N130504006).

Abstract: This paper addresses the problem of ranking content-related keyword search results over a collection of probabilistic XML documents, and presents a new ranking model tailored to the characteristics of probabilistic XML data. Unlike existing ranking algorithms, which depend only on the probabilities of the retrieval results, the proposed algorithm also takes into account how well a node discriminates between documents, how well a node describes a document, and the structural characteristics of the XML document itself. A ranking model for retrieval results that satisfies these features is designed, and a new inverted index structure is proposed to support it. The new algorithm completes keyword search quickly and returns the most relevant information to the user. Experiments on simulated data sets verify the effectiveness of the proposed method.

Key words: keyword search, probabilistic XML data, SLCA (smallest lowest common ancestor), ranking
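
The abstract names the ingredients of the ranking model (node discrimination, node description, document structure, result probability, and an inverted index) but does not give the formulas. The following is only a minimal sketch of how such ingredients might be combined, not the authors' model: the toy corpus, the `decay` parameter, the IDF-like and depth-weighted term-frequency formulas, and the function names `build_index`, `score`, and `rank` are all assumptions made for illustration. SLCA computation is not shown.

```python
import math
from collections import defaultdict

# Toy corpus (invented values): doc id -> list of
# (keyword, node depth, node existence probability).
DOCS = {
    "d1": [("xml", 1, 0.9), ("ranking", 2, 0.8), ("xml", 3, 0.5)],
    "d2": [("xml", 2, 0.6), ("index", 1, 0.7)],
    "d3": [("ranking", 1, 0.4), ("index", 3, 0.9)],
}


def build_index(docs):
    """Inverted index: keyword -> {doc id -> [(depth, probability), ...]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, nodes in docs.items():
        for keyword, depth, prob in nodes:
            index[keyword][doc_id].append((depth, prob))
    return index


def score(index, n_docs, doc_id, query, decay=0.8):
    """Hypothetical score: an IDF-like weight times a probability- and
    depth-weighted term frequency, summed over the query keywords."""
    total = 0.0
    for keyword in query:
        postings = index.get(keyword, {})
        if doc_id not in postings:
            return 0.0  # conjunctive query: every keyword must occur
        # Discrimination: keywords found in fewer documents weigh more.
        idf = math.log(1 + n_docs / len(postings))
        # Description and structure: each occurrence counts in proportion
        # to its probability, discounted by its depth in the tree.
        tf = sum(prob * decay ** depth for depth, prob in postings[doc_id])
        total += idf * tf
    return total


def rank(docs, query):
    """Return (doc id, score) pairs with a positive score, best first."""
    index = build_index(docs)
    scored = [(d, score(index, len(docs), d, query)) for d in docs]
    return sorted(((d, s) for d, s in scored if s > 0), key=lambda x: -x[1])
```

Calling `rank(DOCS, ["index"])` on this toy corpus places `d2` ahead of `d3`: the occurrence in `d3` has the higher probability, but it sits deeper in the tree and is discounted more heavily under the assumed depth decay.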

CLC number: