Journal of Northeastern University (Natural Science) ›› 2016, Vol. 37 ›› Issue (5): 624-628. DOI: 10.12068/j.issn.1005-3026.2016.05.004

• Information and Control •

A Multi-core Parallel Substring Matching Algorithm Based on BWT

WANG Jia-ying, WANG Bin, LI Xiao-hua, YANG Xiao-chun

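  1. (School of Computer Science and Engineering, Northeastern University, Shenyang 110819, Liaoning, China)
  • Received: 2015-03-20 Revised: 2015-03-20 Online: 2016-05-15 Published: 2016-05-13
  • Corresponding author: WANG Jia-ying
  • About the authors: WANG Jia-ying (1985-), male, born in Benxi, Liaoning, PhD candidate at Northeastern University; YANG Xiao-chun (1973-), female, born in Shenyang, Liaoning, professor and doctoral supervisor at Northeastern University.
  • Supported by:
    National Natural Science Foundation of China (61322208, 61272178, 61129002, 61572122, 61532021); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (20110042110028).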

Multi-core Parallel Substring Matching Algorithm Using BWT

WANG Jia-ying, WANG Bin, LI Xiao-hua, YANG Xiao-chun   

  1. School of Computer Science & Engineering, Northeastern University, Shenyang 110819, China.
  • Received: 2015-03-20 Revised: 2015-03-20 Online: 2016-05-15 Published: 2016-05-13
  • Contact: WANG Bin

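Abstract (Chinese version): The P-BWT exact matching algorithm supports only short queries and can run only on a single processor. To address these problems, a multi-core parallel exact query algorithm that supports queries of arbitrary length is proposed. The query process on the P-BWT index is improved: when a query string spans multiple data fragments, it is first searched on the last fragment it matches and then verified on the preceding fragments one by one. A multi-core parallel query algorithm is further proposed to reduce the number of iterations of the search and verification process. Experimental results show that the proposed algorithm completes the substring matching task efficiently and in parallel.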

Keywords (Chinese version): BWT, full-text index, exact matching, parallel, multi-core

Abstract: To solve the problem that P-BWT (a Burrows-Wheeler transform index) supports only short queries and works only on a uniprocessor, a multi-core parallel exact matching algorithm that supports queries of any length is proposed. First, the search process on the P-BWT index is modified: when a query spans multiple data fragments, it is first searched on the last fragment it matches and then verified on the preceding fragments in turn. Further, a parallel algorithm is proposed to reduce the number of iterations in the search-and-verify process. Finally, experimental results show that the proposed algorithm accomplishes the substring matching task efficiently and in parallel.
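
For readers unfamiliar with BWT-based exact matching, the following is a minimal Python sketch of the standard backward search on a single BWT index. It is not the authors' P-BWT structure or their parallel algorithm: the function names `bwt_index` and `backward_search`, the `$` sentinel, and the naive rotation sort are illustrative assumptions suited only to toy inputs.

```python
def bwt_index(text, sentinel="$"):
    """Build a naive BWT index (toy sizes only; illustrative, not P-BWT).

    Assumes the sentinel does not occur in text and sorts before
    every symbol of text.
    """
    s = text + sentinel
    # Suffix array obtained by sorting all rotations of s.
    sa = sorted(range(len(s)), key=lambda i: s[i:] + s[:i])
    # BWT: the symbol preceding each sorted rotation.
    bwt = "".join(s[i - 1] for i in sa)
    # C[c] = number of symbols in s strictly smaller than c.
    C, total = {}, 0
    for c in sorted(set(s)):
        C[c] = total
        total += s.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in C}
    for i, ch in enumerate(bwt):
        for c in C:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of pattern, scanning it right to left."""
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:  # interval became empty: no occurrence
            return []
    return sorted(sa[lo:hi])


sa, C, occ = bwt_index("banana")
print(backward_search("ana", sa, C, occ))
```

Each pattern symbol narrows the suffix-array interval by one step, so the number of iterations grows with the query length; this is the kind of iteration count that the abstract says the parallel algorithm aims to reduce.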

Key words: BWT (Burrows-Wheeler transform), full-text index, exact matching, parallel, multi-core
