Journal of Northeastern University Natural Science ›› 2018, Vol. 39 ›› Issue (7): 959-963.DOI: 10.12068/j.issn.1005-3026.2018.07.010

• Information & Control • Previous Articles     Next Articles

A String Collection Indexing Method with String Length and Position Constraint

YU Chang-yong1, GAO Ming1, BAI Lu-yi1, ZHAO Yu-hai2   

  1. 1. School of Computer and Communication Engineering, Northeastern University at Qinhuangdao, Qinhuangdao 066004, China; 2. School of Computer Science & Engineering, Northeastern University, Shenyang 110169, China.
  • Received:2017-05-24 Revised:2017-05-24 Online:2018-07-15 Published:2018-07-11
  • Contact: YU Chang-yong
  • About author:-
  • Supported by:
    -

Abstract: An index method of string collection was proposed based on BWT (Burrows-wheeler-transform) for solving the exact substring queries with string length and matching position constraints. Firstly, the BWT and exact string query based on it were discussed. Then the impact of string collection, string length and substring position upon the original BWT index was analyzed. Finally, the fast calculation problem was discussed and solved from the position of the matching suffix to the string ID and position on the string of the matching substring. The approximate string matching was conducted on three real string collections and compared the results of index method proposed and the original one. The experimental results showed that the method proposed based on BWT speeded up the process of exact substring queries with string length and matching position constraints considerably in the case of keeping the index size. Therefore, the proposed method was suitable for indexing large-scale string collection for string similarity match and joint.

Key words: BWT (Burrows-Wheeler-transform), string index, inverted list, string similarity match, sequence alignment

CLC Number: