Journal of Northeastern University(Natural Science) ›› 2023, Vol. 44 ›› Issue (6): 770-776.DOI: 10.12068/j.issn.1005-3026.2023.06.002

• Information & Control • Previous Articles     Next Articles

An Efficient DBG Indexing Method for Large-Scale Read Mapping

YU Chang-yong, LI Jun-jie, MA Hai-tao, ZHAO Yu-hai   

  1. School of Computer Science & Engineering, Northeastern University, Shenyang 110169, China.
  • Published:2023-06-20
  • Contact: LI Jun-jie
  • About author:-
  • Supported by:
    -

Abstract: In order to answer two questions about de Bruijn graph(DBG) in bioinformatics: ① For any k-mer, answer whether it is the vertex of DBG; ② For any vertex of DBG, answer its adjacency information(incoming edge and outgoing edge), an efficient DBG indexing method for large-scale read mapping is proposed. In this paper, the above two questions are transformed into the exact finding problem of k-mer and (k+1)-mer on non-repetitive multipath, and are solved by using FM-index. Firstly, a given reference sequence is compressed, which is the discovery of non-repetitive multipaths, thereby compressing the presence of large-scale repetitive (k+1)-mers in the sequence. Secondly, the DBG is indexed based on the non-repetitive multipath FM-index. The k-mer is searched whether it appears on the DBG, and if so, the direct predecessor and direct successor nodes of the k-mer are provided to improve the spatial-temporal efficiency. Finally, experiments are performed on the genomes of 62 E. coli strains. The experimental results show that the method proposed can efficiently index the DBG of multi-reference sequences.

Key words: de Bruijn graph; index; read mapping; FM-index; reference sequence

CLC Number: