Journal of Northeastern University (Natural Science), 2007, Vol. 28, Issue (2): 184-188.


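SUA-based algorithm for finding SATRs in DNA sequence

Wang Di (王镝); Zhao Yi (赵毅); Chen Bai-Chen (陈白尘); Wang Guo-Ren (王国仁)

  1. School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110004, China
  • Received: 2013-06-24 Revised: 2013-06-24 Published: 2007-02-15 Posted online: 2013-06-24
  • Corresponding author: Wang, D.
  • Supported by:
    National Natural Science Foundation of China (60273079, 60473074)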


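Abstract: This paper studies the problem of finding approximate repeats in DNA sequences, an important task in gene sequence analysis. After analyzing how the similarity of repeats can be measured, two new similarity measures based on Hamming distance are proposed: pattern similarity and segment similarity. Building on them, a new definition of approximate repeats is given, SATR (segment-similarity based approximate tandem repeats). To find SATRs, a lightweight index called the succeeding unit array (SUA) is used, and an algorithm that finds SATRs on this index is designed. Experimental evaluation and performance analysis show that the SUA-based SATR finding algorithm outperforms comparable methods in both search results and search time.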

Keywords: DNA sequence, approximate repeats, segment similarity, SATR, succeeding unit array
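The abstract names two Hamming-distance-based measures, pattern similarity and segment similarity, but does not give their formulas. The Python sketch below shows one way such measures could be computed. The normalization `1 - d_H / length` in `pattern_similarity`, the use of the minimum over adjacent copies in `segment_similarity`, and the helper `approx_tandem_repeat` are all assumptions for illustration, not the paper's definitions.

```python
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def pattern_similarity(a: str, b: str) -> float:
    """ASSUMED form: fraction of aligned positions that match, 1 - d_H / length."""
    if not a:
        raise ValueError("patterns must be non-empty")
    return 1.0 - hamming(a, b) / len(a)


def segment_similarity(copies: Sequence[str]) -> float:
    """ASSUMED form: the weakest pattern similarity between adjacent copies."""
    if len(copies) < 2:
        raise ValueError("need at least two copies")
    return min(pattern_similarity(x, y) for x, y in zip(copies, copies[1:]))


def approx_tandem_repeat(seq: str, start: int, period: int, count: int,
                         threshold: float) -> bool:
    """Hypothetical check: do `count` adjacent copies of length `period`,
    starting at `start`, reach `threshold` under the assumed segment similarity?"""
    end = start + period * count
    if period < 1 or count < 2 or start < 0 or end > len(seq):
        return False
    copies = [seq[i:i + period] for i in range(start, end, period)]
    return segment_similarity(copies) >= threshold


if __name__ == "__main__":
    dna = "ACGTACGAACGTACGT"
    # Copies ACGT, ACGA, ACGT, ACGT: adjacent similarities 0.75, 0.75, 1.0
    print(approx_tandem_repeat(dna, 0, 4, 4, threshold=0.75))  # True
    print(approx_tandem_repeat(dna, 0, 4, 4, threshold=0.8))   # False
```

Taking the minimum means a single heavily mutated copy breaks the whole repeat; an average would be more forgiving. Which behavior SATR actually uses is specified in the full paper, not in this abstract.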

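The abstract calls the succeeding unit array (SUA) a lightweight index but does not describe its layout. Purely as an illustration, the sketch below assumes a next-occurrence array: for a fixed unit length, each position stores the start of the next identical unit to its right. The names `build_successor_array` and `candidate_periods` are hypothetical, and the paper's SUA may store different information.

```python
from typing import Dict, List, Tuple


def build_successor_array(seq: str, unit: int) -> List[int]:
    """HYPOTHETICAL successor index: succ[i] is the smallest j > i with
    seq[j:j+unit] == seq[i:i+unit], or -1 if that unit never recurs."""
    if unit < 1:
        raise ValueError("unit length must be positive")
    n_units = max(len(seq) - unit + 1, 0)
    succ = [-1] * n_units
    last_seen: Dict[str, int] = {}
    # Scan right to left so last_seen always holds the nearest later occurrence.
    for i in range(n_units - 1, -1, -1):
        key = seq[i:i + unit]
        succ[i] = last_seen.get(key, -1)
        last_seen[key] = i
    return succ


def candidate_periods(succ: List[int], max_period: int) -> List[Tuple[int, int]]:
    """Return (position, distance) pairs where a unit recurs within max_period.
    Treating these distances as candidate repeat periods is an assumption."""
    return [(i, j - i) for i, j in enumerate(succ)
            if j != -1 and j - i <= max_period]


if __name__ == "__main__":
    dna = "ACGTACGTTT"
    succ = build_successor_array(dna, unit=2)
    print(succ)                                   # [4, 5, 6, -1, -1, -1, -1, 8, -1]
    print(candidate_periods(succ, max_period=4))  # [(0, 4), (1, 4), (2, 4), (7, 1)]
```

Under this assumption the index is a single integer array with one entry per position, built in one pass. A run of equal distances, such as period 4 at positions 0 to 2 above, is the kind of signal a tandem-repeat search could then verify with a similarity test.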
