A String Collection Indexing Method with String Length and Position Constraint
YU Chang-yong¹, GAO Ming¹, BAI Lu-yi¹, ZHAO Yu-hai²
1. School of Computer and Communication Engineering, Northeastern University at Qinhuangdao, Qinhuangdao 066004, China; 2. School of Computer Science & Engineering, Northeastern University, Shenyang 110169, China.
YU Chang-yong, GAO Ming, BAI Lu-yi, ZHAO Yu-hai. A String Collection Indexing Method with String Length and Position Constraint[J]. Journal of Northeastern University (Natural Science), 2018, 39(7): 959-963.