Journal of Northeastern University ›› 2011, Vol. 32 ›› Issue (5): 634-637.DOI: -

• OriginalPaper • Previous Articles     Next Articles

Top-k similarity joins on probabilistic data based on earth mover's distance

Xu, Jia (1); Yu, Ge (1); Gu, Yu (1); Bai, Qiu-Shi (1)   

  1. (1) School of Information Science and Engineering, Northeastern University, Shenyang 110819, China
  • Received:2013-06-19 Revised:2013-06-19 Published:2013-04-04
  • Contact: Xu, J.
  • About author:-
  • Supported by:
    -

Abstract: Use the EMD(earth mover's distance) to measure the similarity between two probabilistic records. EMD is robust to outliers and minute probability shifting, but has a cubic time complexity. An algorithm, EMD-k Join, is proposed that speeds up the EMD-based similarity search by constructing an index for probabilistic data using the primal-dual theory in linear programming and thus eliminates unnecessary EMD calculations. Meanwhile, it improves performance of the join process by adopting range query as the main operation and gradually shrinking the search range. Experimental results on real data sets show that EMD-k Join dramatically improves efficiency of EMD-based top-k joins on probabilistic data.

CLC Number: