Journal of Northeastern University Natural Science ›› 2016, Vol. 37 ›› Issue (12): 1677-1682. DOI: 10.12068/j.issn.1005-3026.2016.12.002

• Information & Control •

A Cluster Algorithm for Uncertain Data Stream

HAN Dong-hong1, WANG Kun1, SHAO Chong-lei2, MA Chang1   

  1. School of Computer Science & Engineering, Northeastern University, Shenyang 110169, China; 2. School of Mechanical Engineering, Shenyang Ligong University, Shenyang 110159, China.
  • Received: 2015-08-28 Revised: 2015-08-28 Online: 2016-12-15 Published: 2016-12-23
  • Contact: HAN Dong-hong

Abstract: Uncertain streaming data are an important component of the big data generated by sensors, mobile phones, social networks, and similar sources. Such data are characterized by variable arrival rates, large scale, single-pass scanning, and uncertainty, and traditional clustering algorithms cannot meet users' requirements for efficient real-time queries. First, a minimum bounding rectangle (MBR) is used to describe the distribution of each uncertain tuple, and a clustering algorithm based on expected distance is proposed for uncertain data streams; bounds on the expected distance are calculated so that distant clusters can be filtered out. Second, the concept of a cluster MBR, based on the distribution of the tuples in a cluster, is presented, and a clustering algorithm is given that excludes clusters far from an uncertain tuple by using the spatial relationship between the tuple's MBR and the clusters' MBRs, thereby improving clustering efficiency. Finally, experiments on synthetic and real datasets verify that the proposed algorithms are effective and efficient.
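
The abstract does not give the algorithms themselves, so the following Python sketch is only an illustration of the general idea of MBR-based filtering, not the authors' method. It assumes an uncertain tuple is represented by a finite set of sample points with probabilities, that a cluster's representative point lies inside the cluster MBR, and that distance is Euclidean. All names (`MBR`, `mbr_of`, `min_dist`, `max_dist`, `expected_distance`, `candidate_clusters`) are hypothetical. Under those assumptions, the expected distance from a tuple to a cluster lies between the minimum and maximum distance between the two rectangles, so any cluster whose minimum distance exceeds another cluster's maximum distance can be skipped without computing an expected distance.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MBR:
    """Axis-aligned minimum bounding rectangle (hypothetical helper type)."""
    lo: tuple  # lower corner, one coordinate per dimension
    hi: tuple  # upper corner, one coordinate per dimension


def mbr_of(points):
    """Smallest axis-aligned rectangle enclosing all sample points."""
    dims = range(len(points[0]))
    return MBR(tuple(min(p[d] for p in points) for d in dims),
               tuple(max(p[d] for p in points) for d in dims))


def min_dist(a, b):
    """Smallest Euclidean distance between a point of a and a point of b."""
    total = 0.0
    for alo, ahi, blo, bhi in zip(a.lo, a.hi, b.lo, b.hi):
        gap = max(blo - ahi, alo - bhi, 0.0)  # 0 when the intervals overlap
        total += gap * gap
    return math.sqrt(total)


def max_dist(a, b):
    """Largest Euclidean distance between a point of a and a point of b."""
    total = 0.0
    for alo, ahi, blo, bhi in zip(a.lo, a.hi, b.lo, b.hi):
        span = max(bhi - alo, ahi - blo)  # farthest pair of interval endpoints
        total += span * span
    return math.sqrt(total)


def expected_distance(samples, probs, center):
    """Expected distance from a discrete uncertain tuple to a fixed point."""
    return sum(p * math.dist(x, center) for x, p in zip(samples, probs))


def candidate_clusters(tuple_mbr, cluster_mbrs):
    """Keep only clusters that could still be the nearest one.

    A cluster is dropped when even its closest possible distance is larger
    than the farthest possible distance to some other cluster.
    """
    best_upper = min(max_dist(tuple_mbr, c) for c in cluster_mbrs.values())
    return [cid for cid, c in cluster_mbrs.items()
            if min_dist(tuple_mbr, c) <= best_upper]


# Usage: one uncertain tuple with two equally likely samples, two clusters.
samples = [(0, 0), (1, 1)]
t = mbr_of(samples)
clusters = {
    "near": MBR((1, 1), (3, 3)),
    "far": MBR((10, 10), (12, 12)),
}
survivors = candidate_clusters(t, clusters)
```

In this usage example only the surviving clusters would need an exact expected-distance computation; the rectangle tests themselves cost a constant amount of work per dimension.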

Key words: uncertain data stream, cluster, big data, data mining, MBR (minimum bounding rectangle)
