Journal of Northeastern University Natural Science, 2016, Vol. 37, Issue (9): 1245-1249. DOI: 10.12068/j.issn.1005-3026.2016.09.007

• Information & Control •

Online Classification Algorithm for Uncertain Data Stream in Big Data

LYU Yan-xia, WANG Cui-rong, WANG Cong, YU Chang-yong   

  1. School of Information Science & Engineering, Northeastern University, Shenyang 110819, China.
  • Received: 2015-05-24 Revised: 2015-05-24 Online: 2016-09-15 Published: 2016-09-18
  • Contact: LYU Yan-xia

Abstract: In big data settings, data are often uncertain because of privacy protection, data loss, and similar causes. In a data stream system, data arrive continuously, the complete data set is never available at once, and not all of the information can be acquired in a single scan. An incremental classification model is therefore constructed to classify uncertain data streams. This paper presents WBVFDTu, a weighted Bayes classifier for uncertain data streams built on the VFDT (very fast decision tree) algorithm. The algorithm analyses uncertain information quickly and effectively in both the learning stage and the classification stage. In the learning stage, a decision tree model for the uncertain data stream is constructed quickly by using the Hoeffding bound. In the classification stage, weighted Bayes classifiers in the tree leaves are used to improve classification performance. Experimental results show that the proposed algorithm learns uncertain data streams very quickly and improves the classification performance of the model.
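
The abstract does not spell out how the Hoeffding bound is applied, so the following is only a minimal sketch of the standard VFDT split test, not the authors' WBVFDTu implementation. It assumes the usual bound ε = sqrt(R² ln(1/δ) / (2n)) and information gain as the split measure, so that R = log2(number of classes). The function names, the example gains, and the value of δ are hypothetical, and the handling of uncertain attribute values is not modelled here.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    With probability 1 - delta, the true mean of a variable with range R
    lies within epsilon of the mean observed over n independent samples.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 n_classes: int, delta: float, n: int) -> bool:
    """Standard VFDT rule: split a leaf when the gap between the two best
    attributes exceeds the Hoeffding bound for the n examples seen so far.

    Assumption: the split measure is information gain, whose range is
    log2(n_classes).
    """
    value_range = math.log2(n_classes)
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Hypothetical gains for the two best attributes at one leaf.
    for n in (100, 1000):
        eps = hoeffding_bound(1.0, 1e-7, n)
        print(n, round(eps, 4), should_split(0.30, 0.15, 2, 1e-7, n))
```

Because ε shrinks as n grows, a leaf that cannot be split confidently after a few examples may become splittable once more of the stream has been seen, which is what lets the tree be built incrementally in a single pass.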

Key words: uncertain data stream, weighted Bayes, VFDT (very fast decision tree), classification algorithm, big data
