Journal of Northeastern University Natural Science ›› 2017, Vol. 38 ›› Issue (10): 1373-1377. DOI: 10.12068/j.issn.1005-3026.2017.10.002

• Information & Control •

Joint Truth Finding on Heterogeneous Data

CHEN Chao1,2, SHEN De-rong1, KOU Yue1, YU Ge1   

  1. School of Computer Science & Engineering, Northeastern University, Shenyang 110169, China; 2. College of Information Science & Technology, Bohai University, Jinzhou 121007, China.
  • Received: 2016-05-04 Revised: 2016-05-04 Online: 2017-10-15 Published: 2017-10-13
  • Contact: SHEN De-rong

Abstract: The value of an entity attribute on the web is usually provided by multiple data sources, but these sources do not always agree, which hinders effective data integration; the true value must therefore be identified among the claimed values. Existing truth-finding algorithms mainly focus on a single data type, so a distance-based truth-finding algorithm that considers heterogeneous data jointly was proposed. First, for each data item, a data item vector was calculated from the distances between the value claimed by each source and the truth value, and the KMeans algorithm was used to obtain an initial clustering. Then, clustering and trust analysis were performed alternately and iteratively: within each cluster, the confidence of facts and the trustworthiness of sources were updated using an optimization approach that models heterogeneous data jointly. Each data item vector was then recalculated and reclustered, and the iteration terminated once every cluster was stable. The experimental results showed that the proposed algorithm achieves higher truth-finding accuracy, owing to the fine-grained partition of source quality and the joint model of heterogeneous data.
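
The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the distance functions, the weight update, or the initialization, so the 0/1 loss for categorical values, the range-scaled absolute error for numeric values, the negative-log weight rule, the majority/median starting truths, and the names `distance`, `kmeans`, `find_truths` and the toy `claims` data are all assumptions made for illustration. It also assumes every source claims a value for every data item and that there are at least two sources.

```python
import math
from collections import defaultdict

def distance(claim, truth, scale):
    """Assumed losses: 0/1 for categorical values, scaled absolute error for numeric ones."""
    if isinstance(truth, str):
        return 0.0 if claim == truth else 1.0
    return abs(claim - truth) / scale

def kmeans(vectors, k, rounds=20):
    """Plain Lloyd's algorithm, seeded deterministically with the first k vectors."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(rounds):
        labels = [min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def find_truths(claims, k=2, max_iter=20):
    """claims: {item: {source: value}}. Returns (estimated truths, cluster labels)."""
    items = sorted(claims)
    sources = sorted(next(iter(claims.values())))
    truth, scale = {}, {}
    for i in items:  # assumed initialization: majority vote or median
        vals = [claims[i][s] for s in sources]
        if isinstance(vals[0], str):
            scale[i] = 1.0
            truth[i] = max(sorted(set(vals)), key=vals.count)
        else:
            scale[i] = (max(vals) - min(vals)) or 1.0
            truth[i] = sorted(vals)[len(vals) // 2]
    labels = None
    for _ in range(max_iter):
        # Data item vector: distance of every source's claim from the current truth.
        vectors = [[distance(claims[i][s], truth[i], scale[i]) for s in sources]
                   for i in items]
        new_labels = kmeans(vectors, k)
        if new_labels == labels:  # clusters are stable, so stop iterating
            break
        labels = new_labels
        for c in set(labels):
            cluster = [n for n, l in enumerate(labels) if l == c]
            # Per-cluster source loss; the small constant avoids log(0).
            loss = [sum(vectors[n][j] for n in cluster) + 1e-6
                    for j in range(len(sources))]
            total = sum(loss)
            weight = [-math.log(l / total) for l in loss]  # assumed weight rule
            for n in cluster:
                i = items[n]
                if isinstance(truth[i], str):  # weighted vote
                    votes = defaultdict(float)
                    for s, w in zip(sources, weight):
                        votes[claims[i][s]] += w
                    truth[i] = max(sorted(votes), key=votes.get)
                else:  # weighted mean
                    truth[i] = sum(w * claims[i][s]
                                   for s, w in zip(sources, weight)) / sum(weight)
    return truth, labels

# Hypothetical toy data: source "C" is unreliable on the first entity only.
claims = {
    "e1.city": {"A": "Paris", "B": "Paris", "C": "Lyon"},
    "e1.pop":  {"A": 2.1, "B": 2.2, "C": 9.0},
    "e2.city": {"A": "Rome", "B": "Rome", "C": "Rome"},
    "e2.pop":  {"A": 2.8, "B": 2.9, "C": 2.8},
}
truth, labels = find_truths(claims)
```

Because source weights are estimated separately inside each cluster of data items, a source that is poor on one group of items (here `C` on the first entity) can still carry high weight on another group, which is one way to read the "fine-grained partition of source quality" mentioned in the abstract.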

Key words: truth, truth finding, KMeans clustering, optimization, heterogeneous data
