Journal of Northeastern University (Natural Science), 2016, Vol. 37, Issue (12): 1683-1687. DOI: 10.12068/j.issn.1005-3026.2016.12.003

• Information and Control •

A Consistency Cleaning Method Based on Content-related Conditional Functional Dependencies

DU Yue-feng1, SHEN De-rong1, ZHANG Liang2, YU Ge1

  1. School of Information Science & Engineering, Northeastern University, Shenyang 110819, China; 2. PLA 65154 Troops, Lingyuan 122513, China.
  • Received: 2015-07-31 Revised: 2015-07-31 Online: 2016-12-15 Published: 2016-12-23
  • Contact: DU Yue-feng
  • About the authors: DU Yue-feng (1986-), male, native of Shenyang, Liaoning, doctoral candidate at Northeastern University; SHEN De-rong (1964-), female, native of Tieling, Liaoning, professor and doctoral supervisor at Northeastern University; YU Ge (1962-), male, native of Dalian, Liaoning, professor and doctoral supervisor at Northeastern University.
  • Supported by:
    National Basic Research Program of China (973 Program) (2012CB316201); National Natural Science Foundation of China (61033007).

Abstract: Building on conditional functional dependencies (CFDs), this paper proposes content-related conditional functional dependencies (CCFDs) and a consistency cleaning method based on them. By analyzing the relationships among CFDs, related CFDs are merged into CCFDs. A CCFD can detect data inconsistencies under multiple condition values and also supplies reference values that can be used for consistency repair. A cost model for consistency repair is also proposed: it repairs the data according to the actual tuples that correspond to each CCFD, so that the data become consistent at minimal repair cost. Experiments on two real-world datasets show that the proposed CCFD-based cleaning method accurately detects data inconsistencies and repairs them.

Key words: data cleaning, conditional functional dependency, content relatedness, data consistency, repairing-cost model
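The abstract only summarizes the method, so neither the exact definition of a CCFD nor the paper's cost model is given here. As a rough illustration, the sketch below assumes that a CCFD can be modeled as a group of constant CFDs sharing the same condition (LHS) attributes and the same dependent (RHS) attribute, each pattern mapping a combination of condition values to a reference value. A tuple violates the dependency when its condition values match a pattern but its RHS value differs; each violation is then repaired with the cheaper of two candidate edits under an assumed cost of "sum of per-attribute weights of the modified cells". The names `MergedCFD`, `find_violations`, `candidate_repairs` and `repair`, the weights, and the example data are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class MergedCFD:
    """Hypothetical stand-in for a CCFD: constant CFDs that share the same
    condition (LHS) attributes and the same dependent (RHS) attribute."""
    lhs: tuple[str, ...]                                                 # condition attributes
    rhs: str                                                             # dependent attribute
    patterns: dict[tuple[str, ...], str] = field(default_factory=dict)  # condition values -> reference value


def find_violations(rows, cfd):
    """Return (row index, reference value) for every row that matches a
    pattern on the LHS but disagrees with its reference value on the RHS."""
    out = []
    for i, row in enumerate(rows):
        ref = cfd.patterns.get(tuple(row[a] for a in cfd.lhs))
        if ref is not None and row[cfd.rhs] != ref:
            out.append((i, ref))
    return out


def candidate_repairs(row, cfd, ref, weights):
    """Yield (cost, changes) pairs. Assumed cost model: the sum of the
    per-attribute weights (default 1.0) of the cells that would change."""
    # Option 1: trust the condition values and overwrite the RHS with the reference value.
    yield weights.get(cfd.rhs, 1.0), {cfd.rhs: ref}
    # Option 2: trust the RHS and move the tuple to another pattern that predicts it.
    key = tuple(row[a] for a in cfd.lhs)
    for other_key, other_ref in cfd.patterns.items():
        if other_ref == row[cfd.rhs] and other_key != key:
            changes = {a: new for a, new, old in zip(cfd.lhs, other_key, key) if new != old}
            yield sum(weights.get(a, 1.0) for a in changes), changes


def repair(rows, cfd, weights):
    """Apply the cheapest candidate repair to each violating row, on a copy.
    On a cost tie, min() keeps the first candidate, i.e. the RHS overwrite."""
    fixed = [dict(r) for r in rows]
    for i, ref in find_violations(rows, cfd):
        _, changes = min(candidate_repairs(rows[i], cfd, ref, weights), key=lambda c: c[0])
        fixed[i].update(changes)
    return fixed


# Example with made-up data: area code determines city within a province.
cfd = MergedCFD(("province", "area_code"), "city",
                {("Liaoning", "024"): "Shenyang", ("Liaoning", "0411"): "Dalian"})
rows = [
    {"province": "Liaoning", "area_code": "024", "city": "Shenyang"},
    {"province": "Liaoning", "area_code": "024", "city": "Dalian"},
    {"province": "Liaoning", "area_code": "0411", "city": "Dalain"},
]
weights = {"city": 2.0, "area_code": 1.0}  # assumed: city values are trusted more
# Row 1 gets area_code "0411" (cost 1.0 < 2.0); row 2 gets city "Dalian".
print(repair(rows, cfd, weights))
```

The sketch repairs against one dependency at a time; changing a condition value can in principle introduce a new violation of some other dependency, which a complete cleaning method would have to account for.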
