东北大学学报:自然科学版 ›› 2016, Vol. 37 ›› Issue (7): 931-936.DOI: 10.12068/j.issn.1005-3026.2016.07.005

• 信息与控制 • 上一篇    下一篇

滑动窗口下数据流完全加权最大频繁项集挖掘

王少鹏1, 闻英友1,2, 赵宏1,2   

  1. (1. 东北大学 信息科学与工程学院, 辽宁 沈阳110819; 2. 东北大学 医学影像计算教育部重点实验室, 辽宁 沈阳110819)
  • 收稿日期:2015-05-11 修回日期:2015-05-11 出版日期:2016-07-15 发布日期:2016-07-13
  • 通讯作者: 王少鹏
  • 作者简介:王少鹏(1984-),男,内蒙古乌兰察布人,东北大学博士研究生; 赵宏(1954-),男,辽宁沈阳人,东北大学教授,博士生导师.
  • 基金资助:
    国家自然科学基金资助项目(60903159,61173153,61402096); 中央高校基本科研业务费专项资金资助项目(N110818001,N100218001,N130504007,N120104001); 沈阳市科技计划项目(1091176-1-00); 国家高技术研究发展计划项目(2015AA016005).

Mining Full Weighted Maximal Frequent Itemsets Based on Sliding Window over Data Stream

WANG Shao-peng1, WEN Ying-you1,2, ZHAO Hong1,2   

  1. 1.School of Information Science & Engineering, Northeastern University, Shenyang 110819, China; 2.Key Laboratory of Medical Image Computing, Ministry of Education, Northeastern University, Shenyang 110819, China.
  • Received:2015-05-11 Revised:2015-05-11 Online:2016-07-15 Published:2016-07-13
  • Contact: WANG Shao-peng
  • About author:-
  • Supported by:
    -

摘要: 针对当前关于数据流加权最大频繁项集WMFI(weighted maximal frequent itemsets)的研究无法有效地处理频繁阈值和加权频繁阈值不一致情况下WMFI的挖掘问题,提出了完全加权最大频繁项集FWMFI(full weighted maximal frequent itemsets)的概念.为了减少naive算法在处理滑动窗口下完全加权最大频繁项集挖掘时存在的冗余运算,提出了FWMFI-SW(FWMFI mining based on sliding window over data stream)算法.所提出的算法通过基于频繁约束条件的优化策略减少了naive算法中MaxW优化策略的无效调用次数;采用编辑距离比率作为WMFP-SW-tree的重构判别函数,可以有效减少该树的重构次数.实验结果表明FWMFI-SW算法是有效的,且比naive算法更有时间优势.

关键词: 数据流, 滑动窗口, 编辑距离比率, 加权最大频繁项集, 重构判别函数

Abstract: Aiming at the problem that none of current researches on the WMFI (weighted maximal frequent itemsets) over data stream emphasizes the WMFI mining on the condition that the frequent threshold is not equal with the weighted frequent threshold, the concept of FWMFI (full weighted maximal frequent itemsets) was firstly promoted in this work. In order to reduce redundant operations existing in the naive algorithm which is used to handle the FWMFI mining based on sliding window over data stream, the FWMFI-SW (FWMFI mining based on sliding window over data stream) algorithm was proposed. The mining optimization strategy was adopted based on the frequent character to reduce the unnecessary call about the MaxW optimization strategy in the naive algorithm. In addition, the edit distance ratio was taken as reconstruction judge function to decide whether the updated WMFP-SW-tree should be reconstructed as the window slides. The extensive experiments showed that the FWMFI-SW algorithm is effective , and outperforms the naive algorithm in running time.

Key words: data stream, sliding window, edit distance ratio, weighted maximal frequent itemsets, reconstruction judge function

中图分类号: