Feature Adaptive Technology in Interactive Data Exploration Framework

doi:10.12068/j.issn.1005-3026.2018.12.003

Journal of Northeastern University (Natural Science), 2018, Vol. 39, Issue (12): 1685-1690.


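WANG Meng-xiang, LI Fang-fang, YU Ge

(School of Computer Science & Engineering, Northeastern University, Shenyang 110169, China)

Received: 2017-05-24; Revised: 2017-05-24; Online: 2018-12-15; Published: 2018-12-19
Corresponding author: WANG Meng-xiang
About the authors: WANG Meng-xiang (1991-), female, born in Chifeng, Inner Mongolia, Ph.D. candidate at Northeastern University; YU Ge (1962-), male, born in Dalian, Liaoning, professor and doctoral supervisor at Northeastern University; FENG Ming-jie (1971-), male, born in Yuzhou, Henan, associate professor at Northeastern University; WANG En-gang (1962-), male, born in Shenyang, Liaoning, professor and doctoral supervisor at Northeastern University.
Supported by: National Natural Science Foundation of China (51171041; 61472071); Fundamental Research Funds for the Central Universities (N161604005); Natural Science Foundation of Liaoning Province (2015020018).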

Contact: YU Ge

Abstract



Interactive data exploration (IDE) is a key technique in a diverse set of discovery-oriented applications. It focuses on interaction, exploration and discovery, and is widely used in many scenarios and domains. Taking the exploration of massive academic literature data as its background, this paper studies feature-adaptive techniques for interactive data exploration. First, a feature-adaptive interactive data exploration framework for academic literature, FA-IDE (feature-adaptive interactive data exploration), is proposed; it dynamically adjusts the feature subset in each iteration to meet users' diverse interests. Second, for this framework, an evaluation criterion for the balance of feature subsets (BFS) is proposed, together with a sequential forward feature selection algorithm based on BFS. Third, for the related-sample discovery problem, a division-level establishment method is proposed: after a decision tree model partitions the user's region of interest, the result set is ranked with a similarity-based sorting strategy. Experimental results show that the proposed method effectively improves users' exploration efficiency and the accuracy of the final results.

Key words: interactive data exploration, topic extraction, feature selection, sample discovery, machine learning
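The abstract names the main steps but does not define the BFS criterion or the details of the similarity ranking. The Python sketch below is only a hypothetical illustration of the two generic techniques it mentions: sequential forward feature selection driven by a pluggable subset-quality criterion, and ranking candidate samples by their similarity to the samples a user has marked relevant. The entropy-based `balance_score`, the cosine-to-centroid ranking, the samples-by-features data layout and all function names are assumptions, not the authors' method; the decision-tree partitioning step is assumed to have already produced the candidate set.

```python
import math
from typing import Callable, List, Sequence

# Assumed layout: rows are samples (e.g. papers), columns are features
# (e.g. per-document topic weights). This is an illustration only.
Matrix = Sequence[Sequence[float]]


def balance_score(data: Matrix, subset: Sequence[int]) -> float:
    """Hypothetical stand-in for BFS (NOT the paper's definition):
    normalized Shannon entropy of the total weight carried by each selected
    feature. 1.0 means every selected feature carries the same total weight."""
    if len(subset) < 2:
        return 0.0  # treat balance as 0 for fewer than two features
    totals = [sum(row[j] for row in data) for j in subset]
    mass = sum(totals)
    if mass == 0:
        return 0.0
    entropy = -sum((t / mass) * math.log(t / mass) for t in totals if t > 0)
    return entropy / math.log(len(subset))


def sequential_forward_selection(n_features: int, k: int,
                                 criterion: Callable[[List[int]], float]) -> List[int]:
    """Generic sequential forward selection: start from the empty set and
    greedily add the feature that maximizes `criterion` until k are chosen."""
    selected: List[int] = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        # max() returns the first maximal candidate, so ties go to the lower index
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(candidates: Matrix, relevant: Matrix,
                       subset: Sequence[int]) -> List[int]:
    """Return candidate indices ordered by cosine similarity to the centroid
    of the user-labelled relevant samples, using only the selected features.
    `candidates` is assumed to be the region a classifier (e.g. a decision
    tree) has already marked as relevant."""
    if not relevant:
        raise ValueError("need at least one relevant sample")

    def project(row: Sequence[float]) -> List[float]:
        return [row[j] for j in subset]

    rel = [project(r) for r in relevant]
    centroid = [sum(col) / len(rel) for col in zip(*rel)]
    scores = [cosine(project(c), centroid) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    docs = [[0.6, 0.3, 0.1, 0.0],
            [0.5, 0.4, 0.0, 0.1],
            [0.1, 0.2, 0.6, 0.1]]
    chosen = sequential_forward_selection(4, 2, lambda s: balance_score(docs, s))
    print(chosen)  # [0, 1]
    print(rank_by_similarity([docs[2], docs[1], docs[0]], docs[:2], chosen))  # [2, 1, 0]
```

Because the criterion is passed in as a callable, a different subset-quality measure can be swapped in without changing the greedy search itself.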

CLC number: TP315


Cite this article: WANG Meng-xiang, LI Fang-fang, YU Ge. Feature Adaptive Technology in Interactive Data Exploration Framework[J]. Journal of Northeastern University (Natural Science), 2018, 39(12): 1685-1690.
