Journal of Northeastern University (Natural Science), 2023, Vol. 44, Issue (1): 33-39. DOI: 10.12068/j.issn.1005-3026.2023.01.005

• Information and Control •

Named Entity Recognition in the Threat Intelligence Domain Based on Deep Learning

WANG Ying1,2, WANG Ze-hao3, LI Hong4, HUANG Wen-jun4

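  1. Henan International Joint Laboratory of Theories and Key Technologies on Intelligence Networks, Henan University, Kaifeng 475001, Henan, China; 2. Subject Innovation and Intelligence Introduction Base of Henan Higher Educational Institution - Intelligent Information Processing Innovation and Intelligence Introduction Base of Henan University Software Engineering, Henan University, Kaifeng 475001, Henan, China; 3. Institute of Intelligence Networks System, Henan University, Kaifeng 475001, Henan, China; 4. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100049, China
  • Published: 2023-01-30
  • Corresponding author: WANG Ying
  • About the author: WANG Ying (1976-), male, born in Tianjin, associate professor at Henan University.
  • Supported by:
    Natural Science Foundation of Henan Province (182300410164); Henan University Graduate Education Innovation and Quality Improvement Program, Talent Program (No. SYL19060120); National Natural Science Foundation of China, Young Scientists Fund (61702503, 61802016); National Natural Science Foundation of China, Key Program (Y810021104).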

Named Entity Recognition in Threat Intelligence Domain Based on Deep Learning

WANG Ying1,2, WANG Ze-hao3, LI Hong4, HUANG Wen-jun4   

  1. Henan International Joint Laboratory of Theories and Key Technologies on Intelligence Networks, Henan University, Kaifeng 475001, China; 2. Subject Innovation and Intelligence Introduction Base of Henan Higher Educational Institution - Intelligent Information Processing Innovation and Intelligence Introduction Base of Henan University Software Engineering, Henan University, Kaifeng 475001, China; 3. Institute of Intelligence Networks System, Henan University, Kaifeng 475001, China; 4. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100049, China.
  • Published: 2023-01-30
  • Contact: HUANG Wen-jun

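Abstract: To extract key information from threat intelligence of different origins and help government regulators carry out security risk assessment, a named entity recognition method for threat intelligence is proposed that builds on the BiGRU-CRF model and fuses boundary features with an iterated dilated convolutional neural network (IDCNN). It addresses the recognition difficulty caused by heavy mixing of Chinese and English in threat intelligence text and by rare domain-specific vocabulary. Following a manually constructed rule dictionary, the method converts entities with clear boundaries, such as English words, to reduce the information loss the model tends to incur on longer texts, and then uses IDCNN and a bidirectional gated recurrent unit (BiGRU) to further extract local and global features of the text. Experiments on a threat intelligence corpus show that the proposed model outperforms the other models on all relevant evaluation metrics, with an F-score of 87.4%.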

Key words: threat intelligence; dilated convolution; named entity recognition; information extraction; deep learning

Abstract: To extract key information from threat intelligence of different sources and help government regulatory authorities carry out security risk assessment, a threat intelligence named entity recognition (NER) method is proposed that builds on the BiGRU-CRF model and integrates boundary features with an iterated dilated convolutional neural network (IDCNN). The method targets two sources of recognition difficulty in threat intelligence texts: heavy mixing of Chinese and English, and rare domain-specific vocabulary. First, entities with clear boundaries, such as English words, are transformed according to a manually constructed rule dictionary, which reduces the information loss the model tends to incur when processing long texts. Local features and global contextual features are then obtained through IDCNN and a bidirectional gated recurrent unit (BiGRU), respectively. Experiments on a threat intelligence corpus show that the proposed model outperforms the other models on the relevant evaluation metrics, with an F-score of 87.4%.
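
The boundary-feature step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual rule dictionary, so the single regular-expression rule below, the function name `tokenize_with_boundaries`, the 0/1 boundary flag, and the sample sentence are all assumptions made for illustration. The sketch collapses each run of ASCII letters and digits in mixed Chinese-English text into one token, which shortens the sequence a character-level model has to process.

```python
import re

# Hypothetical stand-in for the rule dictionary: a maximal run of ASCII
# letters/digits, optionally joined by . _ / or -, counts as one
# clear-boundary token (e.g. a CVE identifier or a malware name).
ASCII_RUN = re.compile(r"[A-Za-z0-9]+(?:[._/\-][A-Za-z0-9]+)*")


def tokenize_with_boundaries(text):
    """Split mixed Chinese-English text into (token, boundary_flag) pairs.

    boundary_flag is 1 for a collapsed ASCII run and 0 for a single
    non-space character emitted on its own.
    """
    tokens = []
    pos = 0
    for match in ASCII_RUN.finditer(text):
        # Characters before the ASCII run are emitted one by one.
        tokens.extend((ch, 0) for ch in text[pos:match.start()] if not ch.isspace())
        # The whole run becomes a single token flagged as clear-boundary.
        tokens.append((match.group(), 1))
        pos = match.end()
    # Remaining characters after the last run.
    tokens.extend((ch, 0) for ch in text[pos:] if not ch.isspace())
    return tokens


# Made-up sample sentence, not taken from the paper's corpus.
sample = "攻击者利用CVE-2017-0144传播WannaCry勒索软件"
tokens = tokenize_with_boundaries(sample)
```

In this sample the 32-character sentence becomes 13 tokens. The flag could be fed to the model as an extra feature alongside the token embedding; whether the paper encodes boundaries this way is not stated in the abstract.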

Key words: threat intelligence; dilated convolution; named entity recognition (NER); information extraction; deep learning
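
As a rough illustration of why dilated convolution suits local feature extraction over long inputs, the sketch below computes the receptive field of a stack of stride-1 dilated 1-D convolutions using the standard relation: each layer widens the field by (kernel_size - 1) × dilation. The kernel size 3, the dilation schedule [1, 1, 2], and the four iterations are assumptions for illustration only; the abstract does not state the configuration the authors used.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in tokens, of stacked stride-1 dilated 1-D convolutions."""
    rf = 1
    for d in dilations:
        # Each layer extends the field by (kernel_size - 1) * dilation.
        rf += (kernel_size - 1) * d
    return rf


# Assumed configuration, not taken from the paper.
block = [1, 1, 2]
one_block = receptive_field(3, block)        # a single dilated block
four_blocks = receptive_field(3, block * 4)  # the same block iterated four times
```

Under these assumed settings a single block covers 9 tokens and four iterations cover 33, while the number of layers grows only linearly.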

CLC number: