Journal of Northeastern University (Natural Science), 2022, Vol. 43, Issue (5): 639-645. DOI: 10.12068/j.issn.1005-3026.2022.05.005

• Information and Control •

Interpolation of Missing Physiological Data of ICU Patients Based on Deep Embedded Clustering

LI Jian-hua1, ZHU Ze-yang1, XU Li-sheng1,2, SUN Guo-zhe3

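  1. School of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, Liaoning, China; 2. Neusoft Research of Intelligent Healthcare Technology, Co., Ltd., Shenyang 110167, Liaoning, China; 3. Department of Cardiovascular Medicine, The First Hospital of China Medical University, Shenyang 110001, Liaoning, China
  • Revised: 2021-07-13 Accepted: 2021-07-13 Published: 2022-06-20
  • Corresponding author: LI Jian-hua
  • About the authors: LI Jian-hua (born 1973), male, from Huailai, Hebei, lecturer at Northeastern University; XU Li-sheng (born 1975), male, from Anqing, Anhui, professor and doctoral supervisor at Northeastern University.
  • Supported by:
    National Natural Science Foundation of China (61773110); Fundamental Research Funds for the Central Universities (N2119007, N2119008); Shenyang Science and Technology Plan Fund (20201410); Member Project of Neusoft Research of Intelligent Healthcare Technology, Co., Ltd. (MCMP062002).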

Interpolation of Missing Physiological Data of ICU Patients Based on Deep Embedded Clustering

LI Jian-hua1, ZHU Ze-yang1, XU Li-sheng1,2, SUN Guo-zhe3   

  1. School of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China; 2. Neusoft Research of Intelligent Healthcare Technology, Co., Ltd., Shenyang 110167, China; 3. Department of Cardiovascular Medicine, The First Hospital of China Medical University, Shenyang 110001, China.
  • Revised: 2021-07-13 Accepted: 2021-07-13 Published: 2022-06-20
  • Contact: XU Li-sheng

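Abstract: Electronic medical record data often contain missing values, which seriously affects analysis results. Missing-value interpolation is studied using data of intensive care unit (ICU) patients from the MIMIC database; the dataset comprises 23 groups of commonly used clinical physiological variables and 5260 samples with no missing values. A K-nearest-neighbor interpolation method based on deep embedded clustering is proposed. With deep embedded clustering at its core, the method builds a sample proximity matrix from multiple clustering runs, selects the K nearest neighbors of each sample with missing values, and fills the missing entries with the mean of those neighbors. Compared with mean interpolation, median interpolation, posterior distribution estimation interpolation, and conditional mean interpolation, the results of the proposed method are more similar to the original data and better preserve the differences between samples.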

Keywords: intensive care unit; electronic medical record; missing value interpolation; deep embedded clustering; proximity matrix

Abstract: Data in electronic medical records are often missing, which significantly affects analysis results. Data of intensive care unit (ICU) patients in the MIMIC database were used to study missing value interpolation; the dataset consists of 23 groups of commonly used clinical physiological variables and 5260 samples without missing values. A K-nearest neighbor interpolation method based on deep embedded clustering was proposed. The method takes deep embedded clustering as its core, constructs a sample proximity matrix through multiple clustering runs, then selects the K nearest neighbors of each sample with missing values and fills the gaps with the average value of those neighbors. Compared with mean interpolation, median interpolation, posterior distribution estimation interpolation and conditional mean interpolation, the proposed method yields interpolated results that are more similar to the original data and better retains the differences between samples.

Key words: intensive care unit (ICU); electronic medical record; missing value interpolation; deep embedded clustering; proximity matrix
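
The pipeline summarized in the abstract (repeated clustering, a sample proximity matrix, then averaging over the K nearest neighbors) can be illustrated with a small sketch. This is not the authors' implementation: plain k-means stands in for deep embedded clustering, proximity is assumed to be the fraction of clustering runs in which two samples share a cluster, and a provisional column-mean fill is assumed so that incomplete samples can be clustered at all. The function names (`kmeans_labels`, `proximity_matrix`, `impute_knn`), the parameter defaults, and the toy data are all hypothetical.

```python
import math
import random


def kmeans_labels(rows, n_clusters, rng, n_iter=20):
    """Plain k-means; a stand-in for deep embedded clustering (assumption)."""
    centers = [list(r) for r in rng.sample(rows, n_clusters)]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assign every sample to its closest center.
        labels = [min(range(n_clusters), key=lambda c: math.dist(r, centers[c]))
                  for r in rows]
        # Move each center to the mean of its members (keep it if empty).
        for c in range(n_clusters):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def proximity_matrix(rows, n_clusters, n_runs, seed=0):
    """Fraction of clustering runs in which two samples share a cluster."""
    n = len(rows)
    prox = [[0.0] * n for _ in range(n)]
    rng = random.Random(seed)
    for _ in range(n_runs):
        labels = kmeans_labels(rows, n_clusters, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    prox[i][j] += 1.0 / n_runs
    return prox


def impute_knn(data, k=3, n_clusters=2, n_runs=10, seed=0):
    """Fill None entries with the mean over the k most proximate samples."""
    n_feat = len(data[0])
    # Column means of observed values, used only as a provisional fill.
    col_means = []
    for f in range(n_feat):
        obs = [row[f] for row in data if row[f] is not None]
        col_means.append(sum(obs) / len(obs))
    filled = [[col_means[f] if v is None else v for f, v in enumerate(row)]
              for row in data]
    prox = proximity_matrix(filled, n_clusters, n_runs, seed)
    result = [list(row) for row in filled]
    for i, row in enumerate(data):
        for f, v in enumerate(row):
            if v is None:
                # Candidate neighbors: other samples that observed feature f.
                donors = [j for j in range(len(data))
                          if j != i and data[j][f] is not None]
                donors.sort(key=lambda j: prox[i][j], reverse=True)
                nearest = donors[:k]
                if nearest:  # otherwise the provisional column mean stays
                    result[i][f] = sum(data[j][f] for j in nearest) / len(nearest)
    return result


# Toy data (hypothetical): two loose groups, one missing value in each.
data = [
    [1.0, 10.0], [1.2, 11.0], [0.8, None], [1.1, 9.0],
    [5.0, 50.0], [5.2, None], [4.9, 52.0], [5.1, 48.0],
]
completed = impute_knn(data, k=2)
```

Observed entries are left untouched and only `None` entries are replaced. Because neighbors are ranked by how often they co-cluster with the incomplete sample rather than by a single distance computation, the filled value tends to follow the sample's own group instead of the global column mean, which is the property the abstract credits for preserving differences between samples.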
