Journal of Northeastern University Natural Science, 2015, Vol. 36, Issue (1): 19-23. DOI: 10.12068/j.issn.1005-3026.2015.01.005

• Information & Control •

An Adaptive Clustering Method on Medical Short Text

LI Wei¹, XU Hong-tao², ZHAO Da-zhe¹,³, LIU Ji-ren³

  1. Key Laboratory of Medical Image Computing, Ministry of Education, Northeastern University, Shenyang 110819, China; 2. The Zhengzhou Municipal Human Resources and Social Security Data Management Center, Zhengzhou 450000, China; 3. Neusoft Group Ltd., Shenyang 110179, China.
  • Received: 2013-12-05 Revised: 2013-12-05 Online: 2015-01-15 Published: 2014-11-07
  • Contact: LI Wei

Abstract: An adaptive clustering method for short text is presented for recognizing synonymous texts and standardizing the disease names used in diagnoses in electronic medical records. First, a new set-based text similarity measure is proposed. Then, a text clustering algorithm based on the similarity distribution, which automatically determines the number of clusters, is applied to recognize synonymous disease texts. Finally, the disease names are standardized by a central concept extraction algorithm based on frequent sequence patterns, and the clusters are merged and optimized to further improve clustering accuracy. The results show that the proposed approach achieves high accuracy and clustering efficiency, which is valuable for medical applications such as medical text preprocessing, classification, and analysis.
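
The three stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details the abstract does not give. Every concrete choice here is an assumption made for illustration: Jaccard overlap of whitespace-separated tokens stands in for the set-based similarity measure, the mean of all pairwise similarities stands in for the threshold derived from the similarity distribution, and the most widely shared contiguous token sequence stands in for the frequent-sequence-pattern central concept. The function names `jaccard`, `adaptive_clusters`, and `central_concept` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def jaccard(a: str, b: str) -> float:
    """Assumed set-based similarity: overlap of the two texts' token sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def adaptive_clusters(texts: list[str]) -> list[list[str]]:
    """Group texts without a preset cluster count.

    Assumption: the linking threshold is the mean of all pairwise
    similarities, so it adapts to the similarity distribution of the data.
    """
    n = len(texts)
    if n < 2:
        return [list(texts)] if texts else []
    sims = {(i, j): jaccard(texts[i], texts[j])
            for i, j in combinations(range(n), 2)}
    threshold = mean(sims.values())

    # Union-find: link every pair whose similarity reaches the threshold.
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), s in sims.items():
        if s > 0 and s >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(texts[i])
    return list(groups.values())


def central_concept(cluster: list[str]) -> str:
    """Assumed stand-in for frequent-sequence-pattern extraction: the
    contiguous token sequence found in the most texts, preferring longer ones."""
    support: Counter = Counter()
    for text in cluster:
        tokens = text.lower().split()
        ngrams = {tuple(tokens[i:j])
                  for i in range(len(tokens))
                  for j in range(i + 1, len(tokens) + 1)}
        support.update(ngrams)  # each sequence counts once per text
    if not support:
        return ""
    best = max(support, key=lambda g: (support[g], len(g), g))
    return " ".join(best)


if __name__ == "__main__":
    names = [
        "type 2 diabetes mellitus",
        "diabetes mellitus type 2",
        "diabetes mellitus",
        "acute bronchitis",
        "acute bronchitis with cough",
    ]
    for cluster in adaptive_clusters(names):
        print(central_concept(cluster), "<-", cluster)
    # prints one line per cluster: "diabetes mellitus" for the first three
    # names and "acute bronchitis" for the last two
```

On real diagnosis text, in particular text without whitespace word boundaries, the tokenizer would have to be replaced by a suitable word or character segmentation step, and the mean-based threshold is only one of many ways to read a cut-off from the similarity distribution.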

Key words: clustering analysis, similarity measurement, frequent sequence pattern, electronic medical record, similarity distribution
