Journal of Northeastern University Natural Science ›› 2020, Vol. 41 ›› Issue (11): 1521-1527. DOI: 10.12068/j.issn.1005-3026.2020.11.001

• Information & Control •

An Applicable Multivariate Decision Tree Algorithm for Categorical Attribute Data

LIU Zhen-yu1,2, SONG Xiao-ying2   

  1. Software Center, Northeastern University, Shenyang 110819, China; 2. Key Laboratory of Network Security and Computing Technology, Dalian Neusoft University of Information, Dalian 116023, China.
  • Received: 2019-10-24 Revised: 2019-10-24 Online: 2020-11-15 Published: 2020-11-16
  • Contact: LIU Zhen-yu

Abstract: Most multivariate decision trees are applicable only to numerical data. To solve the classification problem on categorical attribute data, an applicable multivariate decision tree (CMDT) algorithm is proposed. The center of a sample set on the categorical attributes, and the distance between samples and centers, are defined using the frequency distribution of categorical attribute values in each category or cluster. A weighted k-means algorithm is used to split the nodes of the decision tree. The proposed multivariate decision tree is applicable to numerical data, categorical data, and mixed data. Experimental results show that, across different kinds of data, the classification model based on the proposed algorithm yields a more concise tree structure and higher generalization accuracy than models based on classic decision tree algorithms.
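
The abstract does not give the exact definitions of the center or the distance, so the following is only a minimal Python sketch of one plausible reading: the center of a sample set is taken as the per-attribute relative frequency of each categorical value, and the distance from a sample to a center is a weighted sum of one minus the frequency of the sample's value on each attribute. The function names `frequency_center`, `distance_to_center`, and `assign`, the form of the distance, and the uniform default weights are all assumptions made for illustration, not the authors' formulas.

```python
from collections import Counter


def frequency_center(samples):
    """Assumed 'center': for each attribute, the relative frequency
    of every categorical value observed in `samples`."""
    n = len(samples)
    n_attrs = len(samples[0])
    return [
        {value: count / n
         for value, count in Counter(row[j] for row in samples).items()}
        for j in range(n_attrs)
    ]


def distance_to_center(sample, center, weights=None):
    """Assumed distance: weighted sum over attributes of
    (1 - frequency of the sample's value in the center)."""
    if weights is None:
        weights = [1.0] * len(center)  # uniform weights as a placeholder
    return sum(
        w * (1.0 - freq.get(value, 0.0))
        for value, freq, w in zip(sample, center, weights)
    )


def assign(samples, centers, weights=None):
    """k-means-style assignment step: index of the nearest center."""
    return [
        min(range(len(centers)),
            key=lambda k: distance_to_center(s, centers[k], weights))
        for s in samples
    ]


# Two small hypothetical groups described by (color, size).
group_a = [("red", "small"), ("red", "large"), ("blue", "small")]
group_b = [("green", "large"), ("green", "large")]
centers = [frequency_center(group_a), frequency_center(group_b)]
labels = assign([("red", "small"), ("green", "large")], centers)
```

In a node split of the kind the abstract describes, an assignment step like `assign` would alternate with recomputing the centers until the partition stabilizes; how the attribute weights are chosen is not stated in the abstract and is left as a parameter here.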

Key words: decision tree, categorical attribute, multivariate decision tree, node split, k-means
