Journal of Northeastern University Natural Science ›› 2020, Vol. 41 ›› Issue (12): 1680-1685.DOI: 10.12068/j.issn.1005-3026.2020.12.002

• Information & Control •

Research and Implementation of Speech Emotion Recognition Based on CGRU Model

ZHENG Yan, CHEN Jia-nan, WU Fan, FU Bin   

  1. School of Information Science & Engineering, Northeastern University, Shenyang 110819, China.
  • Received:2020-02-05 Revised:2020-02-05 Online:2020-12-15 Published:2020-12-22
  • Contact: CHEN Jia-nan

Abstract: Speech emotion recognition is an important research direction in affective computing and human-computer interaction. Deep neural networks are now widely used to extract emotional features from speech, but which network architecture to use and how to mitigate overfitting remain open questions. To address these problems, a CGRU model was proposed that combines a one-dimensional convolutional neural network (CNN) with a gated recurrent unit (GRU). Low-order and high-order emotional features were extracted from the Mel-frequency cepstral coefficient (MFCC) features of the original speech signal, and the features were then selected using a random forest. The model achieved recognition accuracies of 79%, 69% and 75% on three common emotional speech corpora, EMODB, SAVEE and RAVDESS, respectively. Data augmentation, which enlarged the sample set by adding Gaussian noise and changing the speech speed, further improved recognition accuracy. The usability of the model in real-world conditions was verified through an online recognition system.

Key words: speech emotion recognition, Mel-frequency cepstral coefficients, CGRU model, random forest, data augmentation
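
The two augmentation operations named in the abstract (adding Gaussian noise and changing the speed) can be illustrated with a minimal sketch. The abstract does not state the noise level, the speed factors, or the resampling method, so the `snr_db` and `rates` values, the function names, and the use of linear interpolation are all assumptions made for illustration. Linear-interpolation resampling also shifts pitch along with speed, which may differ from what the authors did.

```python
import math
import random


def add_gaussian_noise(signal, snr_db=20.0, rng=None):
    """Add white Gaussian noise at a target signal-to-noise ratio (dB).

    snr_db is a hypothetical default; the abstract gives no noise level.
    """
    if not signal:
        return []
    rng = rng or random.Random()
    power = sum(s * s for s in signal) / len(signal)
    noise_std = math.sqrt(power / (10.0 ** (snr_db / 10.0)))
    return [s + rng.gauss(0.0, noise_std) for s in signal]


def change_speed(signal, rate=1.1):
    """Resample by linear interpolation; rate > 1 gives a shorter, faster clip.

    Assumption: simple resampling, which changes pitch as well as duration.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not signal:
        return []
    n_in = len(signal)
    n_out = max(1, round(n_in / rate))
    if n_in == 1 or n_out == 1:
        return [signal[0]] * n_out
    step = (n_in - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), n_in - 2)  # left neighbour, clamped to a valid pair
        frac = pos - lo
        out.append(signal[lo] * (1.0 - frac) + signal[lo + 1] * frac)
    return out


def augment(signal, snr_db=20.0, rates=(0.9, 1.1), rng=None):
    """Return one noisy copy plus one speed-changed copy per rate."""
    copies = [add_gaussian_noise(signal, snr_db, rng)]
    copies.extend(change_speed(signal, r) for r in rates)
    return copies
```

With these assumed defaults each original utterance yields three extra samples, each of which would carry the emotion label of its source clip. The sketch uses plain Python lists to stay dependency-free; an array library would be the usual choice for real audio.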
