Journal of Northeastern University (Natural Science) ›› 2023, Vol. 44 ›› Issue (11): 1537-1542. DOI: 10.12068/j.issn.1005-3026.2023.11.003

• Information & Control •

Speech Emotion Recognition Based on Constrained Bi-channel Model

SUN Ying, LI Ze, ZHANG Xue-ying   

  1. College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China.
  • Published: 2023-12-05
  • Contact: ZHANG Xue-ying

Abstract: To address the problem of insufficient speech features in speech emotion recognition, a constrained bi-channel model is proposed to fully exploit the emotional information contained in speech features from both global and local aspects, thereby improving the emotion recognition rate. In channel 1, the gated recurrent unit (GRU) was introduced and improved to capture the global information of speech features, and a BAGRU (bidirectional attention gated recurrent unit) model was constructed to strengthen the correlations among speech features. In channel 2, a convolutional neural network was employed to capture the local information of speech features, and adversarial training was added to avoid mutual interference among local information. The bi-channel fusion model automatically generates different weights according to the importance of the channel features, and an orthogonal constraint is introduced to address the problem of feature redundancy in the bi-channel fusion. Experimental results show that the proposed model achieves recognition accuracies of 62.83% and 82.19% on two common emotional corpora, IEMOCAP and EMO-DB, respectively. The constrained bi-channel model thus performs better in speech emotion recognition tasks.
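The abstract's weighted bi-channel fusion with an orthogonal constraint can be illustrated with a minimal numpy sketch. All dimensions, the placeholder weight logits, and the feature tensors below are hypothetical (the paper does not specify them here); the orthogonality penalty is written as the squared Frobenius norm of the cross-correlation between the two channels' features, a common formulation of such a constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes (not given in the abstract).
batch, dim = 4, 8
f_global = rng.standard_normal((batch, dim))  # channel 1: BAGRU-style global features
f_local = rng.standard_normal((batch, dim))   # channel 2: CNN-style local features

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Channel-importance weights; in the model these would be learned,
# here they are placeholder logits passed through a softmax.
w = softmax(np.array([0.3, -0.1]))
fused = w[0] * f_global + w[1] * f_local

# Orthogonal constraint: minimizing the squared Frobenius norm of the
# channels' cross-correlation pushes the two feature sets toward
# encoding non-redundant (orthogonal) information.
l_orth = np.linalg.norm(f_global.T @ f_local, ord="fro") ** 2

print(fused.shape)  # (4, 8)
```

In training, `l_orth` would be added to the recognition loss with a balancing coefficient, so the fusion keeps complementary rather than duplicated information from the two channels.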

Key words: speech emotion recognition; gated recurrent unit (GRU); convolutional neural network; orthogonal constraint
