Research and Implementation of Speech Emotion Recognition Based on CGRU Model

doi:10.12068/j.issn.1005-3026.2020.12.002

Journal of Northeastern University (Natural Science), 2020, Vol. 41, Issue 12: 1680-1685.

ZHENG Yan, CHEN Jia-nan, WU Fan, FU Bin

(School of Information Science & Engineering, Northeastern University, Shenyang 110819, Liaoning, China)

Received: 2020-02-05 Revised: 2020-02-05 Online: 2020-12-15 Published: 2020-12-22
Corresponding author: ZHENG Yan
Contact: CHEN Jia-nan
About the author: ZHENG Yan (1963-), female, from Shenyang, Liaoning; associate professor at Northeastern University, Ph.D.
Supported by:
National Natural Science Foundation of China (61773108).

Abstract

Speech emotion recognition is an important research direction in human-computer interaction and affective computing. Deep neural networks are now widely used to extract emotional features from speech, but which network model to use and how to alleviate model overfitting still require further study. To address these problems, a CGRU model is proposed that combines a one-dimensional convolutional neural network (CNN) with a gated recurrent unit (GRU). The model extracts low-order and high-order emotional features from the MFCC features of the raw speech signal, and a random forest is then used to select among these features. On three public emotional corpora, EMODB, SAVEE, and RAVDESS, the approach achieves recognition accuracies of 79%, 69%, and 75%, respectively. Data augmentation, which enlarges the sample set by adding Gaussian noise and changing the speed of the speech, further improves recognition accuracy. An online recognition system verifies the usability of the model in a real environment.

Key words: speech emotion recognition, Mel-frequency cepstral coefficients, CGRU model, random forest, data augmentation
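The abstract names two augmentation operations, adding Gaussian noise and changing the speed, without giving their parameters. The following is a minimal sketch of what such augmentation could look like on a raw waveform held as a list of floats. The function names, the noise standard deviation, and the speed rates are all illustrative assumptions, and linear-interpolation resampling is only a simple stand-in for whatever speed-change method the authors used (it also shifts pitch).

```python
import random


def add_gaussian_noise(signal, noise_std=0.005, seed=None):
    """Return a copy of `signal` with zero-mean Gaussian noise added per sample.

    `noise_std` is an assumed value; the paper's abstract does not state one.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def change_speed(signal, rate):
    """Resample `signal` by linear interpolation so it plays `rate` times faster.

    rate > 1 shortens the signal (faster speech); rate < 1 lengthens it.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if len(signal) < 2:
        return list(signal)
    new_len = max(2, int(round(len(signal) / rate)))
    step = (len(signal) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * step
        lo = min(int(pos), len(signal) - 2)  # left neighbour index
        frac = pos - lo                      # distance from left neighbour
        out.append(signal[lo] * (1.0 - frac) + signal[lo + 1] * frac)
    return out


def augment(signal, rates=(0.9, 1.1), noise_std=0.005, seed=0):
    """Return the original signal, one noisy copy, and one copy per speed rate.

    The default rates are hypothetical, chosen only for illustration.
    """
    variants = [list(signal), add_gaussian_noise(signal, noise_std, seed)]
    variants.extend(change_speed(signal, r) for r in rates)
    return variants
```

With these defaults, each utterance yields four training samples (the original, one noisy copy, and two speed-changed copies), each of which would then go through MFCC extraction like any other sample.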

CLC number: TN912.3

ZHENG Yan, CHEN Jia-nan, WU Fan, FU Bin. Research and Implementation of Speech Emotion Recognition Based on CGRU Model[J]. Journal of Northeastern University (Natural Science), 2020, 41(12): 1680-1685.


