Journal of Northeastern University(Natural Science), 2024, Vol. 45, Issue (1): 40-48. DOI: 10.12068/j.issn.1005-3026.2024.01.006
• Information & Control •
Ying SUN, Ya-ru ZHOU, Xue-ying ZHANG
Received: 2022-07-22
Online: 2024-01-15
Published: 2024-04-02
Ying SUN, Ya-ru ZHOU, Xue-ying ZHANG. Speech Emotion Recognition Fusing Functional Paralanguage Proportion Coefficient[J]. Journal of Northeastern University(Natural Science), 2024, 45(1): 40-48.
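| Proportion feature | Statistical feature |
|---|---|
| Duration | Maximum / minimum |
| | Position of maximum / position of minimum |
| | First / second / third quartile |
| | Mean |
| | Mean absolute deviation |
| Frequency | Standard deviation |
| | Skewness, kurtosis, variance |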
Table 1 Functional paralanguage proportion coefficient
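As an illustration of the statistical features listed in Table 1, the following sketch computes them for a one-dimensional sequence using only the Python standard library. The function name `functionals`, the example durations, the use of population (rather than sample) moments, and the "inclusive" quartile method are all assumptions made for illustration; the paper's exact definitions may differ.

```python
import statistics


def functionals(values):
    """Return the Table 1 statistics for a list of at least two numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population variance and standard deviation (an assumption; the
    # paper may use the sample versions instead).
    variance = statistics.pvariance(values, mu=mean)
    std = variance ** 0.5
    # Quartiles with the "inclusive" method, which treats the data as
    # the whole population.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

    def central_moment(k):
        return sum((v - mean) ** k for v in values) / n

    # Skewness and kurtosis are undefined for a constant sequence,
    # so fall back to 0.0 in that case.
    skewness = central_moment(3) / std ** 3 if std > 0 else 0.0
    kurtosis = central_moment(4) / std ** 4 if std > 0 else 0.0
    return {
        "max": max(values),
        "min": min(values),
        "argmax": values.index(max(values)),
        "argmin": values.index(min(values)),
        "q1": q1, "q2": q2, "q3": q3,
        "mean": mean,
        "mean_abs_dev": sum(abs(v - mean) for v in values) / n,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "variance": variance,
    }


# Hypothetical durations (in seconds) of paralanguage segments.
durations = [0.4, 1.2, 0.8, 2.0, 0.6]
stats = functionals(durations)
```

The same function could be applied to a frequency sequence; only the input changes.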
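| Model role | Model | Advantage |
|---|---|---|
| Base model | KNN | Low time complexity |
| | RF | Strong resistance to overfitting |
| | GBDT | Well suited to low-dimensional data |
| | Adaboost | High accuracy |
| | Extra Trees | Good generalization |
| | LightGBM | Fast training |
| Meta model | SVM | Nonlinear mapping, good generalization |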
Table 2 Selection of base model and meta model
| Emotion | Laughter | Shout | Sigh | Sob |
|---|---|---|---|---|
| Anger | 13 | 4 | 29 | 2 |
| Frustration | 18 | 25 | 23 | 9 |
| Happiness | 108 | 19 | 6 | 0 |
| Neutral | 39 | 43 | 27 | 1 |
| Sadness | 10 | 5 | 44 | 102 |
| Surprise | 59 | 18 | 34 | 17 |
Table 3 Distribution of functional paralanguage in NNIME
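| Model | Parameter | Value or description |
|---|---|---|
| KNN | Number of neighbors (K) | 5 |
| RF | Number of trees | 400 |
| | Maximum number of features | 15 |
| | Minimum samples required to split | 5 |
| GBDT | Learning rate | 0.1 |
| | Maximum depth | 10 |
| | Maximum number of features | 10 |
| Adaboost | Number of trees | 200 |
| | Maximum depth | 13 |
| Extra Trees | Number of trees | 400 |
| | Minimum samples required to split | 10 |
| LightGBM | Minimum loss reduction | 0.37 |
| | Minimum sum of sample weights | 0.0001 |
| SVM | Kernel function | Poly |
| | Kernel coefficient | 0.3 |
| | Kernel degree | 3 |
| Xgboost | Maximum tree depth | 5 |
| | Learning rate | 0.1 |
| | Number of trees | 200 |
| | Feature sampling ratio | 0.75 |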
Table 4 Grid search results of main parameters of each model
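For readers who want to reuse the settings in Table 4, the sketch below writes them as plain keyword-argument dictionaries. The parameter names follow scikit-learn, LightGBM and XGBoost conventions, which is an assumption: the source does not say which libraries were used. The `base_tree` key is a hypothetical grouping introduced here, not a library parameter.

```python
# Table 4 values written as keyword-argument dictionaries. The parameter
# names follow scikit-learn, LightGBM and XGBoost conventions; this
# mapping is an assumption, since the source does not name the libraries.
GRID_SEARCH_RESULTS = {
    "KNN": {"n_neighbors": 5},
    "RF": {"n_estimators": 400, "max_features": 15, "min_samples_split": 5},
    "GBDT": {"learning_rate": 0.1, "max_depth": 10, "max_features": 10},
    # In scikit-learn, max_depth belongs to the decision tree that
    # AdaBoost boosts, so it is kept under a hypothetical "base_tree" key.
    "Adaboost": {"n_estimators": 200, "base_tree": {"max_depth": 13}},
    "Extra Trees": {"n_estimators": 400, "min_samples_split": 10},
    "LightGBM": {"min_split_gain": 0.37, "min_child_weight": 0.0001},
    "SVM": {"kernel": "poly", "gamma": 0.3, "degree": 3},
    "Xgboost": {
        "max_depth": 5,
        "learning_rate": 0.1,
        "n_estimators": 200,
        "colsample_bytree": 0.75,
    },
}
```

Under this mapping, an entry such as `GRID_SEARCH_RESULTS["RF"]` could be unpacked into the corresponding estimator's constructor.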
| Features | Metric | SVM | Xgboost | Stacking | ATT_Stacking | Improvement over SVM | Improvement over Xgboost | Improvement over Stacking |
|---|---|---|---|---|---|---|---|---|
| IS09 | Precision | 40.00 | 74.70 | 59.46 | 66.19 | +26.19 | -8.51 | +6.73 |
| | Recall | 44.44 | 66.94 | 60.23 | 65.28 | +20.84 | -1.66 | +5.05 |
| | F1 | 40.36 | 66.44 | 58.84 | 64.90 | +24.54 | -1.54 | +6.06 |
| | Acc | 48.07 | 70.00 | 61.50 | 63.33 | +15.26 | -6.67 | +1.83 |
| FPPC | Precision | 43.67 | 53.67 | 50.20 | 52.00 | +10.54 | +0.54 | +4.01 |
| | Recall | 45.40 | 54.40 | 49.61 | 53.64 | +8.24 | -0.76 | +4.03 |
| | F1 | 40.36 | 52.30 | 47.90 | 53.16 | +12.80 | +0.86 | +5.26 |
| | Acc | 47.96 | 52.96 | 53.33 | 55.67 | +7.71 | +2.71 | +2.34 |
| IS09+FPPC | Precision | 53.71 | 55.54 | 69.44 | 66.90 | +13.19 | +11.36 | -2.54 |
| | Recall | 56.16 | 66.50 | 70.14 | 72.92 | +16.76 | +6.42 | +2.78 |
| | F1 | 54.58 | 57.68 | 68.36 | 73.15 | +18.57 | +15.47 | +4.79 |
| | Acc | 57.69 | 64.44 | 73.33 | 76.67 | +18.98 | +12.23 | +3.34 |
Table 5 Recognition results of different combinations of features and classification methods
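The three improvement columns of Table 5 appear to be the ATT_Stacking score minus the SVM, Xgboost and Stacking scores respectively, in percentage points. This reading is inferred from the numbers rather than stated in the source. A minimal sketch of that arithmetic for the Acc row of the IS09+FPPC feature set, with the hypothetical helper name `improvement`:

```python
def improvement(proposed, baseline):
    """Gain of the proposed method over a baseline, in percentage points."""
    return round(proposed - baseline, 2)


# Acc values for the IS09+FPPC feature set, copied from Table 5.
acc = {"SVM": 57.69, "Xgboost": 64.44, "Stacking": 73.33, "ATT_Stacking": 76.67}
gains = {
    name: improvement(acc["ATT_Stacking"], score)
    for name, score in acc.items()
    if name != "ATT_Stacking"
}
print(gains)
```

The resulting gains match the +18.98, +12.23 and +3.34 reported in that row.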
| Classification method | Precision | Recall | F1 | Acc | Improvement |
|---|---|---|---|---|---|
| GBDT | 42.36 | 41.33 | 41.84 | 43.33 | +12.34 |
| KNN | 43.50 | 43.46 | 42.47 | 43.33 | +12.34 |
| Xgboost | 53.67 | 54.40 | 52.30 | 52.96 | +2.71 |
| RF | 34.45 | 33.56 | 32.09 | 36.67 | +19.00 |
| ET | 41.72 | 40.77 | 40.12 | 43.33 | +12.34 |
| LightGBM | 42.47 | 40.21 | 41.31 | 40.00 | +15.67 |
| SVM | 43.67 | 45.40 | 40.36 | 47.96 | +7.71 |
| LSTM | 46.17 | 45.64 | 43.16 | 44.56 | +11.11 |
| Stacking | 50.20 | 49.61 | 47.90 | 53.33 | +2.34 |
| ATT_Stacking | 52.00 | 53.64 | 53.16 | 55.67 | 0.00 |
Table 6 Recognition results of different classification methods on FPPC
| Classification method | Precision | Recall | F1 | Acc | Improvement |
|---|---|---|---|---|---|
| LSTM[19] | 38.00 | | | 51.12 | +18.88 |
| BLSTM | 50.25 | 51.39 | 49.88 | 53.33 | +16.67 |
| 1D_CNN | 33.17 | 57.00 | 38.26 | 43.75 | +26.25 |
| GBDT | 60.21 | 61.39 | 59.35 | 66.67 | +3.33 |
| RF | 54.95 | 59.31 | 55.65 | 63.33 | +6.67 |
| SVM | 40.00 | 44.44 | 40.36 | 48.07 | +21.93 |
| Xgboost | 74.70 | 66.94 | 66.44 | 70.00 | 0.00 |
Table 7 Recognition results of different methods on IS09
| Scheme | Metric | Anger | Frustration | Happiness | Neutral | Sadness | Surprise | Mean | Acc | Improvement |
|---|---|---|---|---|---|---|---|---|---|---|
| Experiment 1 | Precision | 100.00 | 85.71 | 62.50 | 50.00 | 50.00 | 100.00 | 74.70 | 70.00 | +16.84 |
| | Recall | 83.33 | 55.56 | 65.74 | 55.56 | 61.45 | 80.00 | 66.94 | | |
| | F1 | 80.00 | 71.43 | 71.43 | 51.39 | 54.50 | 68.65 | 66.44 | | |
| Experiment 2 | Precision | 33.33 | 66.67 | 57.14 | 43.33 | 50.00 | 61.54 | 52.00 | 55.67 | +31.17 |
| | Recall | 42.11 | 61.54 | 50.00 | 47.06 | 60.00 | 61.13 | 53.64 | | |
| | F1 | 37.50 | 63.33 | 54.55 | 44.44 | 54.55 | 64.59 | 53.16 | | |
| Experiment 3 | Precision | 40.00 | 100.00 | 83.33 | 40.00 | 66.67 | 71.43 | 66.90 | 76.67 | +10.17 |
| | Recall | 33.33 | 87.50 | 100.00 | 56.00 | 77.36 | 83.33 | 72.92 | | |
| | F1 | 40.00 | 93.33 | 100.00 | 50.00 | 88.89 | 66.68 | 73.15 | | |
| Experiment 4 | Precision | 57.00 | 56.00 | 75.00 | 44.00 | 86.00 | 56.00 | 62.33 | 61.92 | +24.92 |
| | Recall | 60.00 | 49.00 | 64.00 | 65.00 | 64.00 | 57.00 | 59.83 | | |
| | F1 | 58.00 | 52.00 | 69.00 | 53.00 | 73.00 | 57.00 | 60.33 | | |
| Experiment 5 | Precision | 82.35 | 80.00 | 85.15 | 93.10 | 86.42 | 85.33 | 85.39 | 86.84 | 0.00 |
| | Recall | 84.00 | 82.00 | 90.33 | 84.61 | 96.15 | 89.65 | 87.72 | | |
| | F1 | 81.16 | 79.35 | 88.71 | 90.19 | 92.03 | 88.60 | 86.67 | | |
Table 8 Experimental results of adaptive entropy weight decision fusion
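Table 8 reports results for adaptive entropy weight decision fusion, but this section does not give the fusion formula. The sketch below shows one generic form of entropy-weighted decision fusion, in which classifiers with lower-entropy (more confident) outputs receive larger weights. It is an illustration under stated assumptions, not the authors' method: the function names, the weighting rule and the example probabilities are all hypothetical.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a distribution, scaled to [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def entropy_weighted_fusion(distributions):
    """Fuse per-classifier class probabilities for one sample.

    Confident (low-entropy) classifiers get larger weights.
    """
    confidences = [max(0.0, 1.0 - normalized_entropy(d)) for d in distributions]
    total = sum(confidences)
    if total < 1e-12:
        # Every classifier is maximally uncertain: use equal weights.
        weights = [1.0 / len(distributions)] * len(distributions)
    else:
        weights = [c / total for c in confidences]
    n_classes = len(distributions[0])
    fused = [
        sum(w * d[k] for w, d in zip(weights, distributions))
        for k in range(n_classes)
    ]
    return weights, fused


EMOTIONS = ["anger", "frustration", "happiness", "neutral", "sadness", "surprise"]
# Hypothetical outputs of two classifiers for one utterance.
p_acoustic = [0.05, 0.05, 0.70, 0.10, 0.05, 0.05]
p_paralanguage = [0.15, 0.15, 0.25, 0.15, 0.15, 0.15]
weights, fused = entropy_weighted_fusion([p_acoustic, p_paralanguage])
predicted = EMOTIONS[fused.index(max(fused))]
```

Because the weights are recomputed for every sample, a classifier that is uncertain on one utterance can still dominate on another.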
[1] Akçay M B, Oğuz K. Speech emotion recognition: emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers[J]. Speech Communication, 2020, 116: 56-76.
[2] Sun Ying, Hu Yan-xiang, Zhang Xue-ying, et al. Prediction of emotional dimensions PAD for emotional speech recognition[J]. Journal of Zhejiang University (Engineering Science), 2019, 53(10): 2041-2048.
[3] Moore J D, Tian L, Lai C. Word-level emotion recognition using high-level features[J]. Lecture Notes in Computer Science, 2014, 8404: 17-31.
[4] Zhao Xiao-lei, Mao Qi-rong, Zhan Yong-zhao. New method of speech emotion recognition fusing functional paralanguages[J]. Journal of Frontiers of Computer Science & Technology, 2014, 8(2): 186-199.
[5] Reuderink B, Poel M, Truong K, et al. Decision-level fusion for audio-visual laughter detection[C]//Popescu-Belis A, Stiefelhagen R. International Workshop on Machine Learning for Multimodal Interaction. Berlin: Springer, 2008: 137-148.
[6] Schuller B, Weninger F. Discrimination of speech and non-linguistic vocalizations by non-negative matrix factorization[C]//2010 IEEE International Conference on Acoustics, Speech and Signal Processing. Dallas, 2010: 5054-5057.
[7] Foo L S, Yap W S, Hum Y C, et al. Real-time baby crying detection in the noisy everyday environment[C]//11th IEEE Control and System Graduate Research Colloquium (ICSGRC). Shah Alam, 2020: 26-31.
[8] Huang K Y, Wu C H, Hong Q B, et al. Speech emotion recognition using deep neural network considering verbal and nonverbal speech sounds[C]//2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Brighton, 2019: 5866-5870.
[9] Knox M T, Mirghafori N. Automatic laughter detection using neural networks[C]//8th Annual Conference of the International Speech Communication Association. Belgium, 2007: 2973-2976.
[10] Zhao Xiao-lei, Zhao Hui-qing. Automatic detection algorithm of functional paralanguage in speech[J]. Intelligent Computer and Applications, 2015, 5(1): 73-76.