Speech Emotion Recognition Fusing Functional Paralanguage Proportion Coefficient

doi:10.12068/j.issn.1005-3026.2024.01.006

Abstract

Nonverbal vocalizations in speech, such as laughter, sighs, and sobs, are called functional paralanguage and play an important role in emotional expression. However, existing research has rarely considered the combined effect of several functional paralanguages within a single emotion. To address this, an emotion recognition system that integrates functional paralanguage proportion coefficients (FPPC) is proposed. First, FPPC features are extracted that reflect how often and for how long each functional paralanguage occurs in emotional utterances. Then, an attention-based ensemble learning model is constructed that assigns different weights to the base classifiers and is trained on the FPPC features. Finally, an adaptive entropy-weight decision fusion method fuses traditional speech emotion recognition with the FPPC-based recognition. Experimental results show a 16.84% improvement in emotion recognition after integrating FPPC features, indicating that they effectively improve the overall recognition rate of the system.
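
The decision-fusion step can be illustrated with a minimal sketch. The abstract does not give the fusion formula, so the weighting below (one minus normalized Shannon entropy, renormalized across classifiers) is an assumption, and the posterior vectors are hypothetical values chosen only for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_weights(prob_vectors):
    """Weight each classifier more heavily when its output entropy is low.

    Assumed rule (not taken from the paper): confidence = 1 - H / log(K),
    then normalize the confidences so the weights sum to 1.
    """
    max_h = math.log(len(prob_vectors[0]))
    conf = [max(0.0, 1.0 - entropy(p) / max_h) for p in prob_vectors]
    total = sum(conf)
    if total < 1e-12:  # every classifier is maximally uncertain
        return [1.0 / len(prob_vectors)] * len(prob_vectors)
    return [c / total for c in conf]

def fuse(prob_vectors):
    """Weighted sum of per-classifier class probabilities, then argmax."""
    weights = entropy_weights(prob_vectors)
    n_classes = len(prob_vectors[0])
    fused = [
        sum(w * p[k] for w, p in zip(weights, prob_vectors))
        for k in range(n_classes)
    ]
    return fused.index(max(fused)), fused, weights

# Hypothetical posteriors over three emotion classes for one utterance.
acoustic_probs = [0.40, 0.35, 0.25]  # traditional acoustic recognizer
fppc_probs = [0.05, 0.90, 0.05]      # FPPC-based recognizer
label, fused, weights = fuse([acoustic_probs, fppc_probs])
```

Here the more confident FPPC output receives the larger weight, so the fused decision follows it. The weights adapt per utterance because they are recomputed from each pair of posteriors.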

Key words: speech emotion recognition, proportion coefficient, functional paralanguage, attention mechanism, adaptive entropy weight decision fusion

CLC Number:

TN 912.3

Ying SUN, Ya-ru ZHOU, Xue-ying ZHANG. Speech Emotion Recognition Fusing Functional Paralanguage Proportion Coefficient[J]. Journal of Northeastern University(Natural Science), 2024, 45(1): 40-48.



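Proportion feature	Statistical feature
Duration	Maximum / minimum
	Position of maximum / position of minimum
	First / second / third quartile
	Mean
	Mean absolute deviation
Frequency	Standard deviation
Frequency	Skewness, kurtosis, variance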
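
As a rough illustration of the statistical features listed in the table above, the sketch below computes them for one sequence of proportion values. The input sequence is hypothetical, and treating skewness and kurtosis as population moments (non-excess kurtosis) is an assumption, since the table does not specify the estimator.

```python
import statistics

def functionals(values):
    """Compute the statistical functionals named in the table for one sequence."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values)  # population variance
    std = variance ** 0.5
    q1, q2, q3 = statistics.quantiles(values, n=4)
    # Third and fourth central moments for skewness and kurtosis.
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "max": max(values),
        "min": min(values),
        "argmax": values.index(max(values)),
        "argmin": values.index(min(values)),
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "mean": mean,
        "mean_abs_dev": sum(abs(v - mean) for v in values) / n,
        "std": std,
        "variance": variance,
        "skewness": m3 / std ** 3 if std > 0 else 0.0,
        "kurtosis": m4 / std ** 4 if std > 0 else 0.0,
    }

# Hypothetical duration proportions of one paralanguage type across segments.
duration_proportions = [0.1, 0.4, 0.2, 0.3]
stats = functionals(duration_proportions)
```

The same function could be applied to frequency proportions. Which functionals the authors apply to which proportion feature is not fully recoverable from the flattened table.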

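Model selection		Advantage
Base model	KNN	Low time complexity
	RF	Strong resistance to overfitting
	GBDT	Well suited to low-dimensional data
	Adaboost	High accuracy
	Extra Trees	Good generalization
	LightGBM	Fast training
Meta-model	SVM	Nonlinear mapping, good generalization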
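
One way to picture how attention weights might act on the base models in the table above is sketched below. This is an assumption-laden illustration, not the authors' architecture: the weights are a softmax over hypothetical validation scores, and every probability vector is made up. The weighted outputs are concatenated into the meta-feature vector that a meta-model such as the SVM would then be trained on.

```python
import math

BASE_MODELS = ["KNN", "RF", "GBDT", "AdaBoost", "Extra Trees", "LightGBM"]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_meta_features(base_probs, scores):
    """Scale each base model's class probabilities by its weight and
    concatenate the results into one meta-feature vector."""
    weights = softmax(scores)
    features = []
    for w, probs in zip(weights, base_probs):
        features.extend(w * p for p in probs)
    return weights, features

# Hypothetical validation accuracies, one per base model (illustrative only).
val_scores = [0.55, 0.62, 0.60, 0.58, 0.61, 0.64]
# Hypothetical class probabilities from each base model for one utterance.
base_probs = [
    [0.5, 0.3, 0.2],
    [0.6, 0.3, 0.1],
    [0.4, 0.4, 0.2],
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.6, 0.2, 0.2],
]
weights, meta = weighted_meta_features(base_probs, val_scores)
```

In a learned attention mechanism the scores would be trained jointly with the ensemble instead of being fixed validation accuracies. The fixed scores here only show the weighting step.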

Emotion	Functional paralanguage
	Laughter	Shout	Sigh	Sob
Anger	13	4	29	2
Frustration	18	25	23	9
Happiness	108	19	6	0
Neutral	39	43	27	1
Sadness	10	5	44	102
Surprise	59	18	34	17
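
To make the idea of a proportion concrete, the sketch below row-normalizes the counts in the table above, giving the share of each functional paralanguage within each emotion. This corpus-level normalization is only an illustration. The exact FPPC definition is not stated in this section and may differ, for example by being computed per utterance and by including duration.

```python
PARALANGUAGE_TYPES = ["laughter", "shout", "sigh", "sob"]

# Counts copied from the table above (emotion -> occurrences per type).
COUNTS = {
    "anger": [13, 4, 29, 2],
    "frustration": [18, 25, 23, 9],
    "happiness": [108, 19, 6, 0],
    "neutral": [39, 43, 27, 1],
    "sadness": [10, 5, 44, 102],
    "surprise": [59, 18, 34, 17],
}

def frequency_proportions(row):
    """Share of each paralanguage type within one emotion's total count."""
    total = sum(row)
    return [count / total for count in row]

proportions = {
    emotion: frequency_proportions(row) for emotion, row in COUNTS.items()
}

def dominant_type(emotion):
    """Paralanguage type with the largest share for the given emotion."""
    shares = proportions[emotion]
    return PARALANGUAGE_TYPES[shares.index(max(shares))]

for emotion, shares in proportions.items():
    formatted = ", ".join(
        f"{name}={share:.3f}" for name, share in zip(PARALANGUAGE_TYPES, shares)
    )
    print(f"{emotion}: {formatted}")
```

Laughter dominates happiness (108 of 133 occurrences) and sobbing dominates sadness (102 of 161), while frustration and neutral speech spread more evenly across types. This is the kind of pattern that motivates using several paralanguage types together.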