Research on Emotion Recognition Method of Music Multimodal Data

doi:10.12068/j.issn.1005-3026.2024.06.003

Abstract

Music emotion recognition has broad application prospects in intelligent music recommendation and music visualization. Emotion recognition that relies only on low-level audio features has limited effectiveness and poor interpretability, and this paper addresses that problem in three steps. First, an emotion recognition model, ERMSLM, is constructed on MIDI (musical instrument digital interface) data so that it can learn the semantic information of notes. Its features combine melodic features extracted with skip-gram and LSTM (long short-term memory), tonal features extracted by a pre-trained MLP, and manually constructed features. Second, an emotion recognition model, ERMBT, is constructed on text data that integrates lyrics and social tags. The lyrics features combine emotional features extracted with BERT, emotion dictionary features built from the ANEW word list, and TF-IDF features of the lyrics. Finally, two multimodal fusion models, one using feature-level fusion and one using decision-level fusion, are constructed on the MIDI and text data. The experimental results show that the ERMSLM and ERMBT models achieve accuracies of 56.93% and 72.62%, respectively, and that the decision-level fusion model is the more effective of the two fusion models.
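The difference between the two fusion strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the weighted-average rule for decision-level fusion, the `midi_weight` parameter, and the class order in `LABELS` are all assumptions made for illustration.

```python
from typing import List, Sequence

# Assumed class order; the four labels match the emotion categories in the tables.
LABELS = ["happy", "anxious", "sad", "relaxed"]


def feature_level_fusion(midi_features: Sequence[float],
                         text_features: Sequence[float]) -> List[float]:
    """Feature-level fusion: concatenate both feature vectors so that
    a single classifier can be trained on the joint representation."""
    return list(midi_features) + list(text_features)


def decision_level_fusion(midi_probs: Sequence[float],
                          text_probs: Sequence[float],
                          midi_weight: float = 0.5) -> List[float]:
    """Decision-level fusion: combine the class probabilities produced by
    two separately trained models. A weighted average is assumed here."""
    if len(midi_probs) != len(text_probs):
        raise ValueError("both models must score the same classes")
    if not 0.0 <= midi_weight <= 1.0:
        raise ValueError("midi_weight must lie in [0, 1]")
    return [midi_weight * m + (1.0 - midi_weight) * t
            for m, t in zip(midi_probs, text_probs)]


def predict_label(probs: Sequence[float],
                  labels: Sequence[str] = LABELS) -> str:
    """Return the label with the highest fused probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best]


# Hypothetical outputs of a MIDI model and a text model for one song.
midi_probs = [0.40, 0.30, 0.20, 0.10]
text_probs = [0.10, 0.20, 0.60, 0.10]
fused = decision_level_fusion(midi_probs, text_probs)
print(predict_label(fused))
```

In this sketch the text model's strong vote for one class outweighs the MIDI model's weaker preference, which is the kind of behavior decision-level fusion allows when one modality is more reliable than the other.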

Key words: music emotion recognition, deep learning, multimodal, LSTM

CLC Number: TP 391.1

Dong-hong HAN, Yan-ru KONG, Yi-meng ZHAN, Yuan LIU. Research on Emotion Recognition Method of Music Multimodal Data[J]. Journal of Northeastern University(Natural Science), 2024, 45(6): 776-785.

Figures/Tables 13

Word	Happy	Anxious	Sad	Relaxed
Happy	—	0.368	0.326	0.329
Anxious	0.368	—	0.416	0.276
Sad	0.326	0.416	—	0.341
Relaxed	0.329	0.276	0.341	—

V⁺A⁺	V⁻A⁺	V⁻A⁻	V⁺A⁻
happy	heartbreak	sad	chillout
upbeat	angry	soft	soul
fun	epic	acoustic	smooth
party	heartache	emotional	relax
catchy	aggressive	dark	relaxing
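
As a rough illustration of how social tags like those listed above could be related to a valence-arousal quadrant, the following Python sketch looks each tag up and takes a majority vote. The lookup table contains only the sample tags from the table, and the voting rule, the function name `quadrant_from_tags`, and the ASCII quadrant labels are assumptions, not the paper's method.

```python
from collections import Counter
from typing import Iterable, Optional

# Sample tags from the table, keyed to their quadrant.
# "V+A+" means positive valence and positive arousal, and so on.
TAG_QUADRANTS = {
    "happy": "V+A+", "upbeat": "V+A+", "fun": "V+A+",
    "party": "V+A+", "catchy": "V+A+",
    "heartbreak": "V-A+", "angry": "V-A+", "epic": "V-A+",
    "heartache": "V-A+", "aggressive": "V-A+",
    "sad": "V-A-", "soft": "V-A-", "acoustic": "V-A-",
    "emotional": "V-A-", "dark": "V-A-",
    "chillout": "V+A-", "soul": "V+A-", "smooth": "V+A-",
    "relax": "V+A-", "relaxing": "V+A-",
}


def quadrant_from_tags(tags: Iterable[str]) -> Optional[str]:
    """Majority vote over known tags (an assumed rule for illustration).

    Unknown tags are ignored. Returns None when no tag is recognized.
    On a tie, the quadrant that was encountered first wins.
    """
    votes = Counter(
        TAG_QUADRANTS[tag.lower()]
        for tag in tags
        if tag.lower() in TAG_QUADRANTS
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


print(quadrant_from_tags(["Party", "catchy", "dark", "indie"]))
```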

Category	Model	Accuracy	Macro-F1
Comparison experiments	Handcrafted features + neural network	0.3832	0.3884
	MFCC+SVM (RBF)	0.5511	0.5386
	MFCC+SVM (sigmoid)	0.5547	0.5194
	MFCC+DBM	0.5511	0.5944
Ablation experiments	ERMSLM (m_i)	0.4635	0.4895
	ERMSLM (m_i + k_i)	0.5474	0.5972
	ERMSLM	0.5693	0.5999
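
The two metrics reported in the table above can be computed as in the following Python sketch. Macro-F1 is the unweighted mean of the per-class F1 scores, so every emotion class counts equally regardless of how many songs it contains. The function names, the toy labels, and the convention of scoring a class with no true or predicted samples as 0.0 are assumptions of this sketch, not details taken from the paper.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores over the given labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    scores: List[float] = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); assumed 0.0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only; these are not data from the paper.
labels = ["happy", "anxious", "sad", "relaxed"]
y_true = ["happy", "happy", "sad", "relaxed"]
y_pred = ["happy", "sad", "sad", "relaxed"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred, labels))
```

Because the two metrics weight errors differently, a model can rank higher on one than on the other, as several rows of the table show.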