Speaker Clustering Algorithm Based on Feature Fusion

DOI: 10.12068/j.issn.1005-3026.2021.07.007

Journal of Northeastern University (Natural Science), 2021, Vol. 42, Issue (7): 952-959.

ZHENG Yan, JIANG Yuan-xiang

(School of Information Science & Engineering, Northeastern University, Shenyang 110819, Liaoning, China)

Revised: 2020-01-06  Accepted: 2020-01-06  Published: 2021-07-16
Corresponding author: ZHENG Yan
Contact: JIANG Yuan-xiang
About the author: ZHENG Yan (1963-), female, from Shenyang, Liaoning; associate professor, Northeastern University.
Supported by: National Natural Science Foundation of China (61773108).

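Abstract

Relying on a single acoustic feature limits speaker clustering, and so does the k-means algorithm. To address both limitations, represent speaker-specific information better, and improve clustering accuracy, this paper applies feature fusion and an AE-SOM neural network (an autoencoder combined with a self-organizing map) to speaker clustering and proposes an improved speaker clustering algorithm. Based on an analysis of speech signal features, the algorithm combines Mel-frequency cepstral coefficient (MFCC) and linear prediction cepstral coefficient (LPCC) parameters to characterize each speaker more completely. It also augments k-means with an AE-SOM network. The network reduces the dimensionality of the input features, determines the number of speakers, and selects the cluster centers, which compensates for the shortcomings of k-means. Simulation experiments show that once the two acoustic features are fused, the improved clustering algorithm effectively increases speaker clustering accuracy.

Key words: acoustic feature; k-means; speaker clustering; feature fusion; AE-SOM; neural network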
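The abstract says that MFCC and LPCC parameters are combined. It gives neither the LPCC derivation nor the fusion rule. The following is a minimal sketch under our own assumptions:

- LPC coefficients come from the autocorrelation method, solved by the Levinson-Durbin recursion.
- LPCC follows the standard LPC-to-cepstrum recursion.
- "Fusion" means frame-wise concatenation of the MFCC and LPCC vectors, followed by per-dimension z-score normalization.

MFCC frames are assumed to come from any standard front end. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """R[k] = sum_n s[n] * s[n + k] for k = 0..max_lag (autocorrelation method)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Predictor coefficients a_1..a_p for s[n] ~ sum_k a_k * s[n - k].

    Returns (coefficients, residual prediction-error energy)."""
    a = [0.0] * (order + 1)  # a[0] unused, kept for 1-based indexing
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predicted frame: leave the rest at 0
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_lpcc(a: Sequence[float], n_ceps: int) -> List[float]:
    """Cepstral recursion for the all-pole model 1 / (1 - sum_k a_k z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is not computed here
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def frame_lpcc(frame: Sequence[float], order: int = 12, n_ceps: int = 12) -> List[float]:
    """LPCC vector of one (already windowed) speech frame."""
    a, _ = levinson_durbin(autocorrelation(frame, order), order)
    return lpc_to_lpcc(a, n_ceps)


def fuse_features(mfcc_frames, lpcc_frames) -> List[List[float]]:
    """Assumed fusion rule: concatenate [MFCC | LPCC] per frame, then z-score each dimension."""
    if len(mfcc_frames) != len(lpcc_frames):
        raise ValueError("MFCC and LPCC must have the same number of frames")
    fused = [list(m) + list(l) for m, l in zip(mfcc_frames, lpcc_frames)]
    n, dims = len(fused), len(fused[0])
    means = [sum(row[d] for row in fused) / n for d in range(dims)]
    stds = [math.sqrt(sum((row[d] - means[d]) ** 2 for row in fused) / n) or 1.0
            for d in range(dims)]  # constant dimensions are left unscaled
    return [[(row[d] - means[d]) / stds[d] for d in range(dims)] for row in fused]
```

Normalizing after concatenation matters because MFCC and LPCC values can have different ranges. Without it, Euclidean distances in the later clustering stages would be dominated by whichever feature set has the larger spread.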

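The abstract assigns dimensionality reduction of the fused features to the AE part of the AE-SOM network. It does not specify the architecture. Assuming AE denotes an autoencoder, a minimal sketch has three parts: a single tanh hidden layer whose activations serve as the reduced features, a linear decoder, and plain per-sample gradient descent on the squared reconstruction error. The layer sizes, learning rate, and class name `TinyAutoencoder` are hypothetical choices for illustration, not the paper's settings.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


class TinyAutoencoder:
    """h = tanh(W1 x + b1) (the reduced feature), x_hat = W2 h + b2 (reconstruction)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        s1, s2 = 1.0 / math.sqrt(n_in), 1.0 / math.sqrt(n_hidden)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-s2, s2) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x: Sequence[float]) -> Vector:
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def decode(self, h: Sequence[float]) -> Vector:
        return [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]

    def loss(self, data: Sequence[Sequence[float]]) -> float:
        """Mean of 0.5 * ||x_hat - x||^2 over the data set."""
        total = 0.0
        for x in data:
            x_hat = self.decode(self.encode(x))
            total += 0.5 * sum((y - t) ** 2 for y, t in zip(x_hat, x))
        return total / len(data)

    def train(self, data, epochs: int = 100, lr: float = 0.01) -> None:
        """Plain per-sample gradient descent on the reconstruction loss."""
        for _ in range(epochs):
            for x in data:
                h = self.encode(x)
                dy = [y - t for y, t in zip(self.decode(h), x)]  # dL/dx_hat
                # Hidden-layer gradients use W2 before it is updated in this step.
                dh = [sum(self.w2[i][j] * dy[i] for i in range(len(dy))) for j in range(len(h))]
                da = [g * (1.0 - hj * hj) for g, hj in zip(dh, h)]  # tanh'(a) = 1 - h^2
                for i, dyi in enumerate(dy):
                    for j, hj in enumerate(h):
                        self.w2[i][j] -= lr * dyi * hj
                    self.b2[i] -= lr * dyi
                for j, daj in enumerate(da):
                    for k, xk in enumerate(x):
                        self.w1[j][k] -= lr * daj * xk
                    self.b1[j] -= lr * daj
```

After training, `encode(x)` would replace each fused feature vector with its `n_hidden`-dimensional code before the SOM and k-means stages.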

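The abstract also credits the SOM with two tasks: determining the number of speakers and choosing the cluster centers that k-means starts from. It does not state the rule. The sketch below uses one hypothetical rule:

1. Train a small Kohonen map.
2. Keep the nodes that win at least `min_hits` input vectors.
3. Merge active nodes that lie closer than `merge_radius`.
4. Pass the survivors to Lloyd's k-means as initial centers, so that k equals the number of survivors.

The grid size, decay schedules, `min_hits`, `merge_radius`, and all function names are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(nodes: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the best-matching node (smallest squared Euclidean distance)."""
    return min(range(len(nodes)), key=lambda i: sq_dist(nodes[i], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Online Kohonen SOM on a rows x cols grid with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    n_nodes = rows * cols
    # Initialise nodes from evenly spaced samples so every region starts covered.
    nodes = [list(data[i * len(data) // n_nodes]) for i in range(n_nodes)]
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)               # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3  # neighbourhood width shrinks linearly
            br, bc = coords[nearest(nodes, x)]
            for i, (r, c) in enumerate(coords):
                g = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma * sigma))
                nodes[i] = [w + lr * g * (xi - w) for w, xi in zip(nodes[i], x)]
            step += 1
    return nodes


def som_initial_centers(data, nodes, merge_radius, min_hits=1):
    """Hypothetical rule (not from the paper): keep busy nodes, merge close ones."""
    hits = [0] * len(nodes)
    for x in data:
        hits[nearest(nodes, x)] += 1
    active = sorted((i for i, h in enumerate(hits) if h >= min_hits), key=lambda i: -hits[i])
    centers: List[Vector] = []
    for i in active:
        if all(sq_dist(nodes[i], c) > merge_radius ** 2 for c in centers):
            centers.append(list(nodes[i]))
    return centers


def kmeans(data, centers, max_iter=100):
    """Lloyd's k-means started from the given centers (k = len(centers))."""
    centers = [list(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [nearest(centers, x) for x in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def cluster_speakers(features, rows=3, cols=3, merge_radius=3.0, sigma0=1.0):
    """Returns (estimated number of speakers, label per feature vector)."""
    nodes = train_som(features, rows, cols, sigma0=sigma0)
    centers = som_initial_centers(features, nodes, merge_radius)
    labels, centers = kmeans(features, centers)
    return len(centers), labels
```

Seeding k-means this way targets the two k-means weaknesses the abstract refers to. First, k no longer has to be fixed in advance. Second, the starting centers are prototypes the map has already placed in dense regions rather than random picks. The outcome still depends on `merge_radius`, which in this sketch takes over the role that a fixed k plays in plain k-means.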

CLC number: TN912.3

ZHENG Yan, JIANG Yuan-xiang. Speaker Clustering Algorithm Based on Feature Fusion[J]. Journal of Northeastern University (Natural Science), 2021, 42(7): 952-959.

