Journal of Northeastern University (Natural Science), 2024, Vol. 45, Issue (6): 793-801. DOI: 10.12068/j.issn.1005-3026.2024.06.005

• Information and Control •

VMD-Based Dual-Channel Feature Map Extraction Algorithm for Dysarthric Speech

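Pei-yun XUE1,2, Jing BAI1, Nan ZHANG3, Jian-xing ZHAO1

  1. College of Electronic Information Engineering, Taiyuan University of Technology, Jinzhong, Shanxi 030600
  2. Post-doctoral Research Station, Shanxi Academy of Advanced Research and Innovation, Taiyuan, Shanxi 030024
  3. School of Information and Communication Engineering, North University of China, Taiyuan, Shanxi 030024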
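  • Received: 2023-05-23 Publication date: 2024-06-15 Release date: 2024-09-18
  • Corresponding author: Pei-yun XUE
  • About the authors: XUE Pei-yun (b. 1990), female, from Taiyuan, Shanxi; lecturer at Taiyuan University of Technology, PhD.
    BAI Jing (b. 1965), female, from Taiyuan, Shanxi; professor at Taiyuan University of Technology.
  • Funding:
    Shanxi Province Applied Basic Research Program (201901D111094); Shanxi Province Basic Research Program (Youth) (20210302124544)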

VMD Based Binary Channels Speech Feature Map Extraction Algorithm for Dysarthria

Pei-yun XUE1,2, Jing BAI1, Nan ZHANG3, Jian-xing ZHAO1

  1. College of Electronic Information Engineering, Taiyuan University of Technology, Jinzhong 030600, China
  2. Post-doctoral Research Station, Shanxi Academy of Advanced Research and Innovation, Taiyuan 030024, China
  3. School of Information and Communication Engineering, North University of China, Taiyuan 030024, China
  • Received: 2023-05-23 Online: 2024-06-15 Published: 2024-09-18
  • Contact: Pei-yun XUE
  • About author: XUE Pei-yun, E-mail: xuepeiyun@tyut.edu.cn

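Abstract:

To address the low recognition rate caused by insufficient effective feature information being extracted from the speech of patients with dysarthria, a multiscale dual-channel filter bank (MBCFbank) feature map extraction algorithm based on variational mode decomposition (VMD) is proposed. First, to better extract acoustic features that match the auditory structure of the human ear, a dual-channel filter bank (BCFbank) feature extraction algorithm is proposed: one channel applies Mel filtering followed by a logarithmic transform, while the other applies Gammatone filtering followed by a nonlinear loudness transform. Second, VMD is used to optimize the BCFbank features: from the decomposed speech signal components, the three with the highest correlation coefficients are selected, and their BCFbank features and difference features are extracted; BCFbank features are also extracted from the undecomposed speech signal, and together these form the MBCFbank feature map. Finally, training and recognition are carried out on a dual-path speech recognition model. Experimental results show that speech recognition models based on BCFbank features and on MBCFbank feature maps reach maximum accuracies of 87.82% and 94.34%, respectively, outperforming Fbank features.

Keywords: dysarthric speech recognition, variational mode decomposition, convolutional neural network, MBCFbank features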

Abstract:

A multiscale binary-channels filter bank (MBCFbank) feature map extraction algorithm based on variational mode decomposition (VMD) is proposed to address the low speech recognition rate caused by insufficient effective feature information being extracted from the speech of patients with dysarthria. Firstly, to better extract acoustic features that conform to the structural characteristics of the human ear, a binary-channels filter bank (BCFbank) feature extraction algorithm is proposed: one channel applies Mel filtering followed by a logarithmic transformation, and the other applies Gammatone filtering followed by a nonlinear loudness transformation. Secondly, VMD is used to optimize the BCFbank features. The three components with the highest correlation coefficients are selected from the decomposed speech signal components, and their BCFbank features and differential features are extracted. BCFbank features are also extracted from the undecomposed speech signal, and together these form the MBCFbank feature map. Finally, training and recognition are conducted on a dual-channel speech recognition model. The experimental results show that the speech recognition models based on BCFbank features and on MBCFbank feature maps reach maximum accuracies of 87.82% and 94.34%, respectively, outperforming Fbank features.

Key words: speech recognition with dysarthria, variational mode decomposition, convolutional neural network, MBCFbank features
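
The two feature steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several points are assumptions the abstract does not fix. The function names `bcfbank`, `select_modes` and `delta` are hypothetical. The Mel and Gammatone filter banks are taken as precomputed matrices with the same number of filters. The "nonlinear loudness transformation" is approximated by a cube root. The correlation coefficient is taken to be the Pearson correlation between each VMD component and the original signal. The VMD step itself is omitted, so synthetic sinusoids stand in for its output.

```python
import numpy as np


def bcfbank(power_spec, mel_fb, gt_fb, eps=1e-10):
    """Two-channel filter bank features (sketch).

    power_spec: (frames, bins) power spectrum.
    mel_fb, gt_fb: (filters, bins) filter bank matrices, assumed precomputed.
    Returns an array of shape (2, frames, filters).
    """
    mel_channel = np.log(power_spec @ mel_fb.T + eps)  # Mel filtering + log transform
    gt_channel = np.cbrt(power_spec @ gt_fb.T)         # Gammatone filtering + cube root (assumed loudness transform)
    return np.stack([mel_channel, gt_channel])


def select_modes(signal, modes, k=3):
    """Pick the k components with the highest Pearson correlation to the signal.

    modes: (n_modes, n_samples) array, e.g. the output of some VMD routine.
    Returns the selected indices (best first) and the selected components.
    """
    corr = np.array([np.corrcoef(signal, m)[0, 1] for m in modes])
    order = np.argsort(corr)[::-1][:k]
    return order, modes[order]


def delta(feat):
    """First-order difference features along the frame axis (second to last)."""
    return np.gradient(feat, axis=-2)


# Synthetic stand-in for VMD output: four sinusoids of decreasing amplitude.
n = 1600
t = np.arange(n) / n
modes = np.stack([a * np.sin(2 * np.pi * f * t)
                  for a, f in [(1.0, 5), (0.8, 20), (0.6, 50), (0.05, 120)]])
signal = modes.sum(axis=0)
order, picked = select_modes(signal, modes)

# Random placeholders for a power spectrum and the two filter banks.
rng = np.random.default_rng(0)
power = rng.random((4, 8))
fb_a = rng.random((5, 8))
fb_b = rng.random((5, 8))
feat = bcfbank(power, fb_a, fb_b)
feat_delta = delta(feat)
```

In a full pipeline along the lines the abstract describes, `bcfbank` would be applied to the undecomposed signal and to each selected component, with `delta` applied to the component features, before stacking the results into a feature map. How the maps are arranged is not specified in the abstract.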

CLC number: