Journal of Northeastern University (Natural Science), 2024, Vol. 45, Issue (6): 793-801. DOI: 10.12068/j.issn.1005-3026.2024.06.005

• Information & Control

VMD Based Binary Channels Speech Feature Map Extraction Algorithm for Dysarthria

Pei-yun XUE1,2, Jing BAI1, Nan ZHANG3, Jian-xing ZHAO1

  1. College of Electronic Information Engineering, Taiyuan University of Technology, Jinzhong 030600, China
  2. Post-doctoral Research Station, Shanxi Academy of Advanced Research and Innovation, Taiyuan 030024, China
  3. School of Information and Communication Engineering, North University of China, Taiyuan 030024, China
  • Received: 2023-05-23; Online: 2024-06-15; Published: 2024-09-18
  • Contact: Pei-yun XUE
  • About author: XUE Pei-yun, E-mail: xuepeiyun@tyut.edu.cn

Abstract:

A multiscale binary-channel filter bank (MBCFbank) feature extraction algorithm based on variational mode decomposition (VMD) is proposed to address the poor recognition of speech from patients with dysarthria, which results from insufficient extraction of effective feature information. Firstly, to better extract acoustic features that conform to the structural characteristics of the human ear, a binary-channel filter bank (BCFbank) feature extraction algorithm is proposed: one channel applies Mel filtering followed by a logarithmic transformation, while the other simultaneously applies Gammatone filtering followed by a nonlinear loudness transformation. Secondly, VMD is used to optimize the BCFbank features. From the multiple components obtained by decomposing the speech signal, the three with the highest correlation coefficients are selected, and the BCFbank features and their differential (delta) features are extracted from each. BCFbank features are also extracted from the undecomposed speech signal, and together these form the MBCFbank feature map. Finally, a dual-channel speech recognition model is trained and evaluated on these features. The experimental results show that the recognition models based on BCFbank features and on MBCFbank feature maps achieve highest accuracies of 87.82% and 94.34%, respectively, both outperforming the model based on Fbank features.
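The feature-map assembly steps described in the abstract can be sketched in pure Python. This is a minimal, hypothetical illustration, not the authors' implementation, under several labeled assumptions: the VMD decomposition itself is assumed to have been done already (its modes are passed in as lists of samples); modes are assumed to be ranked by Pearson correlation with the original signal; the Gammatone channel's nonlinear loudness transformation is assumed to be cube-root compression; delta features use the common regression formula with a window of N = 2; and the channel layout in `stack_map` is an assumption. All function names are illustrative.

```python
import math

# Hypothetical sketch of the MBCFbank assembly steps (not the paper's code).
# Assumptions: modes ranked by Pearson correlation with the original signal,
# cube-root loudness compression for the Gammatone channel, regression deltas (N=2).

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def select_modes(signal, modes, k=3):
    """Indices of the k modes most correlated with the original signal, best first."""
    scores = [pearson(signal, m) for m in modes]
    return sorted(range(len(modes)), key=lambda i: scores[i], reverse=True)[:k]

def compress_mel(energies, eps=1e-10):
    """Mel channel: logarithmic compression of filter bank energies."""
    return [math.log(e + eps) for e in energies]

def compress_gammatone(energies):
    """Gammatone channel: cube-root loudness compression (assumed form)."""
    return [e ** (1.0 / 3.0) for e in energies]

def deltas(frames, n=2):
    """Regression-based delta features over per-frame feature vectors,
    repeating the first and last frames as edge padding."""
    num_frames = len(frames)
    denom = 2 * sum(i * i for i in range(1, n + 1))
    out = []
    for t in range(num_frames):
        d = []
        for j in range(len(frames[t])):
            acc = 0.0
            for i in range(1, n + 1):
                nxt = frames[min(t + i, num_frames - 1)][j]
                prv = frames[max(t - i, 0)][j]
                acc += i * (nxt - prv)
            d.append(acc / denom)
        out.append(d)
    return out

def stack_map(orig_feats, mode_feats):
    """Hypothetical layout: the original signal's features, then a
    (features, deltas) pair for each selected mode."""
    channels = [orig_feats]
    for feats in mode_feats:
        channels.append(feats)
        channels.append(deltas(feats))
    return channels
```

With three selected modes, `stack_map` returns seven channels: the original signal's features plus one feature/delta pair per mode. In a fuller version, each input would carry both the Mel and the Gammatone channel of BCFbank; a single channel per input is shown here for brevity.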

Key words: dysarthric speech recognition, variational mode decomposition, convolutional neural network, MBCFbank features
