VMD Based Binary Channels Speech Feature Map Extraction Algorithm for Dysarthria

doi:10.12068/j.issn.1005-3026.2024.06.005

Abstract

A multiscale binary-channel filter bank (MBCFbank) feature extraction algorithm based on variational mode decomposition (VMD) is proposed to address the poor recognition of dysarthric speech caused by insufficient extraction of effective feature information. First, to better extract acoustic features that match the auditory characteristics of the human ear, a binary-channel filter bank (BCFbank) feature extraction algorithm is proposed: one channel applies Mel filtering followed by a logarithmic transformation, while the other applies Gammatone filtering followed by a nonlinear loudness transformation. Second, VMD is used to optimize the BCFbank features. The three components with the highest correlation coefficients are selected from the decomposed speech signal components, and their BCFbank features and differential features are extracted. BCFbank features are also extracted from the undecomposed speech signal, and together these form the MBCFbank feature map. Finally, training and recognition are carried out on a dual-channel speech recognition model. The experimental results show that the speech recognition models based on BCFbank features and MBCFbank feature maps reach highest accuracies of 87.82% and 94.34%, respectively, both outperforming recognition based on Fbank features.
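
The component selection step described above can be illustrated with a minimal sketch. It assumes that the correlation coefficient is the Pearson correlation between each decomposed component and the original signal, and that components are ranked by its absolute value; the abstract does not state either detail. The function name `select_top_components` and the toy sinusoidal "modes" are hypothetical stand-ins for real VMD output.

```python
import numpy as np

def select_top_components(signal, components, k=3):
    """Pick the k components most correlated with the original signal.

    Assumption: Pearson correlation, ranked by absolute value.
    """
    signal = np.asarray(signal, dtype=float)
    components = np.asarray(components, dtype=float)
    # Correlation of each component (one per row) with the original signal
    corrs = np.array([np.corrcoef(signal, c)[0, 1] for c in components])
    # Indices of the k largest absolute correlations, highest first
    order = np.argsort(-np.abs(corrs))[:k]
    return order, components[order], corrs

# Toy stand-in for VMD output: four sinusoids with decreasing amplitude
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
modes = np.stack([
    1.00 * np.sin(2 * np.pi * 5 * t),
    0.60 * np.sin(2 * np.pi * 40 * t),
    0.30 * np.sin(2 * np.pi * 120 * t),
    0.05 * np.sin(2 * np.pi * 300 * t),
])
x = modes.sum(axis=0)  # "undecomposed" signal for this toy example

idx, top, corrs = select_top_components(x, modes, k=3)
```

In a full pipeline, each selected row of `top` would then be passed to the BCFbank extractor along with the undecomposed signal; that extractor is not sketched here because the abstract does not specify its parameters.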

Key words: speech recognition with dysarthria, variational mode decomposition, convolutional neural network, MBCFbank features

CLC Number: TP 912.34

Pei-yun XUE, Jing BAI, Nan ZHANG, Jian-xing ZHAO. VMD Based Binary Channels Speech Feature Map Extraction Algorithm for Dysarthria[J]. Journal of Northeastern University(Natural Science), 2024, 45(6): 793-801.

Feature	Model 1 WRA/%	Model 2 WRA/%	Model 3 WRA/%
Feature 1	83.74	84.45	85.47
Feature 2	85.94	86.49	87.82
Feature 3	91.52	92.14	93.48
Feature 4	92.69	93.24	94.34
Feature 5	69.91	70.78	71.88

Dysarthric speaker	Speech intelligibility/%	Feature 1 WRA/%	Feature 2 WRA/%	Feature 3 WRA/%	Feature 4 WRA/%
M04	2	71.76	77.65	85.88	87.06
F03	6	79.31	84.48	89.66	91.38
M12	7	74.12	76.47	88.24	89.41
M01	15	70.59	69.12	83.82	85.29
M07	28	88.37	90.70	95.35	93.02
F02	29	84.75	89.83	94.92	94.92
M06	39	86.59	86.59	93.90	95.12
M16	43	87.80	82.93	91.46	93.90
M05	58	90.24	92.68	92.68	95.12
F04	62	85.71	87.30	93.65	95.24
M11	62	86.42	88.89	91.36	92.59
M09	86	84.30	91.74	95.04	95.87
M14	90	92.77	93.98	95.18	96.39
M08	93	89.34	90.98	94.26	95.90
M10	93	93.10	93.97	96.55	97.41

Dysarthric speaker	Speech intelligibility/%	Feature 1 WRA/%	Feature 2 WRA/%	Feature 3 WRA/%	Feature 4 WRA/%
M04	2	72.94	78.82	87.06	88.24
F03	6	81.03	86.20	91.38	93.10
M12	7	75.29	77.65	89.41	90.59
M01	15	70.59	70.59	85.29	85.29
M07	28	90.24	91.86	96.51	94.19
F02	29	86.44	91.52	96.61	96.61
M06	39	87.80	89.02	95.12	96.34
M16	43	89.02	84.15	93.90	95.12
M05	58	91.46	93.90	93.90	95.12
F04	62	87.30	88.89	95.24	96.83
M11	62	87.65	90.12	92.59	93.83
M09	86	84.30	92.56	95.87	96.69
M14	90	92.77	95.18	96.39	97.59
M08	93	89.34	91.80	95.08	96.72
M10	93	93.97	94.83	97.41	98.28