Journal of Northeastern University (Natural Science), 2024, Vol. 45, Issue 6: 793-801. DOI: 10.12068/j.issn.1005-3026.2024.06.005
• Information & Control •
Pei-yun XUE, Jing BAI, Nan ZHANG, Jian-xing ZHAO
Received: 2023-05-23
Online: 2024-06-15
Published: 2024-09-18
Contact: Pei-yun XUE
About author: XUE Pei-yun, E-mail: xuepeiyun@tyut.edu.cn
Pei-yun XUE, Jing BAI, Nan ZHANG, Jian-xing ZHAO. VMD Based Binary Channels Speech Feature Map Extraction Algorithm for Dysarthria[J]. Journal of Northeastern University(Natural Science), 2024, 45(6): 793-801.
Table 1 Pronunciation corpus of patients with dysarthria
Corpus name | Utterance items | Total |
---|---|---|
Computer command words | ESCAPE, LEFT, LINE, PARAGRAPH, PASTE, RIGHT, SHIFT, SENTENCE, TAB, ALT, BACKSPACE, COMMAND, CONTROL, COPY, CUT, DELETE, DOWNWARD, ENTER, UPWARD | 19 |
Digit words | ZERO, ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE | 10 |
Table 2 Recognition results of five features on
Feature | WRA/% (Model 1) | WRA/% (Model 2) | WRA/% (Model 3) |
---|---|---|---|
Feature 1 | 83.74 | 84.45 | 85.47 |
Feature 2 | 85.94 | 86.49 | 87.82 |
Feature 3 | 91.52 | 92.14 | 93.48 |
Feature 4 | 92.69 | 93.24 | 94.34 |
Feature 5 | 69.91 | 70.78 | 71.88 |
Table 3 Recognition results of four features from 15
Speaker with dysarthria | Speech intelligibility level/% | WRA/% (Feature 1) | WRA/% (Feature 2) | WRA/% (Feature 3) | WRA/% (Feature 4) |
---|---|---|---|---|---|
M04 | 2 | 71.76 | 77.65 | 85.88 | 87.06 |
F03 | 6 | 79.31 | 84.48 | 89.66 | 91.38 |
M12 | 7 | 74.12 | 76.47 | 88.24 | 89.41 |
M01 | 15 | 70.59 | 69.12 | 83.82 | 85.29 |
M07 | 28 | 88.37 | 90.70 | 95.35 | 93.02 |
F02 | 29 | 84.75 | 89.83 | 94.92 | 94.92 |
M06 | 39 | 86.59 | 86.59 | 93.90 | 95.12 |
M16 | 43 | 87.80 | 82.93 | 91.46 | 93.90 |
M05 | 58 | 90.24 | 92.68 | 92.68 | 95.12 |
F04 | 62 | 85.71 | 87.30 | 93.65 | 95.24 |
M11 | 62 | 86.42 | 88.89 | 91.36 | 92.59 |
M09 | 86 | 84.30 | 91.74 | 95.04 | 95.87 |
M14 | 90 | 92.77 | 93.98 | 95.18 | 96.39 |
M08 | 93 | 89.34 | 90.98 | 94.26 | 95.90 |
M10 | 93 | 93.10 | 93.97 | 96.55 | 97.41 |
Table 4 Recognition results of four features from 15
Speaker with dysarthria | Speech intelligibility level/% | WRA/% (Feature 1) | WRA/% (Feature 2) | WRA/% (Feature 3) | WRA/% (Feature 4) |
---|---|---|---|---|---|
M04 | 2 | 72.94 | 78.82 | 87.06 | 88.24 |
F03 | 6 | 81.03 | 86.20 | 91.38 | 93.10 |
M12 | 7 | 75.29 | 77.65 | 89.41 | 90.59 |
M01 | 15 | 70.59 | 70.59 | 85.29 | 85.29 |
M07 | 28 | 90.24 | 91.86 | 96.51 | 94.19 |
F02 | 29 | 86.44 | 91.52 | 96.61 | 96.61 |
M06 | 39 | 87.80 | 89.02 | 95.12 | 96.34 |
M16 | 43 | 89.02 | 84.15 | 93.90 | 95.12 |
M05 | 58 | 91.46 | 93.90 | 93.90 | 95.12 |
F04 | 62 | 87.30 | 88.89 | 95.24 | 96.83 |
M11 | 62 | 87.65 | 90.12 | 92.59 | 93.83 |
M09 | 86 | 84.30 | 92.56 | 95.87 | 96.69 |
M14 | 90 | 92.77 | 95.18 | 96.39 | 97.59 |
M08 | 93 | 89.34 | 91.80 | 95.08 | 96.72 |
M10 | 93 | 93.97 | 94.83 | 97.41 | 98.28 |
Table 5 Comparison of the method with other
Method | Feature parameters | WRA/% |
---|---|---|
ANN+MLP[ | MFCC | 68.88 |
LL-SVM[ | MFCC | 87.91 |
SV+S-CNN[ | Spectrogram | 89.54 |
Feature 2 + Model 3 | BCFbank features | 87.82 |
Feature 4 + Model 3 | MBCFbank feature map | 94.34 |
1 | Mohammed S Y, Sid‑Ahmed S, Brahim‑Fares Z,et al.Improving dysarthric speech recognition using empirical mode decomposition and convolutional neural network[J].EURASIP Journal on Audio,Speech,and Music Processing,2020,2020(1):1-7. |
2 | Al‑Qatab B A, Mustafa M B.Classification of dysarthric speech according to the severity of impairment:an analysis of acoustic features[J].IEEE Access,2021(9):18183-18194. |
3 | Liu S, Hu S, Xie X,et al.Recent progress in the CUHK dysarthric speech recognition system[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2021,29(99):2267-2281. |
4 | Yue Z, Loweimi E, Christensen H,et al.Acoustic modelling from raw source and filter components for dysarthric speech recognition[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2022(30):2968-2980. |
5 | Liang Zheng‑you, Li Yu‑xing, Sun Yu,et al.Speech recognition with dysarthria based on multi‑feature combination[J].Computer Engineering and Design,2022,43(2):567-572. (in Chinese) |
6 | Jiao Y, Tu M, Berisha V,et al.Simulating dysarthric speech for training data augmentation in clinical speech applications[C]//2018 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP).Calgary,Alberta:IEEE,2018:6009-6013. |
7 | Yilmaz E, Mitra V, Sivaraman G,et al.Articulatory and bottleneck features for speaker‑independent ASR of dysarthric speech[J].Computer Speech & Language,2019,58:319-334. |
8 | Zaidi B F, Selouani S A, Boudraa M,et al.Deep neural network architectures for dysarthric speech analysis and recognition[J].Neural Computing and Applications,2021,33(15):9089-9108. |
9 | Mariya T A, Vijayalakshmi P, Nagarajan T.Data augmentation techniques for transfer learning‑based continuous dysarthric speech recognition[J].Circuits,Systems,and Signal Processing,2023,42(1):601-622. |
10 | Li Dong, Zhang Xue‑ying, Duan Shu‑fei,et al.Articulation disorder recognition based on speech fusion features and random forest[J].Journal of Xidian University,2018,45(3):149-155. (in Chinese) |
11 | Wu Li‑dan.Multi‑view articulation disorder speech recognition based on deep temporal network[D].Shanghai:East China Normal University,2021. (in Chinese) |
12 | Wang Zhao‑guo, Wei Cun‑hai, Peng Ya‑ni,et al.Sound feature extraction method of power equipment based on GFCC‑SVM‑RFE[J].Electric Power Information and Communication Technology,2022,20(9):34-42. (in Chinese) |
13 | Dragomiretskiy K, Zosso D.Variational mode decomposition[J].IEEE Transactions on Signal Processing,2014,62(3):531-544. |
14 | Fritsch J, Magimai‑Doss M.Utterance verification‑based dysarthric speech intelligibility assessment using phonetic posterior features[J].IEEE Signal Processing Letters,2021(28):224-228. |
15 | Shahamiri S R, Salim S.Artificial neural networks as speech recognisers for dysarthric speech:identifying the best‑performing set of MFCC parameters and studying a speaker‑independent approach[J].Advanced Engineering Informatics,2014,28(1):102-110. |
16 | Rajeswari N, Chandrakala S.Generative model‑driven feature learning for dysarthric speech recognition[J].Biocybernetics & Biomedical Engineering,2016,36(4):553-561. |
17 | Shahamiri S R.Speech vision:an end‑to‑end deep learning‑based dysarthric automatic speech recognition system[J].IEEE Transactions on Neural Systems and Rehabilitation Engineering,2021(29):852-861. |