Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers
Access rights
openAccess
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
View publication in the Research portal
View/Open full text file from the Research portal
Other link related to publication
Date
2022-09
Language
en
Pages
5
2173 - 2177
Series
Proceedings of Interspeech'22, Interspeech
Abstract
The present study investigates the use of 1-dimensional (1-D) and 2-dimensional (2-D) spectral feature representations in voice pathology detection with several classical machine learning (ML) and recent deep learning (DL) classifiers. Four popularly used spectral feature representations (static mel-frequency cepstral coefficients (MFCCs), dynamic MFCCs, spectrogram and mel-spectrogram) are derived in both the 1-D and 2-D form from voice signals. Three widely used ML classifiers (support vector machine (SVM), random forest (RF) and Adaboost) and three DL classifiers (deep neural network (DNN), long short-term memory (LSTM) network, and convolutional neural network (CNN)) are used with the 1-D feature representations. In addition, CNN classifiers are built using the 2-D feature representations. The popularly used HUPA database is considered in the pathology detection experiments. Experimental results revealed that using the CNN classifier with the 2-D feature representations yielded better accuracy compared to using the ML and DL classifiers with the 1-D feature representations. The best performance was achieved using the 2-D CNN classifier based on dynamic MFCCs that showed a detection accuracy of 81%.
Description
This work was supported by the Academy of Finland (grant number 313390). The computational resources were provided by Aalto ScienceIT.
Citation
Javanmardi, F, Kadiri, S, Kodali, M & Alku, P 2022, Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers. in INTERSPEECH 2022. vol. 2022-September, Interspeech, International Speech Communication Association (ISCA), pp. 2173-2177, Interspeech, Incheon, Korea, Republic of, 18/09/2022. https://doi.org/10.21437/Interspeech.2022-10420