Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers
Loading...
Access rights
openAccess
publishedVersion
URL
Journal Title
Journal ISSN
Volume Title
A4 Artikkeli konferenssijulkaisussa
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Date
Major/Subject
Mcode
Degree programme
Language
en
Pages
5
Series
INTERSPEECH 2022, Volume 2022-September, pp. 2173 - 2177, Interspeech
Abstract
The present study investigates the use of 1-dimensional (1-D) and 2-dimensional (2-D) spectral feature representations in voice pathology detection with several classical machine learning (ML) and recent deep learning (DL) classifiers. Four popularly used spectral feature representations (static mel-frequency cepstral coefficients (MFCCs), dynamic MFCCs, spectrogram and mel-spectrogram) are derived in both the 1-D and 2-D form from voice signals. Three widely used ML classifiers (support vector machine (SVM), random forest (RF) and Adaboost) and three DL classifiers (deep neural network (DNN), long short-term memory (LSTM) network, and convolutional neural network (CNN)) are used with the 1-D feature representations. In addition, CNN classifiers are built using the 2-D feature representations. The popularly used HUPA database is considered in the pathology detection experiments. Experimental results revealed that using the CNN classifier with the 2-D feature representations yielded better accuracy compared to using the ML and DL classifiers with the 1-D feature representations. The best performance was achieved using the 2-D CNN classifier based on dynamic MFCCs that showed a detection accuracy of 81%.Description
This work was supported by the Academy of Finland (grant number 313390). The computational resources were provided by Aalto ScienceIT.
Keywords
Other note
Citation
Javanmardi, F, Kadiri, S, Kodali, M & Alku, P 2022, Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers. in INTERSPEECH 2022. vol. 2022-September, Interspeech, International Speech Communication Association (ISCA), pp. 2173 - 2177, Interspeech, Incheon, Korea, Republic of, 18/09/2022. https://doi.org/10.21437/Interspeech.2022-10420