Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers

Access rights
openAccess
A4 Article in conference proceedings
Date
2022-09
Language
en
Pages
5
2173 - 2177
Series
Proceedings of Interspeech'22, Interspeech
Abstract
The present study investigates the use of 1-dimensional (1-D) and 2-dimensional (2-D) spectral feature representations in voice pathology detection with several classical machine learning (ML) and recent deep learning (DL) classifiers. Four commonly used spectral feature representations (static mel-frequency cepstral coefficients (MFCCs), dynamic MFCCs, spectrogram and mel-spectrogram) are derived in both the 1-D and 2-D form from voice signals. Three widely used ML classifiers (support vector machine (SVM), random forest (RF) and AdaBoost) and three DL classifiers (deep neural network (DNN), long short-term memory (LSTM) network, and convolutional neural network (CNN)) are used with the 1-D feature representations. In addition, CNN classifiers are built using the 2-D feature representations. The popular HUPA database is used in the pathology detection experiments. Experimental results revealed that using the CNN classifier with the 2-D feature representations yielded better accuracy than using the ML and DL classifiers with the 1-D feature representations. The best performance was achieved using the 2-D CNN classifier based on dynamic MFCCs, which reached a detection accuracy of 81%.
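To make the distinction between the two forms concrete, the following is a minimal sketch of how one of the four representations (the spectrogram) might be held in a 2-D and a 1-D form. The abstract does not state how the 1-D form is derived, so averaging over frames is an assumption made here for illustration only. The function names, the synthetic test tone, and the frame settings (25 ms frames, 10 ms hop at 16 kHz) are likewise hypothetical and not taken from the paper.

```python
import numpy as np

def spectrogram_2d(signal, frame_len=400, hop=160, n_fft=512):
    """Log-magnitude spectrogram with shape (n_freq_bins, n_frames).

    Frame length, hop size and FFT size are illustrative defaults,
    not the settings used in the paper.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude spectrum of each frame; rfft keeps non-negative frequencies.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    # Small offset avoids log(0); transpose to (frequency, time).
    return np.log(magnitude + 1e-10).T

def to_1d(features_2d):
    """Collapse the time axis by averaging each frequency bin over frames.

    ASSUMPTION: the paper's 1-D derivation is not given in the abstract;
    frame averaging is only one plausible choice.
    """
    return features_2d.mean(axis=1)

# Synthetic stand-in for a voice signal: 1 s of a 150 Hz tone plus noise.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 150 * t) + 0.01 * rng.standard_normal(sample_rate)

feat_2d = spectrogram_2d(signal)   # time-frequency matrix, e.g. input to a 2-D CNN
feat_1d = to_1d(feat_2d)           # fixed-length vector, e.g. input to an SVM or DNN
print(feat_2d.shape, feat_1d.shape)
```

With these settings the 2-D form is a 257 x 98 matrix (frequency bins by frames) and the 1-D form is a 257-element vector. The 2-D form keeps the temporal structure that a convolutional classifier can exploit, whereas the 1-D form discards it in exchange for a fixed-length input.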
Description
This work was supported by the Academy of Finland (grant number 313390). The computational resources were provided by Aalto ScienceIT.
Citation
Javanmardi, F, Kadiri, S, Kodali, M & Alku, P 2022, Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers. in INTERSPEECH 2022. vol. 2022-September, Interspeech, International Speech Communication Association (ISCA), pp. 2173-2177, Interspeech, Incheon, Korea, Republic of, 18/09/2022. https://doi.org/10.21437/Interspeech.2022-10420