Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers

Conference article in proceedings
Date
2022-09
Language
en
Pages
5
2173 - 2177
Series
Proceedings of Interspeech'22, Interspeech
Abstract
The present study investigates the use of 1-dimensional (1-D) and 2-dimensional (2-D) spectral feature representations in voice pathology detection with several classical machine learning (ML) and recent deep learning (DL) classifiers. Four widely used spectral feature representations (static mel-frequency cepstral coefficients (MFCCs), dynamic MFCCs, spectrogram and mel-spectrogram) are derived from voice signals in both 1-D and 2-D form. Three widely used ML classifiers (support vector machine (SVM), random forest (RF) and AdaBoost) and three DL classifiers (deep neural network (DNN), long short-term memory (LSTM) network and convolutional neural network (CNN)) are used with the 1-D feature representations. In addition, CNN classifiers are built using the 2-D feature representations. The widely used HUPA database is used in the pathology detection experiments. Experimental results revealed that the CNN classifier with the 2-D feature representations yielded higher accuracy than the ML and DL classifiers with the 1-D feature representations. The best performance, a detection accuracy of 81%, was achieved by the 2-D CNN classifier based on dynamic MFCCs.
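The abstract distinguishes 1-D and 2-D forms of the same four spectral representations. This record does not give the paper's extraction settings, so the following is only a minimal NumPy sketch under assumed parameters: 16 kHz audio, 25 ms Hamming frames with a 10 ms hop, a 512-point FFT, 40 mel bands and 13 MFCCs. It further assumes three things. First, "dynamic MFCCs" means the static coefficients stacked with their delta and delta-delta coefficients. Second, the 2-D form is the full frames × coefficients matrix, as it might be fed to a CNN. Third, the 1-D form collapses that matrix into a single vector by averaging each coefficient over time. All function names are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters mapping an (n_fft // 2 + 1)-bin power spectrum onto n_mels bands."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis of shape (n_out, n_in)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)
    return basis

def deltas(feat, N=2):
    """Regression-based delta coefficients along the time axis (rows = frames), edge-padded."""
    T = feat.shape[0]
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    num = sum(n * (padded[N + n:N + n + T] - padded[N - n:N - n + T]) for n in range(1, N + 1))
    return num / (2 * sum(n * n for n in range(1, N + 1)))

def spectral_features(x, sr, frame_len=400, hop=160, n_fft=512, n_mels=40, n_mfcc=13):
    """Return the four representations in 2-D form, each as a (frames x coefficients) matrix."""
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2        # (T, n_fft//2 + 1)
    spectrogram = np.log(power + 1e-10)                                # log power spectrogram
    mel_spectrogram = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)  # (T, n_mels)
    static_mfcc = mel_spectrogram @ dct_matrix(n_mfcc, n_mels).T       # (T, n_mfcc)
    d1 = deltas(static_mfcc)
    dynamic_mfcc = np.hstack([static_mfcc, d1, deltas(d1)])            # assumed: static + delta + delta-delta
    return {
        "spectrogram": spectrogram,
        "mel_spectrogram": mel_spectrogram,
        "static_mfcc": static_mfcc,
        "dynamic_mfcc": dynamic_mfcc,
    }

def to_1d(feat_2d):
    """Collapse a (frames x coefficients) matrix into one vector by averaging over time (assumed)."""
    return feat_2d.mean(axis=0)

if __name__ == "__main__":
    sr = 16000                                  # assumed sampling rate
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 150 * t)             # 1 s synthetic stand-in for a voice signal
    for name, f2d in spectral_features(x, sr).items():
        print(name, "2-D:", f2d.shape, "1-D:", to_1d(f2d).shape)
```

For this 1 s input, the sketch yields 98 frames. The 2-D matrices are therefore 98 × 257 (spectrogram), 98 × 40 (mel-spectrogram), 98 × 13 (static MFCCs) and 98 × 39 (dynamic MFCCs). The corresponding 1-D vectors have 257, 40, 13 and 39 entries, which is the fixed-length input that classifiers such as an SVM or a random forest expect.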
Description
This work was supported by the Academy of Finland (grant number 313390). The computational resources were provided by Aalto ScienceIT.
Citation
Javanmardi, F., Kadiri, S., Kodali, M. & Alku, P. 2022, 'Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers', in INTERSPEECH 2022, vol. 2022-September, Interspeech, International Speech Communication Association (ISCA), pp. 2173-2177, Interspeech, Incheon, Korea, Republic of, 18/09/2022. https://doi.org/10.21437/Interspeech.2022-10420