Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers
| dc.contributor | Aalto-yliopisto | fi |
| dc.contributor | Aalto University | en |
| dc.contributor.author | Javanmardi, Farhad | en_US |
| dc.contributor.author | Kadiri, Sudarsana | en_US |
| dc.contributor.author | Kodali, Manila | en_US |
| dc.contributor.author | Alku, Paavo | en_US |
| dc.contributor.department | Department of Signal Processing and Acoustics | en |
| dc.contributor.groupauthor | Speech Communication Technology | en |
| dc.date.accessioned | 2022-11-09T08:03:02Z | |
| dc.date.available | 2022-11-09T08:03:02Z | |
| dc.date.issued | 2022-09 | en_US |
| dc.description | This work was supported by the Academy of Finland (grant number 313390). The computational resources were provided by Aalto ScienceIT. | |
| dc.description.abstract | The present study investigates the use of 1-dimensional (1-D) and 2-dimensional (2-D) spectral feature representations in voice pathology detection with several classical machine learning (ML) and recent deep learning (DL) classifiers. Four common spectral feature representations (static mel-frequency cepstral coefficients (MFCCs), dynamic MFCCs, spectrogram and mel-spectrogram) are derived in both 1-D and 2-D form from voice signals. Three widely used ML classifiers (support vector machine (SVM), random forest (RF) and AdaBoost) and three DL classifiers (deep neural network (DNN), long short-term memory (LSTM) network, and convolutional neural network (CNN)) are used with the 1-D feature representations. In addition, CNN classifiers are built using the 2-D feature representations. The popular HUPA database is used in the pathology detection experiments. Experimental results revealed that the CNN classifier with the 2-D feature representations yielded better accuracy than the ML and DL classifiers with the 1-D feature representations. The best performance was achieved by the 2-D CNN classifier based on dynamic MFCCs, which showed a detection accuracy of 81%. | en |
| dc.description.version | Peer reviewed | en |
| dc.format.extent | 5 | |
| dc.format.mimetype | application/pdf | en_US |
| dc.identifier.citation | Javanmardi, F, Kadiri, S, Kodali, M & Alku, P 2022, Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers. in INTERSPEECH 2022. vol. 2022-September, Interspeech, International Speech Communication Association (ISCA), pp. 2173-2177, Interspeech, Incheon, Korea, Republic of, 18/09/2022. https://doi.org/10.21437/Interspeech.2022-10420 | en |
| dc.identifier.doi | 10.21437/Interspeech.2022-10420 | en_US |
| dc.identifier.issn | 295-1796 | |
| dc.identifier.other | PURE UUID: a464e080-b473-43bf-a74a-1b4bb65f2dfc | en_US |
| dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/a464e080-b473-43bf-a74a-1b4bb65f2dfc | en_US |
| dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/90941076/javanmardi22_interspeech.pdf | |
| dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/117686 | |
| dc.identifier.urn | URN:NBN:fi:aalto-202211096457 | |
| dc.language.iso | en | en |
| dc.relation.ispartof | Interspeech | en |
| dc.relation.ispartofseries | INTERSPEECH 2022 | en |
| dc.relation.ispartofseries | Volume 2022-September, pp. 2173-2177 | en |
| dc.relation.ispartofseries | Interspeech | en |
| dc.rights | openAccess | en |
| dc.title | Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers | en |
| dc.type | A4 Article in conference proceedings | en |
| dc.type.version | publishedVersion | |