Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Javanmardi, Farhad [en_US]
dc.contributor.author: Kadiri, Sudarsana [en_US]
dc.contributor.author: Kodali, Manila [en_US]
dc.contributor.author: Alku, Paavo [en_US]
dc.contributor.department: Department of Signal Processing and Acoustics [en]
dc.contributor.groupauthor: Speech Communication Technology [en]
dc.date.accessioned: 2022-11-09T08:03:02Z
dc.date.available: 2022-11-09T08:03:02Z
dc.date.issued: 2022-09 [en_US]
dc.description: This work was supported by the Academy of Finland (grant number 313390). The computational resources were provided by Aalto ScienceIT.
dc.description.abstract: The present study investigates the use of 1-dimensional (1-D) and 2-dimensional (2-D) spectral feature representations in voice pathology detection with several classical machine learning (ML) and recent deep learning (DL) classifiers. Four commonly used spectral feature representations (static mel-frequency cepstral coefficients (MFCCs), dynamic MFCCs, spectrogram and mel-spectrogram) are derived in both 1-D and 2-D form from voice signals. Three widely used ML classifiers (support vector machine (SVM), random forest (RF) and AdaBoost) and three DL classifiers (deep neural network (DNN), long short-term memory (LSTM) network, and convolutional neural network (CNN)) are used with the 1-D feature representations. In addition, CNN classifiers are built using the 2-D feature representations. The pathology detection experiments are conducted on the commonly used HUPA database. The results show that the CNN classifier with the 2-D feature representations yielded better accuracy than the ML and DL classifiers with the 1-D feature representations. The best performance was achieved by the 2-D CNN classifier based on dynamic MFCCs, which reached a detection accuracy of 81%. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 5
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Javanmardi, F, Kadiri, S, Kodali, M & Alku, P 2022, Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers. in INTERSPEECH 2022. vol. 2022-September, Interspeech, International Speech Communication Association (ISCA), pp. 2173 - 2177, Interspeech, Incheon, Korea, Republic of, 18/09/2022. https://doi.org/10.21437/Interspeech.2022-10420 [en]
dc.identifier.doi: 10.21437/Interspeech.2022-10420 [en_US]
dc.identifier.issn: 295-1796
dc.identifier.other: PURE UUID: a464e080-b473-43bf-a74a-1b4bb65f2dfc [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/a464e080-b473-43bf-a74a-1b4bb65f2dfc [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/90941076/javanmardi22_interspeech.pdf
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/117686
dc.identifier.urn: URN:NBN:fi:aalto-202211096457
dc.language.iso: en [en]
dc.relation.ispartof: Interspeech [en]
dc.relation.ispartofseries: INTERSPEECH 2022 [en]
dc.relation.ispartofseries: Volume 2022-September, pp. 2173 - 2177 [en]
dc.relation.ispartofseries: Interspeech [en]
dc.rights: openAccess [en]
dc.title: Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers [en]
dc.type: A4 Article in conference proceedings (A4 Artikkeli konferenssijulkaisussa) [fi]
dc.type.version: publishedVersion
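
The abstract distinguishes between 2-D and 1-D forms of the same spectral representation. The following is a minimal sketch of that distinction for a log-magnitude spectrogram, not the authors' pipeline. The frame length, hop size, FFT size, the synthetic noise signal, and the mean/standard-deviation pooling used to obtain the 1-D vector are all illustrative assumptions; the record does not state how the paper derives its 1-D features.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Return a 2-D log-magnitude spectrogram of shape (n_frames, n_fft // 2 + 1).

    frame_len, hop and n_fft are assumed values (25 ms / 10 ms at 16 kHz).
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return np.log(magnitude + 1e-10)  # small offset avoids log(0)


def to_1d(features_2d):
    """Collapse the time axis into one vector per utterance.

    Assumed pooling: per-bin mean and standard deviation, concatenated.
    """
    return np.concatenate([features_2d.mean(axis=0), features_2d.std(axis=0)])


# Synthetic stand-in for a voice signal: 1 s of noise at 16 kHz (assumption).
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)

spec_2d = log_spectrogram(signal)  # time-frequency matrix, e.g. input to a 2-D CNN
spec_1d = to_1d(spec_2d)           # fixed-length vector, e.g. input to SVM, RF or DNN
```

In this sketch the 2-D form keeps the frame-by-frame structure, while the 1-D form discards temporal ordering in exchange for a fixed-length input.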
