The Effect of the MFCC Frame Length in Automatic Voice Pathology Detection

dc.contributor: Aalto-yliopisto (fi)
dc.contributor: Aalto University (en)
dc.contributor.author: Tirronen, Saska
dc.contributor.author: Kadiri, Sudarsana
dc.contributor.author: Alku, Paavo
dc.contributor.department: Department of Signal Processing and Acoustics (en)
dc.contributor.groupauthor: Speech Communication Technology (en)
dc.date.accessioned: 2025-05-07T05:54:17Z
dc.date.available: 2025-05-07T05:54:17Z
dc.date.issued: 2024-09
dc.description.abstract: Automatic voice pathology detection is a research topic that has gained increasing interest recently. Although methods based on deep learning are becoming popular, the classical pipeline systems based on a two-stage architecture consisting of a feature extraction stage and a classifier stage are still widely used. In these classical detection systems, frame-wise computation of mel-frequency cepstral coefficients (MFCCs) is the most popular feature extraction method. However, no systematic study has been conducted to investigate the effect of the MFCC frame length on automatic voice pathology detection. In this work, we studied the effect of the MFCC frame length in voice pathology detection using three disorders (hyperkinetic dysphonia, hypokinetic dysphonia and reflux laryngitis) from the Saarbrücken Voice Disorders (SVD) database. The detection performance was compared between speaker-dependent and speaker-independent scenarios as well as between speaking task-dependent and speaking task-independent scenarios. The Support Vector Machine, which is the most widely used classifier in the study area, was used as the classifier. The results show that the detection accuracy depended on the MFCC frame length in all the scenarios studied. The best detection accuracy was obtained by using an MFCC frame length of 500 ms with a shift of 5 ms. (en)
dc.description.version: Peer reviewed (en)
dc.format.extent: 8
dc.format.mimetype: application/pdf
dc.identifier.citation: Tirronen, S, Kadiri, S & Alku, P 2024, 'The Effect of the MFCC Frame Length in Automatic Voice Pathology Detection', Journal of Voice, vol. 38, no. 5, pp. 975-982. https://doi.org/10.1016/j.jvoice.2022.03.021 (en)
dc.identifier.doi: 10.1016/j.jvoice.2022.03.021
dc.identifier.issn: 0892-1997
dc.identifier.issn: 1873-4588
dc.identifier.other: PURE UUID: d6851c94-6047-4943-a319-8f42732e1834
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/d6851c94-6047-4943-a319-8f42732e1834
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85132679710&partnerID=8YFLogxK
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/180456706/1-s2.0-S089219972200087X-main.pdf
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/135244
dc.identifier.urn: URN:NBN:fi:aalto-202505073529
dc.language.iso: en (en)
dc.publisher: Elsevier
dc.relation.ispartofseries: Journal of Voice (en)
dc.relation.ispartofseries: Volume 38, issue 5, pp. 975-982 (en)
dc.rights: openAccess (en)
dc.rights: CC BY
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject.keyword: voice pathology
dc.subject.keyword: pathology detection
dc.subject.keyword: speech analysis
dc.subject.keyword: MFCC
dc.subject.keyword: SVM
dc.title: The Effect of the MFCC Frame Length in Automatic Voice Pathology Detection (en)
dc.type: A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä (fi)
dc.type.version: publishedVersion

Files

Original bundle

Name: 1-s2.0-S089219972200087X-main.pdf
Size: 1.27 MB
Format: Adobe Portable Document Format