Classification of functional dysphonia using the tunable Q wavelet transform

DC field | Value | Language
dc.contributor | Aalto-yliopisto | fi
dc.contributor | Aalto University | en
dc.contributor.author | Mittapalle, Kiran | en_US
dc.contributor.author | Yagnavajjula, Madhu | en_US
dc.contributor.author | Alku, Paavo | en_US
dc.contributor.department | Department of Information and Communications Engineering | en
dc.contributor.groupauthor | Speech Communication Technology | en
dc.date.accessioned | 2023-10-25T07:36:05Z |
dc.date.available | 2023-10-25T07:36:05Z |
dc.date.issued | 2023-11 | en_US
dc.description.abstract | Functional dysphonia (FD) refers to an abnormality in voice quality in the absence of an identifiable lesion. In this paper, we propose an approach based on the tunable Q wavelet transform (TQWT) to automatically distinguish two types of FD (hyperfunctional dysphonia and hypofunctional dysphonia) from healthy voice using the acoustic voice signal. Using TQWT, voice signals were decomposed into sub-bands, and the entropy values extracted from the sub-bands were used as features for the 3-class classification problem. In addition, Mel-frequency cepstral coefficient (MFCC) features and glottal features were extracted from the acoustic voice signal and the estimated glottal source signal, respectively. A convolutional neural network (CNN) classifier was trained separately for the TQWT, MFCC and glottal features. Experiments were conducted using voice signals of 57 healthy speakers and 113 FD patients (72 with hyperfunctional dysphonia and 41 with hypofunctional dysphonia) taken from the VOICED database. These experiments revealed that the TQWT features yielded absolute improvements in classification accuracy of 5.5 and 4.5 percentage points over the baseline MFCC features and glottal features, respectively. Furthermore, the highest classification accuracy (67.91%) was obtained using the combination of the TQWT and glottal features, which indicates that these features are complementary. | en
dc.description.version | Peer reviewed | en
dc.format.extent | 9 |
dc.format.mimetype | application/pdf | en_US
dc.identifier.citation | Mittapalle, K, Yagnavajjula, M & Alku, P 2023, 'Classification of functional dysphonia using the tunable Q wavelet transform', Speech Communication, vol. 155, 102989. https://doi.org/10.1016/j.specom.2023.102989 | en
dc.identifier.doi | 10.1016/j.specom.2023.102989 | en_US
dc.identifier.issn | 0167-6393 |
dc.identifier.issn | 1872-7182 |
dc.identifier.other | PURE UUID: a3e4b986-a348-4fba-af03-42b0ef637614 | en_US
dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/a3e4b986-a348-4fba-af03-42b0ef637614 | en_US
dc.identifier.other | PURE LINK: http://www.scopus.com/inward/record.url?scp=85173623431&partnerID=8YFLogxK | en_US
dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/125688219/1-s2.0-S0167639323001231-main.pdf | en_US
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/124269 |
dc.identifier.urn | URN:NBN:fi:aalto-202310256642 |
dc.language.iso | en | en
dc.publisher | Elsevier |
dc.relation.ispartofseries | Speech Communication | en
dc.relation.ispartofseries | Volume 155 | en
dc.rights | openAccess | en
dc.subject.keyword | Functional dysphonia | en_US
dc.subject.keyword | tunable Q wavelet transform | en_US
dc.subject.keyword | glottal features | en_US
dc.subject.keyword | MFCC | en_US
dc.subject.keyword | convolutional neural network | en_US
dc.title | Classification of functional dysphonia using the tunable Q wavelet transform | en
dc.type | A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä (A1 original article in a scientific journal) | fi
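
The core feature idea in the abstract is to split the voice signal into sub-bands and then keep one entropy value per sub-band. The following minimal Python sketch (requires NumPy) illustrates that idea under stated assumptions. It is not the authors' implementation, and it does not implement the real TQWT, which has smooth frequency-domain transition bands, per-level decimation and perfect reconstruction. Instead it uses brick-wall FFT bands whose edges follow the TQWT's geometric spacing, with beta = 2/(Q+1) and alpha = 1 - beta/r. The entropy measure here is Shannon entropy of each band's normalized energy, which is an assumption because the abstract does not name one. The function names and the values Q = 2, r = 3, J = 8 are hypothetical.

```python
import numpy as np


def tqwt_params(Q: float, r: float) -> tuple[float, float]:
    """Return (alpha, beta), the TQWT low-pass and high-pass scaling factors
    for Q-factor Q and redundancy r: beta = 2/(Q+1), alpha = 1 - beta/r."""
    beta = 2.0 / (Q + 1.0)
    alpha = 1.0 - beta / r
    return alpha, beta


def band_split(x: np.ndarray, Q: float = 2.0, r: float = 3.0, J: int = 8) -> list[np.ndarray]:
    """Hypothetical stand-in for a J-level TQWT (NOT the real transform):
    split x into brick-wall FFT bands whose upper edges sit at alpha**j times
    Nyquist. Returns J band-pass signals (highest band first) plus a low-pass
    residual. The masks partition the spectrum, so the bands sum back to x."""
    alpha, _ = tqwt_params(Q, r)
    n = len(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n)                  # normalized, 0 .. 0.5 cycles/sample
    edges = 0.5 * alpha ** np.arange(J + 1)     # 0.5, 0.5*alpha, ..., 0.5*alpha**J
    bands = []
    for j in range(J):
        mask = (freqs > edges[j + 1]) & (freqs <= edges[j])
        bands.append(np.fft.irfft(spectrum * mask, n))
    bands.append(np.fft.irfft(spectrum * (freqs <= edges[J]), n))  # low-pass residual
    return bands


def subband_entropy(band: np.ndarray) -> float:
    """Shannon entropy (bits) of a band's normalized energy distribution,
    p_k = band[k]**2 / sum(band**2); an assumed choice of entropy measure."""
    energy = band ** 2
    total = energy.sum()
    if total <= 0.0:
        return 0.0                              # silent band: define entropy as 0
    p = energy[energy > 0] / total
    return float(-np.sum(p * np.log2(p)))


def subband_entropy_features(x: np.ndarray, Q: float = 2.0, r: float = 3.0, J: int = 8) -> np.ndarray:
    """One entropy value per sub-band: a (J + 1)-dimensional feature vector."""
    return np.array([subband_entropy(b) for b in band_split(x, Q, r, J)])


if __name__ == "__main__":
    fs = 8000                                   # hypothetical sampling rate (Hz)
    t = np.arange(fs) / fs
    rng = np.random.default_rng(0)
    x = np.sin(2 * np.pi * 200 * t) + 0.1 * rng.standard_normal(fs)  # synthetic test signal
    features = subband_entropy_features(x, Q=2.0, r=3.0, J=8)
    print(features.shape)                       # (9,)
```

Each recording would yield J + 1 entropy values, and these would form the feature vector passed to a classifier. In the paper that classifier is a CNN, which this sketch does not attempt to reproduce. Brick-wall bands were chosen only because they make the band layout easy to see. A faithful reproduction would swap `band_split` for an actual TQWT implementation.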
