Glottal features for classification of phonation type from speech and neck surface accelerometer signals

dc.contributor: Aalto University [en]
dc.contributor.author: Kadiri, Sudarsana Reddy
dc.contributor.author: Alku, Paavo
dc.contributor.department: Speech Communication Technology
dc.contributor.department: Dept Signal Process and Acoust
dc.description.abstract: Glottal source characteristics vary between phonation types due to the tension of the laryngeal muscles together with the respiratory effort. Previous studies on the classification of phonation type have mainly used speech signals recorded with a microphone. Recently, two studies were published on the classification of phonation type using neck surface accelerometer (NSA) signals. However, no previous study has compared the acoustic speech signal with the NSA signal as input for classifying phonation type. Therefore, the current study investigates simultaneously recorded speech and NSA signals in the classification of three phonation types (breathy, modal, pressed). The general goal is to understand which of the two signals (speech vs. NSA) is more effective in the classification task. We hypothesize that, when the same feature set is used for both signals, classification accuracy is higher for the NSA signal, which is more closely related to the physical vibration of the vocal folds and less affected by the vocal tract than the acoustic speech signal. Glottal source waveforms were computed using two signal processing methods, quasi-closed phase (QCP) glottal inverse filtering and zero frequency filtering (ZFF), and a set of time-domain and frequency-domain scalar features was computed from the obtained waveforms. In addition, the study investigated the use of mel-frequency cepstral coefficients (MFCCs) derived from the glottal source waveforms computed by QCP and ZFF. Classification experiments with support vector machine classifiers revealed that the NSA signal discriminated the phonation types better than the speech signal when the same feature set was used. Furthermore, the glottal features provided information complementary to the conventional MFCC features, and combining them resulted in the best classification accuracy for both the NSA signal (86.9%) and the speech signal (80.6%). [en]
dc.description.version: Peer reviewed [en]
dc.identifier.citation: Kadiri, S R & Alku, P 2021, 'Glottal features for classification of phonation type from speech and neck surface accelerometer signals', Computer Speech and Language, vol. 70, 101232.
dc.identifier.other: PURE UUID: f6b28d7f-481e-48d0-8341-d46938e13fa4
dc.identifier.other: PURE ITEMURL:
dc.identifier.other: PURE FILEURL:
dc.publisher: Academic Press Inc.
dc.relation.ispartofseries: Computer Speech and Language [en]
dc.relation.ispartofseries: Volume 70 [en]
dc.subject.keyword: phonation type
dc.subject.keyword: voice quality
dc.subject.keyword: neck surface accelerometer
dc.subject.keyword: glottal source waveform
dc.subject.keyword: support vector machine
dc.title: Glottal features for classification of phonation type from speech and neck surface accelerometer signals [en]
dc.type: A1 Original article in a scientific journal [fi, translated]
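
The abstract names zero frequency filtering (ZFF) as one of the two methods used to obtain glottal source waveforms. The record does not describe the authors' implementation, so the following is only a minimal sketch of the commonly described ZFF steps: difference the signal, pass it through two cascaded zero-frequency resonators, and remove the resulting trend by repeatedly subtracting a local mean. The function names, the window length of about 1.5 times an assumed average pitch period, the default pitch period of 10 ms, and the three trend-removal passes are all illustrative assumptions, not values taken from the paper.

```python
def remove_local_mean(values, half_window):
    """Subtract from each sample the mean of a window centred on it.

    The window is truncated at the signal edges, so samples near the
    edges are not reliable after trend removal.
    """
    n = len(values)
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        out.append(values[i] - (prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def zero_frequency_filter(signal, fs, avg_pitch_period_s=0.01, passes=3):
    """Return (zff_signal, half_window) for a list of samples.

    avg_pitch_period_s and passes are assumed defaults for this sketch.
    """
    # Step 1: first-order difference (sample before the start taken as 0).
    x = [signal[0]] + [signal[i] - signal[i - 1] for i in range(1, len(signal))]

    # Step 2: two cascaded resonators at 0 Hz, each y[n] = 2y[n-1] - y[n-2] + x[n].
    y = x
    for _ in range(2):
        out = []
        prev1 = prev2 = 0.0
        for v in y:
            cur = 2.0 * prev1 - prev2 + v
            out.append(cur)
            prev2, prev1 = prev1, cur
        y = out

    # Step 3: remove the polynomial trend with a window of roughly
    # 1.5 average pitch periods, applied several times.
    half_window = int(round(0.75 * avg_pitch_period_s * fs))
    for _ in range(passes):
        y = remove_local_mean(y, half_window)
    return y, half_window


def find_epochs(zff, margin):
    """Indices of negative-to-positive zero crossings, skipping the edges.

    Which crossing direction marks glottal closure depends on signal polarity.
    """
    return [i for i in range(margin + 1, len(zff) - margin)
            if zff[i - 1] < 0.0 <= zff[i]]


if __name__ == "__main__":
    fs = 8000
    # Synthetic excitation: one impulse every 80 samples (100 Hz at 8 kHz).
    impulses = [1.0 if i % 80 == 0 else 0.0 for i in range(4000)]
    zff, half = zero_frequency_filter(impulses, fs)
    epochs = find_epochs(zff, margin=3 * half)
    print(len(epochs), [b - a for a, b in zip(epochs, epochs[1:])][:5])
```

The `margin` of three half-windows discards the region where the truncated edge windows of the three trend-removal passes distort the output. For real speech or NSA recordings the average pitch period would have to be estimated from the signal rather than fixed in advance.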