Classification of Phonation Modes in Classical Singing Using Modulation Power Spectral Features

dc.contributor: Aalto University
dc.contributor.author: Brandner, Manuel
dc.contributor.author: Bereuter, Paul Armin
dc.contributor.author: Kadiri, Sudarsana Reddy
dc.contributor.author: Sontacchi, Alois
dc.contributor.department: Universität für Musik und darstellende Kunst Graz
dc.contributor.department: Department of Information and Communications Engineering
dc.description: Publisher Copyright: © 2013 IEEE.
dc.description.abstract: In singing, the perceptual term 'voice quality' is used to describe expressed emotions and singing styles. In voice physiology research, specific voice qualities are discussed using the term phonation modes and are directly related to the voicing produced by the vocal folds. Control and awareness of phonation modes are vital for professional singers to maintain a healthy voice. Most studies on phonation modes have investigated speech and have used glottal inverse filtering to compute features from an estimated excitation signal. The performance of this method is reported to decrease at high pitches, which limits its usability for the singing voice. To overcome this limitation, this study proposes features derived from the modulation power spectrum for phonation mode classification in the singing voice. The exploration of the modulation power spectrum is motivated by the fact that, in singing, temporal modulations (known as vocal vibrato) and spectral modulations carry information about vocal fold tension. Since no large publicly available dataset of phonation modes in singing exists, we created a new dataset consisting of six female and four male classical singers, who sang five vowels at different pitches in three phonation modes (breathy, modal, and pressed). Experimental results with a support vector machine classifier reveal that the proposed features achieve better classification performance than state-of-the-art reference features (such as glottal source features and MFCCs): at least 10% higher on the current dataset for target labels, and around 6% higher for perceptually assessed labels.
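The abstract's central feature, the modulation power spectrum, is commonly computed as the 2-D power spectrum of a log-magnitude spectrogram, so that temporal modulations (such as vibrato) appear along one axis and spectral modulations along the other. The sketch below illustrates that general construction with NumPy only; all function and parameter names (`modulation_power_spectrum`, `frame_len`, `hop`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def modulation_power_spectrum(signal, frame_len=1024, hop=256):
    """Sketch of a modulation power spectrum (MPS): the 2-D power
    spectrum of a log-magnitude spectrogram. Hypothetical parameters,
    not the feature extraction described in the paper."""
    # Short-time Fourier transform via framed, windowed FFTs
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))       # (frames, freq bins)
    log_spec = np.log(spec + 1e-10)                  # compress dynamic range
    # 2-D FFT over (time, frequency) yields the modulation spectrum;
    # subtracting the mean suppresses the DC component
    mps = np.abs(np.fft.fft2(log_spec - log_spec.mean())) ** 2
    return np.fft.fftshift(mps)

# Toy input: a 220 Hz tone with 6 Hz vibrato, 1 s at 16 kHz
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220 * t + 3 * np.sin(2 * np.pi * 6 * t))
mps = modulation_power_spectrum(x)
print(mps.shape)
```

A vibrato rate such as the 6 Hz of the toy signal shows up as energy at the corresponding temporal-modulation frequency, which is the kind of vocal-fold-related information the abstract says these features capture.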
dc.description.version: Peer reviewed
dc.identifier.citation: Brandner, M., Bereuter, P. A., Kadiri, S. R., & Sontacchi, A. (2023). 'Classification of Phonation Modes in Classical Singing Using Modulation Power Spectral Features', IEEE Access, vol. 11, pp. 29149–29161.
dc.identifier.other: PURE UUID: 26d31c2b-8ed8-463d-9d7e-c96045637108
dc.relation.ispartofseries: IEEE Access
dc.relation.ispartofseries: Volume 11
dc.subject.keyword: Modulation power spectrum
dc.subject.keyword: phonation modes
dc.subject.keyword: singing voice analysis
dc.subject.keyword: voice qualities
dc.title: Classification of Phonation Modes in Classical Singing Using Modulation Power Spectral Features
dc.type: A1 Original research article in a scientific journal