Analysis and classification of phonation types in speech and singing voice

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Kadiri, Sudarsana Reddy
dc.contributor.author: Alku, Paavo
dc.contributor.author: Yegnanarayana, Bayya
dc.contributor.department: Dept Signal Process and Acoust
dc.contributor.department: International Institute of Information Technology Hyderabad
dc.date.accessioned: 2020-03-06T15:26:55Z
dc.date.available: 2020-03-06T15:26:55Z
dc.date.embargo: info:eu-repo/date/embargoEnd/2022-02-26
dc.date.issued: 2020-04
dc.description.abstract: Both in speech and singing, humans are capable of generating sounds of different phonation types (e.g., breathy, modal and pressed). Previous studies in the analysis and classification of phonation types have mainly used voice source features derived using glottal inverse filtering (GIF). Even though glottal source features are useful in discriminating phonation types in speech, their performance deteriorates in singing voice due to the high fundamental frequency of these sounds, which reduces the accuracy of source-filter separation in GIF. In the present study, features describing the glottal source were computed using three signal processing methods that do not compute source-filter separation. These three methods are zero frequency filtering (ZFF), zero time windowing (ZTW) and single frequency filtering (SFF). From each method, a group of scalar features was extracted. In addition, cepstral coefficients were derived from the spectra computed using ZTW and SFF. Experiments were conducted with the proposed features to analyse and classify three phonation types (breathy, modal and pressed) in speech and singing voice. Statistical pair-wise comparisons between the phonation types showed that most of the features were capable of separating the phonation types significantly for speech and singing voices. Classification with support vector machine classifiers indicated that the proposed features and their combinations showed improved accuracy compared to commonly used glottal source features and mel-frequency cepstral coefficients (MFCCs). [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 15
dc.format.extent: 33-47
dc.format.mimetype: application/pdf
dc.identifier.citation: Kadiri, S R, Alku, P & Yegnanarayana, B 2020, 'Analysis and classification of phonation types in speech and singing voice', Speech Communication, vol. 118, pp. 33-47. https://doi.org/10.1016/j.specom.2020.02.004 [en]
dc.identifier.doi: 10.1016/j.specom.2020.02.004
dc.identifier.issn: 0167-6393
dc.identifier.issn: 1872-7182
dc.identifier.other: PURE UUID: f9758c8b-aa79-4fb8-b7c1-395af80fbc86
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/f9758c8b-aa79-4fb8-b7c1-395af80fbc86
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85080039271&partnerID=8YFLogxK
dc.identifier.other: PURE LINK: http://www.sciencedirect.com/science/article/pii/S0167639319303358
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/41313010/Analysis_and_Detection_of_Phonation_Types_in_Speech_and_Singing_Voice.pdf
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/43407
dc.identifier.urn: URN:NBN:fi:aalto-202003062450
dc.language.iso: en [en]
dc.publisher: Elsevier
dc.relation.ispartofseries: Speech Communication [en]
dc.relation.ispartofseries: Volume 118 [en]
dc.rights: openAccess [en]
dc.subject.keyword: Phonation type
dc.subject.keyword: Voice quality
dc.subject.keyword: Singing voice
dc.subject.keyword: Glottal source
dc.subject.keyword: Glottal inverse filtering
dc.subject.keyword: Zero frequency filtering (ZFF)
dc.subject.keyword: Zero time windowing (ZTW)
dc.subject.keyword: Single frequency filtering (SFF)
dc.title: Analysis and classification of phonation types in speech and singing voice [en]
dc.type: A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä (A1 Original article in a scientific journal) [fi]