Glottal source estimation from coded telephone speech using a deep neural network

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Nonavinakere Prabhakera, Narendra [en_US]
dc.contributor.author: Airaksinen, Manu [en_US]
dc.contributor.author: Alku, Paavo [en_US]
dc.contributor.department: Department of Signal Processing and Acoustics [en]
dc.contributor.groupauthor: Speech Communication Technology [en]
dc.date.accessioned: 2017-11-21T13:36:31Z
dc.date.available: 2017-11-21T13:36:31Z
dc.date.issued: 2017-08 [en_US]
dc.description.abstract: In speech analysis, information about the glottal source is obtained from speech by glottal inverse filtering (GIF). The accuracy of state-of-the-art GIF methods is sufficiently high when the input speech signal is of high quality (i.e., with little noise or reverberation). However, in realistic conditions, particularly when GIF is computed from coded telephone speech, the accuracy of GIF methods deteriorates severely. To robustly estimate the glottal source under coded conditions, a deep neural network (DNN)-based method is proposed. The proposed method uses a DNN to map speech features extracted from the coded speech to the glottal flow waveform estimated from the corresponding clean speech. The coded telephone speech is generated with the adaptive multi-rate (AMR) codec, a widely used speech compression method. The proposed glottal source estimation method is compared with two existing GIF methods, closed phase covariance analysis (CP) and iterative adaptive inverse filtering (IAIF). The results indicate that the proposed DNN-based method estimates glottal flow waveforms from coded telephone speech with considerably better accuracy than CP and IAIF. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 5
dc.format.extent: 3931-3935
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Nonavinakere Prabhakera, N, Airaksinen, M & Alku, P 2017, Glottal source estimation from coded telephone speech using a deep neural network. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. vol. 2017-August, Interspeech: Annual Conference of the International Speech Communication Association, International Speech Communication Association (ISCA), pp. 3931-3935, Interspeech, Stockholm, Sweden, 20/08/2017. https://doi.org/10.21437/Interspeech.2017-882 [en]
dc.identifier.doi: 10.21437/Interspeech.2017-882 [en_US]
dc.identifier.issn: 1990-9772
dc.identifier.other: PURE UUID: 667abd30-6669-4fbd-a9dd-90d59daff94f [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/667abd30-6669-4fbd-a9dd-90d59daff94f [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/15742494/narendra_interspeech0882.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/28802
dc.identifier.urn: URN:NBN:fi:aalto-201711217623
dc.language.iso: en [en]
dc.relation.ispartof: Interspeech [en]
dc.relation.ispartofseries: Proceedings of Interspeech 2017 [en]
dc.relation.ispartofseries: Interspeech: Annual Conference of the International Speech Communication Association [en]
dc.rights: openAccess [en]
dc.rights.copyright: © 2017 ISCA. This article was originally published in the Proceedings of Interspeech 2017: Narendra, N., Airaksinen, M., Alku, P. (2017) Glottal Source Estimation from Coded Telephone Speech Using a Deep Neural Network. Proc. Interspeech 2017, 3931-3935, DOI: 10.21437/Interspeech.2017-882. [en_US]
dc.subject.keyword: glottal source estimation [en_US]
dc.subject.keyword: glottal inverse filtering [en_US]
dc.subject.keyword: deep neural network [en_US]
dc.subject.keyword: telephone speech [en_US]
dc.title: Glottal source estimation from coded telephone speech using a deep neural network [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (A4 Article in conference proceedings) [fi]
dc.type.version: publishedVersion
