Glottal source estimation from coded telephone speech using a deep neural network


dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Nonavinakere Prabhakera, Narendra
dc.contributor.author Airaksinen, Manu
dc.contributor.author Alku, Paavo
dc.date.accessioned 2017-11-21T13:36:31Z
dc.date.available 2017-11-21T13:36:31Z
dc.date.issued 2017-08
dc.identifier.citation Nonavinakere Prabhakera, N, Airaksinen, M & Alku, P 2017, Glottal source estimation from coded telephone speech using a deep neural network. in Proceedings of Interspeech 2017. Interspeech: Annual Conference of the International Speech Communication Association, International Speech Communication Association, pp. 3931-3935, Interspeech, Stockholm, Sweden, 20-24 August. DOI: 10.21437/Interspeech.2017-882 en
dc.identifier.issn 1990-9772
dc.identifier.other PURE UUID: 667abd30-6669-4fbd-a9dd-90d59daff94f
dc.identifier.other PURE ITEMURL: https://research.aalto.fi/en/publications/glottal-source-estimation-from-coded-telephone-speech-using-a-deep-neural-network(667abd30-6669-4fbd-a9dd-90d59daff94f).html
dc.identifier.other PURE FILEURL: https://research.aalto.fi/files/15742494/narendra_interspeech0882.pdf
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/28802
dc.description.abstract In speech analysis, information about the glottal source is obtained from speech by glottal inverse filtering (GIF). State-of-the-art GIF methods are sufficiently accurate when the input speech signal is of high quality (i.e., with little noise or reverberation). In realistic conditions, however, and particularly when GIF is computed from coded telephone speech, the accuracy of GIF methods deteriorates severely. To estimate the glottal source robustly under coded conditions, a deep neural network (DNN)-based method is proposed. The proposed method uses a DNN to map speech features extracted from the coded speech to the glottal flow waveform estimated from the corresponding clean speech. The coded telephone speech is generated with the adaptive multi-rate (AMR) codec, a widely used speech compression method. The proposed glottal source estimation method is compared with two existing GIF methods, closed phase covariance analysis (CP) and iterative adaptive inverse filtering (IAIF). The results indicate that the proposed DNN-based method estimates glottal flow waveforms from coded telephone speech with considerably better accuracy than CP and IAIF. en
dc.format.extent 5
dc.format.extent 3931-3935
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation.ispartof Interspeech en
dc.relation.ispartofseries Proceedings of Interspeech 2017 en
dc.relation.ispartofseries Interspeech: Annual Conference of the International Speech Communication Association en
dc.rights openAccess en
dc.subject.other 213 Electronic, automation and communications engineering, electronics en
dc.subject.other 3112 Neurosciences en
dc.title Glottal source estimation from coded telephone speech using a deep neural network en
dc.type A4 Article in conference proceedings en
dc.description.version Peer reviewed en
dc.contributor.department Department of Signal Processing and Acoustics
dc.subject.keyword glottal source estimation
dc.subject.keyword glottal inverse filtering
dc.subject.keyword deep neural network
dc.subject.keyword telephone speech
dc.subject.keyword 213 Electronic, automation and communications engineering, electronics
dc.subject.keyword 3112 Neurosciences
dc.identifier.urn URN:NBN:fi:aalto-201711217623
dc.identifier.doi 10.21437/Interspeech.2017-882
dc.type.version publishedVersion


