End-to-End Optimization of Source Models for Speech and Audio Coding Using a Machine Learning Framework

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Bäckström, Tom
dc.date.accessioned 2019-09-25T14:13:01Z
dc.date.available 2019-09-25T14:13:01Z
dc.date.issued 2019-09
dc.identifier.citation Bäckström, T 2019, End-to-End Optimization of Source Models for Speech and Audio Coding Using a Machine Learning Framework. In Proceedings of Interspeech. Interspeech - Annual Conference of the International Speech Communication Association, ISCA, pp. 3401-3405, Interspeech, Graz, Austria, 15/09/2019. https://doi.org/10.21437/Interspeech.2019-1284 en
dc.identifier.issn 2308-457X
dc.identifier.other PURE UUID: 8cb5f77d-8899-4276-a749-9988cafc80f4
dc.identifier.other PURE ITEMURL: https://research.aalto.fi/en/publications/8cb5f77d-8899-4276-a749-9988cafc80f4
dc.identifier.other PURE LINK: https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1284.pdf
dc.identifier.other PURE FILEURL: https://research.aalto.fi/files/37082504/ELEC_Backstrom_End_to_end_Interspeech.pdf
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/40464
dc.description.abstract Speech coding is the most commonly used application of speech processing. However, accumulated layers of improvements have made codecs so complex that optimizing individual modules has become increasingly difficult. This work introduces machine learning methodology to speech and audio coding, such that we can optimize quality in terms of overall entropy. We can then use conventional quantization, coding and perceptual models without modification, so that the codec adheres to conventional requirements on algorithmic complexity, latency and robustness to packet loss. Experiments demonstrate that end-to-end optimization of the quantization accuracy of the spectral envelope can be used for a lossless reduction in bitrate of 0.4 kbit/s. en
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation.ispartof Interspeech en
dc.relation.ispartofseries Proceedings of Interspeech en
dc.relation.ispartofseries Interspeech - Annual Conference of the International Speech Communication Association en
dc.rights openAccess en
dc.title End-to-End Optimization of Source Models for Speech and Audio Coding Using a Machine Learning Framework en
dc.type A4 Article in conference proceedings en
dc.description.version Peer reviewed en
dc.contributor.department Dept Signal Process and Acoust
dc.subject.keyword speech and audio coding
dc.subject.keyword end-to-end optimization
dc.subject.keyword speech source modeling
dc.identifier.urn URN:NBN:fi:aalto-201909255485
dc.identifier.doi 10.21437/Interspeech.2019-1284
dc.type.version publishedVersion


Files in this item

There are no open access files associated with this item.
