End-to-End Optimization of Source Models for Speech and Audio Coding Using a Machine Learning Framework
Access rights
openAccess
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
View publication in the Research portal
View/Open full text file from the Research portal
Other link related to publication
Author
Bäckström, T
Date
2019-09
Language
en
Pages
3401-3405
Series
Proceedings of Interspeech, Interspeech - Annual Conference of the International Speech Communication Association
Abstract
Speech coding is the most commonly used application of speech processing. Accumulated layers of improvements have however made codecs so complex that optimization of individual modules becomes increasingly difficult. This work introduces machine learning methodology to speech and audio coding, such that we can optimize quality in terms of overall entropy. We can then use conventional quantization, coding and perceptual models without modification such that the codec adheres to conventional requirements on algorithmic complexity, latency and robustness to packet loss. Experiments demonstrate that end-to-end optimization of quantization accuracy of the spectral envelope can be used for a lossless reduction in bitrate of 0.4 kbits/s.
Keywords
speech and audio coding, end-to-end optimization, speech source modeling
Citation
Bäckström, T 2019, End-to-End Optimization of Source Models for Speech and Audio Coding Using a Machine Learning Framework. in Proceedings of Interspeech. Interspeech - Annual Conference of the International Speech Communication Association, International Speech Communication Association (ISCA), pp. 3401-3405, Interspeech, Graz, Austria, 15/09/2019. https://doi.org/10.21437/Interspeech.2019-1284