End-to-End Optimization of Source Models for Speech and Audio Coding Using a Machine Learning Framework

DC Field | Value | Language
dc.contributor | Aalto-yliopisto | fi
dc.contributor | Aalto University | en
dc.contributor.author | Bäckström, Tom | en_US
dc.contributor.department | Department of Signal Processing and Acoustics | en
dc.contributor.groupauthor | Speech Communication Technology | en
dc.contributor.groupauthor | Speech Interaction Technology | en
dc.date.accessioned | 2019-09-25T14:13:01Z |
dc.date.available | 2019-09-25T14:13:01Z |
dc.date.issued | 2019-09 | en_US
dc.description.abstract | Speech coding is the most commonly used application of speech processing. Accumulated layers of improvements have however made codecs so complex that optimization of individual modules becomes increasingly difficult. This work introduces machine learning methodology to speech and audio coding, such that we can optimize quality in terms of overall entropy. We can then use conventional quantization, coding and perceptual models without modification such that the codec adheres to conventional requirements on algorithmic complexity, latency and robustness to packet loss. Experiments demonstrate that end-to-end optimization of quantization accuracy of the spectral envelope can be used for a lossless reduction in bitrate of 0.4 kbits/s. | en
dc.description.version | Peer reviewed | en
dc.format.mimetype | application/pdf | en_US
dc.identifier.citation | Bäckström, T 2019, End-to-End Optimization of Source Models for Speech and Audio Coding Using a Machine Learning Framework. in Proceedings of Interspeech. Interspeech - Annual Conference of the International Speech Communication Association, International Speech Communication Association (ISCA), pp. 3401-3405, Interspeech, Graz, Austria, 15/09/2019. https://doi.org/10.21437/Interspeech.2019-1284 | en
dc.identifier.doi | 10.21437/Interspeech.2019-1284 | en_US
dc.identifier.issn | 2308-457X |
dc.identifier.other | PURE UUID: 8cb5f77d-8899-4276-a749-9988cafc80f4 | en_US
dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/8cb5f77d-8899-4276-a749-9988cafc80f4 | en_US
dc.identifier.other | PURE LINK: https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1284.pdf | en_US
dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/37082504/ELEC_Backstrom_End_to_end_Interspeech.pdf | en_US
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/40464 |
dc.identifier.urn | URN:NBN:fi:aalto-201909255485 |
dc.language.iso | en | en
dc.relation.ispartof | Interspeech | en
dc.relation.ispartofseries | Proceedings of Interspeech | en
dc.relation.ispartofseries | Interspeech - Annual Conference of the International Speech Communication Association | en
dc.rights | openAccess | en
dc.subject.keyword | speech and audio coding | en_US
dc.subject.keyword | end-to-end optimization | en_US
dc.subject.keyword | speech source modeling | en_US
dc.title | End-to-End Optimization of Source Models for Speech and Audio Coding Using a Machine Learning Framework | en
dc.type | A4 Artikkeli konferenssijulkaisussa (A4 Article in a conference publication) | fi
dc.type.version | publishedVersion |
