Efficient Application of Perceptual Models in Speech and Audio Coding

School of Electrical Engineering (Sähkötekniikan korkeakoulu) | Master's thesis
Date
2020-12-15
Major/Subject
Acoustics and Audio Technology
Mcode
ELEC3030
Degree programme
CCIS - Master’s Programme in Computer, Communication and Information Sciences (TS2013)
Language
en
Pages
65+4
Abstract
The growing digital storage and transmission of speech and audio necessitates codecs that reduce the size of audio files. Knowledge of the limits of human hearing allows the creation of perceptual models that enable the removal of information while keeping the perceived audio distortion minimal. Applying such perceptual models can be computationally complex and may become a bottleneck in the coding process. This thesis aims to improve the efficiency of the application of perceptual models in speech and audio codecs. Two approaches are taken: the first uses neural networks to approximate the action of the perceptual model, and the second takes a more analytical approach to improve the efficiency with which the output of the perceptual model is applied to the coding process. The perceived distortion incurred by using the approximated perceptual model is examined through both objective measures and listening tests. It is concluded that, since the perceived reduction in quality is less significant than the differences in quality between different high-quality models, it is beneficial to use the approximations to the perceptual models in cases where the complexity reductions are large. As for the proposed analytical approach to applying the perceptual model, it yields a small, statistically significant improvement in the efficiency of its use, coupled with large reductions in computational cost, compared to the conventional approach.
Supervisor
Bäckström, Tom
Thesis advisor
Bäckström, Tom
Keywords
speech and audio coding, transform coding, rate loop, perceptual models, neural networks