Efficient Application of Perceptual Models in Speech and Audio Coding

School of Electrical Engineering | Master's thesis

Date

2020-12-15

Major/Subject

Acoustics and Audio Technology

Mcode

ELEC3030

Degree programme

CCIS - Master’s Programme in Computer, Communication and Information Sciences (TS2013)

Language

English

Pages

65+4

Abstract

The growing volume of digitally stored and transmitted speech and audio necessitates codecs that reduce the size of audio files. Knowledge of the limits of human hearing allows the creation of perceptual models, which make it possible to remove information while keeping the perceived distortion minimal. Applying such perceptual models can be computationally complex and may become a bottleneck in the coding process. This thesis aims to improve the efficiency with which perceptual models are applied in speech and audio codecs. Two approaches are taken: the first uses neural networks to approximate the action of the perceptual model, and the second takes a more analytical approach to improve the efficiency with which the output of the perceptual model is applied in the coding process. The perceived distortion incurred by the approximated perceptual model is examined both with objective measures and through listening tests. Since the perceived reduction in quality is smaller than the quality differences that arise merely from using different high-quality models, it is concluded that approximating the perceptual model is beneficial in cases where the reductions in complexity are large. The proposed analytical approach to applying the perceptual model yields a small but statistically significant improvement in coding efficiency, together with large reductions in computational cost compared to the conventional approach.
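The abstract does not spell out the conventional approach it compares against. As a hedged illustration only, the sketch below shows a generic transform-coding rate loop of the kind the keywords refer to: the spectrum is divided by a perceptual masking threshold, then the quantizer step size is found by bisection so that the bit count fits a budget. Every iteration requires a full quantize-and-count pass, which is the kind of computational cost an analytical alternative would aim to reduce. The names `estimate_bits`, `quantize` and `rate_loop`, the bit-count proxy and the bisection bounds are all assumptions for illustration, not the thesis's method or any codec's actual implementation.

```python
import numpy as np


def estimate_bits(q: np.ndarray) -> float:
    """Hypothetical bit-count proxy standing in for a real entropy coder:
    about log2(1 + |q|) magnitude bits plus one sign bit per nonzero value."""
    mag = np.abs(q)
    return float(np.sum(np.log2(1.0 + mag) + (mag > 0)))


def quantize(weighted: np.ndarray, step: float) -> np.ndarray:
    # Uniform scalar quantizer applied to the perceptually weighted spectrum.
    return np.round(weighted / step)


def rate_loop(spectrum, threshold, bit_budget, iters=40):
    """Bisection (in the log domain) over the quantizer step size: find
    approximately the smallest step whose bit estimate fits the budget."""
    # Perceptual weighting: dividing by the (assumed) masking threshold makes
    # the quantization noise roughly proportional to that threshold.
    weighted = np.asarray(spectrum, dtype=float) / np.asarray(threshold, dtype=float)
    # At this step every coefficient rounds to zero, so the budget is met.
    hi = 4.0 * np.max(np.abs(weighted)) + 1e-12
    lo = hi * 1e-6  # assumed fine enough to exceed any realistic budget
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if estimate_bits(quantize(weighted, mid)) <= bit_budget:
            hi = mid  # budget met: try a finer step
        else:
            lo = mid  # too many bits: use a coarser step
    q = quantize(weighted, hi)
    return hi, q, estimate_bits(q)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.normal(size=256)   # toy transform-domain spectrum
    thr = np.full(256, 0.1)       # toy flat masking threshold
    step, q, bits = rate_loop(spec, thr, bit_budget=300)
    print(f"step={step:.4f} bits={bits:.1f}")
```

Because the bit estimate never increases as the step grows, bisection converges to the boundary step; a real codec would use its actual entropy coder in place of the proxy.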

Supervisor

Bäckström, Tom

Thesis advisor

Bäckström, Tom

Keywords

speech and audio coding, transform coding, rate loop, perceptual models, neural networks