Acoustic Model Compression with MAP adaptation

Access rights

openAccess
acceptedVersion

A4 Article in conference proceedings

Language

en

Pages

5

Series

Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden, pp. 65-69, Linköping Electronic Conference Proceedings ; Volume 131

Abstract

Speaker adaptation is an important step in optimizing and personalizing the performance of automatic speech recognition (ASR) for individual users. While many applications target rapid adaptation through various global transformations, slower adaptation that achieves a higher level of personalization would be useful for many active ASR users, especially those whose speech is not recognized well. This paper studies the outcome of combining maximum a posteriori (MAP) adaptation with compression of Gaussian mixture models. An important result that has received little previous attention is that MAP adaptation can be used to radically decrease the size of the models as they are tuned to a particular speaker. This is particularly relevant for small personal devices, which should provide accurate real-time recognition despite low memory, computation, and electricity consumption. With our method we are able to decrease model complexity with MAP adaptation while increasing accuracy.
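
For orientation, the kind of MAP update the abstract refers to can be sketched as below. This is the textbook mean update for a diagonal-covariance Gaussian mixture model with a prior weight, not the procedure used in the paper, which this record does not describe. The function names and the prior weight `tau` are illustrative assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at point x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def map_adapt_means(weights, means, variances, frames, tau=10.0):
    """Return MAP-adapted means and the soft frame count of each component.

    tau is a hypothetical prior weight: larger values keep the adapted
    means closer to the speaker-independent prior means.
    """
    n_comp = len(weights)
    dim = len(means[0])
    counts = [0.0] * n_comp
    sums = [[0.0] * dim for _ in range(n_comp)]

    for x in frames:
        # Posterior probability of each component given this frame.
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:
            continue  # frame is too far from every component to assign
        for k in range(n_comp):
            gamma = likes[k] / total
            counts[k] += gamma
            for d in range(dim):
                sums[k][d] += gamma * x[d]

    # Interpolate between the prior mean and the speaker data.
    adapted = [
        [(tau * means[k][d] + sums[k][d]) / (tau + counts[k])
         for d in range(dim)]
        for k in range(n_comp)
    ]
    return adapted, counts
```

In this sketch, a component that receives almost no adaptation frames keeps its prior mean, while a well-observed component moves toward the speaker's data. The returned soft counts show how much speaker data each component received.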

Citation

Leino, K & Kurimo, M 2017, Acoustic Model Compression with MAP adaptation. in J Tiedemann (ed.), Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden. Linköping Electronic Conference Proceedings, vol. 131, Linköping University Electronic Press, pp. 65-69, Nordic Conference on Computational Linguistics, Gothenburg, Sweden, 22/05/2017.