Title: | Machine learning methods for suprasegmental analysis and conversion in speech |
Author(s): | Seshadri, Shreyas |
Date: | 2020 |
Language: | en |
Pages: | 98 + app. 82 |
Department: | Department of Signal Processing and Acoustics (Signaalinkäsittelyn ja akustiikan laitos) |
ISBN: | 978-952-64-0167-6 (electronic) 978-952-64-0166-9 (printed) |
Series: | Aalto University publication series DOCTORAL DISSERTATIONS, 201/2020 |
ISSN: | 1799-4942 (electronic) 1799-4934 (printed) 1799-4934 (ISSN-L) |
Supervising professor(s): | Alku, Paavo, Prof., Aalto University, Department of Signal Processing and Acoustics, Finland |
Thesis advisor(s): | Räsänen, Okko, Asst. Prof., Tampere University, Finland |
Subject: | Electrical engineering |
Keywords: | suprasegmental speech processing, Bayesian learning, deep learning, zero-resource speech processing, word and syllable count estimation, speaking style conversion |
Archive | yes |
Abstract: Speech technology is a field of technological research focusing on methods to process spoken language. Work in the area has largely relied on a combination of domain-specific knowledge and digital signal processing (DSP) algorithms, often combined with statistical (parametric) models. In this context, machine learning (ML) has played a central role in estimating the parameters of such models. Recently, better access to large quantities of data has opened the door to advanced ML models that are less constrained by the assumptions necessary for the DSP models and are potentially capable of achieving higher performance.
Parts:
[Publication 1]: Seshadri, S., Remes, U. & Räsänen, O. Comparison of non-parametric Bayesian mixture models for syllable clustering and zero-resource speech processing. In Annual Conference of the International Speech Communication Association (INTERSPEECH), Stockholm, Sweden, pp. 2744–2748, August 2017. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201711217678. DOI: 10.21437/Interspeech.2017-339
[Publication 2]: Seshadri, S., Remes, U. & Räsänen, O. Dirichlet process mixture models for clustering i-vector data. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, pp. 5740–5744, March 2017. DOI: 10.1109/ICASSP.2017.7953202
[Publication 3]: Räsänen, O., Seshadri, S., Karadayi, J., Riebling, E., Bunce, J., Cristia, A., Metze, F., Casillas, M., Rosemberg, C., Bergelson, E. & Soderstrom, M. Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech. Speech Communication, vol. 113, pp. 63–80, October 2019. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201909035123. DOI: 10.1016/j.specom.2019.08.005
[Publication 4]: Seshadri, S. & Räsänen, O. SylNet: An adaptable end-to-end syllable count estimator for speech. IEEE Signal Processing Letters, vol. 26, no. 9, pp. 1359–1363, July 2019. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201909205373. DOI: 10.1109/LSP.2019.2929415
[Publication 5]: Seshadri, S., Juvela, L., Räsänen, O. & Alku, P. Vocal effort based speaking style conversion using vocoder features and parallel learning. IEEE Access, vol. 7, pp. 17230–17246, January 2019. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201904022453. DOI: 10.1109/ACCESS.2019.2895923
[Publication 6]: Seshadri, S., Juvela, L., Yamagishi, J., Räsänen, O. & Alku, P. Cycle-consistent adversarial networks for non-parallel vocal effort based speaking style conversion. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, pp. 6835–6839, May 2019. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201906033383. DOI: 10.1109/ICASSP.2019.8682648
[Publication 7]: Seshadri, S., Juvela, L., Alku, P. & Räsänen, O. Augmented CycleGANs for continuous scale normal-to-Lombard speaking style conversion. In Annual Conference of the International Speech Communication Association (INTERSPEECH), Graz, Austria, pp. 2838–2842, September 2019. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-202001021295. DOI: 10.21437/Interspeech.2019-1681
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.