Deep learning text-to-speech synthesis with Flowtron and WaveGlow
School of Science (Perustieteiden korkeakoulu)
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Authors
Date
2023-05-15
Department
Major/Subject
Biomedical Engineering
Mcode
SCI3059
Degree programme
Master’s Programme in Life Science Technologies
Language
en
Pages
48+7
Series
Abstract
Innovation in artificial speech synthesis using deep learning has accelerated in recent years. Current interest lies in synthesizing speech that models the complex prosody and stylistic features of natural spoken language from a minimal amount of data. Not only are such models remarkable from a technological perspective, they also have immense potential as custom voice assistive technology (AT) for people living with speech impairments. However, more research is needed to evaluate the applicability of deep learning text-to-speech (TTS) systems in a real-world context. This thesis aims to further this research by employing two well-known TTS frameworks, Flowtron and WaveGlow, to train a voice clone model on limited personal speech data from a person living with locked-in syndrome (LIS). The resulting artificial voice is assessed based on human perception. In addition, the results of the model are showcased in a user-friendly TTS application that also acts as a prototype for custom voice AT. Through the work in this thesis, we explore deep-learning-based artificial speech synthesis and aim to inspire further research into its relevance for the development of inclusive technology.
Supervisor
Palva, Matias
Thesis advisor
Müller-Putz, Gernot
Keywords
TTS, voice cloning, deep learning, assistive technology