Deep learning text-to-speech synthesis with Flowtron and WaveGlow

School of Science (Perustieteiden korkeakoulu) | Master's thesis

Date

2023-05-15

Department

Major/Subject

Biomedical Engineering

Mcode

SCI3059

Degree programme

Master’s Programme in Life Science Technologies

Language

en

Pages

48+7

Series

Abstract

Innovation in artificial speech synthesis using deep learning has accelerated in recent years. Current interest lies in synthesizing speech that models the complex prosody and stylistic features of natural spoken language from a minimal amount of data. Not only are such models remarkable from a technological perspective, they also have immense potential as custom voice assistive technology (AT) for people living with speech impairments. However, more research should focus on evaluating the applicability of deep learning text-to-speech (TTS) systems in a real-world context. This thesis aims to further this research by employing two well-known TTS frameworks, Flowtron and WaveGlow, to train a voice clone model on limited personal speech data of a person living with locked-in syndrome (LIS). The resulting artificial voice is assessed based on human perception. In addition, the results of the model are showcased in a user-friendly TTS application that also acts as a prototype for custom voice AT. Through the work in this thesis, we explore the fascinating world of deep-learning-based artificial speech synthesis and aim to inspire further research into its relevance for the development of inclusive technology.

Description

Supervisor

Palva, Matias

Thesis advisor

Müller-Putz, Gernot

Keywords

TTS, voice cloning, deep learning, assistive technology
