Implementing and testing the performance of deep neural network speaker verification models
School of Science (Perustieteiden korkeakoulu)
Master's thesis
Authors
Date
2023-01-23
Department
Major/Subject
Machine Learning, Data Science and Artificial Intelligence
Mcode
SCI3044
Degree programme
Master’s Programme in Computer, Communication and Information Sciences
Language
en
Pages
73
Series
Abstract
Speaker verification is a subtask of speaker recognition that uses speech, the most natural means of communication, as a biometric. To do so, a system extracts and models the characteristic features of a speaker's voice from the speech signal. Speaker verification is an essential tool in many applications, ranging from law enforcement to the voice-controlled smart assistants (e.g., Siri) now widespread in daily life. However, speech contains a large degree of variability from different sources, which can severely degrade the performance of these systems. Recent work has therefore focused on mitigating this variability, aided by large datasets tailored for speaker recognition and by advances in deep learning that have significantly boosted performance. In particular, deep speaker embeddings are a successful technique for representing a speaker as a fixed-dimensional feature vector. This thesis implements two speaker verification systems that extract deep speaker embeddings using deep neural networks and an advanced objective function. The models are then evaluated on various test sets, including "in the wild" environments and unseen languages, specifically Finnish. The experiments demonstrated the excellent generalization ability of the models, their robustness against adverse conditions, and their capacity to be language-agnostic.
Description
Supervisor
Kurimo, Mikko
Thesis advisor
Virkkunen, Anja
Keywords
speaker recognition, speaker verification, deep learning, x-vector, ECAPA-TDNN