Voice-quality Features for Deep Neural Network Based Speaker Verification Systems
Access rights
openAccess
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
Date
2021-08-27
Language
en
Pages
5
176-180
Series
29th European Signal Processing Conference, EUSIPCO 2021 - Proceedings, European Signal Processing Conference
Abstract
Jitter and shimmer are voice-quality features that have been used successfully to detect voice pathologies and to classify speaking styles. In this paper, we investigate the usefulness of such voice-quality features in neural-network-based speaker verification systems. To combine the voice-quality features with mel-spectrogram features, the cosine distance scores estimated from the two feature sets are linearly weighted to obtain a single fused score, which is then used to accept or reject a given speaker. Experiments on the Voxceleb-1 dataset show that fusing the cosine distance scores obtained from the mel-spectrogram and voice-quality features provides a 15% relative improvement in Equal Error Rate (EER) over a baseline system that uses only mel-spectrogram features.
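The score-level fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fusion weight of 0.8, the decision threshold of 0.5, and the use of a convex combination (weights summing to one) are all assumptions made for illustration, and the toy vectors stand in for the speaker embeddings a trained network would produce.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fuse_scores(score_mel, score_vq, weight=0.8):
    """Linearly weight the two scores into one fused score.

    `weight` is a hypothetical value; in practice it would be tuned
    on held-out development data.
    """
    return weight * score_mel + (1.0 - weight) * score_vq


def verify(enroll_mel, test_mel, enroll_vq, test_vq, weight=0.8, threshold=0.5):
    """Return the fused score and an accept (True) / reject (False) decision.

    `threshold` is an assumed operating point, not a value from the paper.
    """
    s_mel = cosine_score(enroll_mel, test_mel)  # mel-spectrogram branch
    s_vq = cosine_score(enroll_vq, test_vq)     # voice-quality branch
    fused = fuse_scores(s_mel, s_vq, weight)
    return fused, fused >= threshold


if __name__ == "__main__":
    # Toy embeddings: the mel branch matches, the voice-quality branch does not.
    fused, accepted = verify([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
    print(fused, accepted)
```

With a weight close to one the decision is dominated by the mel-spectrogram score, and the voice-quality score acts as a correction. Sweeping the threshold over a set of labelled trials and finding the point where false-accept and false-reject rates coincide would give the EER that the abstract reports.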
Keywords
jitter, mel-spectrogram, fusion, shimmer, speech recognition
Citation
Zewoudie, A, Koivisto, L & Bäckström, T 2021, Voice-quality Features for Deep Neural Network Based Speaker Verification Systems. in 29th European Signal Processing Conference, EUSIPCO 2021 - Proceedings. European Signal Processing Conference, IEEE, pp. 176-180, European Signal Processing Conference, Dublin, Ireland, 23/08/2021. https://doi.org/10.23919/EUSIPCO54536.2021.9616242