Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings
Conference article in proceedings
This publication is imported from Aalto University research portal.
Date
2020-05
Language
en
Pages
5
7064-7068
Series
2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
Abstract
In speaker-aware training, a speaker embedding is appended to DNN input features. This allows the DNN to learn representations that are robust to speaker variability. We apply speaker-aware training to attention-based end-to-end speech recognition. We show that it can improve over a purely end-to-end baseline. We also propose speaker-aware training as a viable method to leverage untranscribed, speaker-annotated data. We apply state-of-the-art embedding approaches, both i-vectors and neural embeddings, such as x-vectors. We experiment with embeddings trained in two conditions: on the fixed ASR data, and on a large untranscribed dataset. We run our experiments on the TED-LIUM and Wall Street Journal datasets. No embedding consistently outperforms all others, but in many settings neural embeddings outperform i-vectors.
Description
openaire: EC/H2020/780069/EU//MeMAD
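The mechanism named in the abstract's first sentence, appending a speaker embedding to the DNN input features, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the toy feature values, and the embedding size are all hypothetical, and the sketch assumes one fixed embedding per utterance that is repeated on every frame.

```python
from typing import List, Sequence


def append_speaker_embedding(
    frames: Sequence[Sequence[float]],
    embedding: Sequence[float],
) -> List[List[float]]:
    """Concatenate the same speaker embedding to every acoustic frame.

    frames:    T frames, each with F acoustic features
    embedding: one utterance-level speaker vector with D values
    returns:   T frames, each with F + D values
    """
    emb = list(embedding)
    return [list(frame) + emb for frame in frames]


# Toy values (hypothetical): 3 frames of 4 features, 2-dimensional embedding.
frames = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]
speaker_embedding = [0.25, -0.75]

augmented = append_speaker_embedding(frames, speaker_embedding)
print(len(augmented), len(augmented[0]))  # 3 6
```

The frame count is unchanged and only the per-frame feature dimension grows, so the downstream network sees the same speaker vector at every time step. In practice the embedding would come from an i-vector or x-vector extractor rather than being written by hand.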
Keywords
end-to-end speech recognition, speaker embedding, speaker-adaptation, speaker-aware training
Citation
Rouhe, A, Kaseva, T & Kurimo, M 2020, Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings. in 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings., 9053998, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, pp. 7064-7068, IEEE International Conference on Acoustics, Speech, and Signal Processing, Barcelona, Spain, 04/05/2020. https://doi.org/10.1109/ICASSP40776.2020.9053998