Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings

Conference article in proceedings
Date
2020-05
Language
en
Pages
5 (pp. 7064-7068)
Series
2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings; Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
Abstract
In speaker-aware training, a speaker embedding is appended to the DNN input features. This allows the DNN to effectively learn representations that are robust to speaker variability. We apply speaker-aware training to attention-based end-to-end speech recognition and show that it can improve over a purely end-to-end baseline. We also propose speaker-aware training as a viable method to leverage untranscribed, speaker-annotated data. We apply state-of-the-art embedding approaches, both i-vectors and neural embeddings such as x-vectors. We experiment with embeddings trained in two conditions: on the fixed ASR data, and on a large untranscribed dataset. We run our experiments on the TED-LIUM and Wall Street Journal datasets. No embedding consistently outperforms all others, but in many settings neural embeddings outperform i-vectors.
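The sketch below illustrates the speaker-aware input construction described in the abstract: a fixed per-utterance speaker embedding (an i-vector or x-vector) is appended to every acoustic feature frame before the features enter the attention-based encoder-decoder. This is a minimal PyTorch illustration, not the authors' implementation; the function name, feature dimensions, and embedding size are assumptions chosen only for the example.

import torch

def append_speaker_embedding(features: torch.Tensor,
                             spk_embedding: torch.Tensor) -> torch.Tensor:
    # features: (batch, time, feat_dim) frame-level acoustic features.
    # spk_embedding: (batch, emb_dim) utterance-level speaker embedding.
    # Returns (batch, time, feat_dim + emb_dim) speaker-aware features.
    batch, time, _ = features.shape
    # Broadcast the utterance-level embedding to every frame, then concatenate
    # it with the acoustic features along the feature dimension.
    expanded = spk_embedding.unsqueeze(1).expand(batch, time, spk_embedding.size(-1))
    return torch.cat([features, expanded], dim=-1)

# Example with hypothetical sizes: 80-dim filterbank frames and a 512-dim x-vector.
fbank = torch.randn(4, 300, 80)
xvector = torch.randn(4, 512)
speaker_aware_input = append_speaker_embedding(fbank, xvector)  # shape (4, 300, 592)

The resulting speaker-aware features would then be fed to the encoder in place of the plain acoustic features; the rest of the attention-based model is unchanged.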
Description
openaire: EC/H2020/780069/EU//MeMAD
Keywords
end-to-end speech recognition, speaker embedding, speaker-adaptation, speaker-aware training
Other note
Citation
Rouhe, A., Kaseva, T. & Kurimo, M. 2020, 'Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings', in 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings, 9053998, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, pp. 7064-7068, IEEE International Conference on Acoustics, Speech, and Signal Processing, Barcelona, Spain, 04/05/2020. https://doi.org/10.1109/ICASSP40776.2020.9053998